Code for our ICLR'24 paper, 3D-MoLM: Towards 3D Molecule-Text Interpretation in Language Models.
Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang†, Xiangnan He†, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian
* Equal Contribution
† Corresponding author
To tackle the two challenges of 3D molecule-text alignment and 3D molecule-centric instruction tuning, we design a three-stage training pipeline for 3D-MoLM: 1) 3D molecule-text representation learning, 2) 3D molecule-text alignment via text generation, and 3) instruction-based fine-tuning.
3D Molecule-Text Alignment maps 3D molecular representations into the LM's input textual space, where the LM can interpret them.
3D Molecule-centric Instruction Tuning fine-tunes the model to follow human instructions on 3D molecule-related tasks.
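One way to picture the alignment step is as turning a variable number of per-atom 3D features into a fixed number of vectors that live in the LM's embedding space, which are then placed in front of the text tokens. The toy sketch below assumes a simple learned-query attention pool (keys and values are the raw atom features) followed by a linear projection. All sizes, the random weights, and the helper names (`query_pool`, `molecule_soft_prompt`) are hypothetical; this is an illustration of the idea, not the projector implemented in this repository.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_pool(queries, atom_feats):
    # Each learned query attends over all atoms (scaled dot-product) and returns
    # a weighted average of the atom features, so the number of outputs equals
    # the number of queries regardless of how many atoms the molecule has.
    d = len(atom_feats[0])
    pooled = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(d) for f in atom_feats]
        weights = softmax(scores)
        pooled.append([sum(w * f[j] for w, f in zip(weights, atom_feats)) for j in range(d)])
    return pooled


def molecule_soft_prompt(atom_feats, queries, W_proj):
    # Hypothetical helper: project each pooled vector from the encoder width
    # to the LM embedding width.
    return [matvec(W_proj, p) for p in query_pool(queries, atom_feats)]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
ENC_DIM, LM_DIM, NUM_QUERIES = 8, 16, 4          # hypothetical sizes
atoms = random_matrix(11, ENC_DIM, rng)          # 11 atoms from a 3D encoder (toy values)
queries = random_matrix(NUM_QUERIES, ENC_DIM, rng)
W_proj = random_matrix(LM_DIM, ENC_DIM, rng)
text_embeds = random_matrix(5, LM_DIM, rng)      # 5 text-token embeddings (toy values)

mol_tokens = molecule_soft_prompt(atoms, queries, W_proj)
lm_input = mol_tokens + text_embeds              # molecule tokens prepended to the text
print(len(lm_input), len(lm_input[0]))           # prints: 9 16
```

The fixed query count is what lets molecules of any size occupy the same number of positions in the LM's input sequence.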
Key dependencies include:
torch==2.0.1
transformers==4.35.0
deepspeed==0.12.2
pytorch-lightning==2.0.7
uni-core==0.0.1
See requirements.txt for the full list of requirements.
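A quick way to confirm that an environment matches the key pins above is a small standard-library check such as the following sketch. It compares versions by exact string after dropping any local suffix (for example `+cu118`). The helper name `check_pins` is hypothetical, and the distribution name registered for Uni-Core may differ from `uni-core` depending on how it was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the "Key dependencies" list.
KEY_PINS = {
    "torch": "2.0.1",
    "transformers": "4.35.0",
    "deepspeed": "0.12.2",
    "pytorch-lightning": "2.0.7",
    "uni-core": "0.0.1",
}


def check_pins(pins, lookup=version):
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local version labels such as "+cu118" when comparing.
        base = installed.split("+", 1)[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(KEY_PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```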
./data/ directory.
Download the following checkpoints from Huggingface-3D-MoLM and put them under the ./all_checkpoints/ directory.
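Before running anything, it can help to confirm that the downloaded files landed where the scripts expect them. This sketch only checks for the paths that this README itself mentions; the helper name `missing_paths` is hypothetical, and the real checkpoint folders may contain more entries than listed here.

```python
from pathlib import Path

# Paths referenced in this README (relative to the repository root).
EXPECTED_PATHS = [
    "data",
    "all_checkpoints",
    "all_checkpoints/llama-2-7b-hf",
    "all_checkpoints/generalist",
    "all_checkpoints/stage1-ft-ckpt",
]


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("All expected paths found." if not missing else "Missing: " + ", ".join(missing))
```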
Run inference.ipynb to try out 3D-MoLM.
See the stage 1 log in all_checkpoints/stage1-ft-ckpt/metrics.csv.
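A metrics.csv written by a Lightning-style CSV logger typically has one row per logging call, with most cells empty, so reading the latest value of each column takes a little care. The sketch below assumes that layout; the actual column names depend on what the training script logs, and the helper names (`last_logged_values`, `read_metrics`) are hypothetical.

```python
import csv


def last_logged_values(lines):
    """Return {column: last non-empty value} from CSV text lines with a header row."""
    latest = {}
    for row in csv.DictReader(lines):
        for column, value in row.items():
            # Skip empty cells and any overflow fields (DictReader keys them as None).
            if column is not None and value not in ("", None):
                latest[column] = value
    return latest


def read_metrics(csv_path):
    with open(csv_path, newline="") as f:
        return last_logged_values(f)
```

For example, `read_metrics("all_checkpoints/stage1-ft-ckpt/metrics.csv")` returns the most recent logged value of each column in that log.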
We provide the outputs of 3D-MoLM on the test set of 3D-MoIT in all_checkpoints/generalist/lightning_logs/version_0/predictions.txt. Run the following script to read it:
python read_generalist_results.py \
    --file_path 'all_checkpoints/generalist/lightning_logs/version_0/predictions.txt' \
    --tokenizer_path 'all_checkpoints/llama-2-7b-hf'
We share the checkpoint for reproducing our results. To evaluate it, run:
bash ./scripts/stage3_test.sh
Stage 1: 3D Molecule-Text Representation Learning
Run the following script for stage 1 pretraining:
bash ./scripts/stage1_pretrain.sh
Stage 2: 3D Molecule-Text Alignment via Generative Learning
Run the following script for stage 2 pretraining:
bash ./scripts/stage2_pretrain.sh
Stage 3: Instruction-based Fine-tuning
Run the following script for instruction tuning:
bash ./scripts/stage3_train.sh
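The three stages are meant to run in order. A small driver such as this sketch runs them back to back and stops at the first failing stage. It assumes each script already points at the checkpoint produced by the previous stage, so check the paths inside the scripts before a full run. The function name `run_pipeline` is hypothetical, and the default `dry_run=True` in the `__main__` block only prints the commands.

```python
import subprocess

# Training scripts named in this README, in pipeline order.
STAGE_SCRIPTS = [
    "./scripts/stage1_pretrain.sh",
    "./scripts/stage2_pretrain.sh",
    "./scripts/stage3_train.sh",
]


def run_pipeline(scripts=STAGE_SCRIPTS, dry_run=False):
    """Run each stage with bash; check=True raises CalledProcessError on the first failure."""
    commands = [["bash", script] for script in scripts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Using `check=True` means a crash in stage 1 stops the driver instead of letting stage 2 start from a missing checkpoint.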
If you use our code or checkpoints, please cite our paper:
@inproceedings{li2024molm,
title={3D-MoLM: Towards 3D Molecule-Text Interpretation in Language Models},
author={Li, Sihang and Liu, Zhiyuan and Luo, Yanchen and Wang, Xiang and He, Xiangnan and Kawaguchi, Kenji and Chua, Tat-Seng and Tian, Qi},
booktitle={ICLR},
year={2024},
url={https://openreview.net/forum?id=xI4yNlkaqh}
}