zjunlp / MolGen

[ICLR 2024] Domain-Agnostic Molecular Generation with Chemical Feedback
https://huggingface.co/spaces/zjunlp/MolGen
MIT License

βš—οΈ MolGen

Domain-Agnostic Molecular Generation with Chemical Feedback

📃 Paper • 🤗 Model • 🔬 Space


🔔 News

📕 Requirements

To run the code, configure the dependencies by restoring our conda environment:

conda env create -f MolGen/environment.yml -n $Your_env_name$

Then activate it:

conda activate $Your_env_name$

📚 Resource Download

You can download the pre-trained and fine-tuned models from Hugging Face: MolGen-large and MolGen-large-opt.

Moreover, the dataset used for downstream tasks can be found here.

The expected file structure is:

moldata
├── checkpoint
│   ├── molgen.pkl              # pre-trained model
│   ├── syn_qed_model.pkl       # fine-tuned model for QED optimization on synthetic data
│   ├── syn_plogp_model.pkl     # fine-tuned model for p-logP optimization on synthetic data
│   ├── np_qed_model.pkl        # fine-tuned model for QED optimization on natural product data
│   └── np_plogp_model.pkl      # fine-tuned model for p-logP optimization on natural product data
├── finetune
│   ├── np_test.csv             # natural product test data
│   ├── np_train.csv            # natural product train data
│   ├── plogp_test.csv          # synthetic test data for p-logP optimization
│   ├── qed_test.csv            # synthetic test data for QED optimization
│   └── zinc250k.csv            # synthetic train data
├── generate                    # generate molecules
├── output                      # molecule candidates
└── vocab_list
    └── zinc.npy                # SELFIES alphabet
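The `vocab_list/zinc.npy` file holds the SELFIES alphabet, i.e. the set of bracketed symbols from which every SELFIES string is built. The snippet below is a minimal sketch, not MolGen's own tokenizer: it splits a SELFIES string into its symbols and maps them to integer ids by position in an alphabet. The functions `split_selfies` and `encode` and the toy alphabet are hypothetical and chosen for illustration only.

```python
import re
from typing import List

# Every SELFIES symbol is enclosed in square brackets, e.g. "[C]", "[=O]", "[Ring1]".
_SYMBOL = re.compile(r"\[[^\[\]]*\]")


def split_selfies(selfies: str) -> List[str]:
    """Split a SELFIES string into its bracketed symbols."""
    symbols = _SYMBOL.findall(selfies)
    # Any characters outside brackets mean the string is not well-formed SELFIES.
    if "".join(symbols) != selfies:
        raise ValueError(f"not a well-formed SELFIES string: {selfies!r}")
    return symbols


def encode(selfies: str, alphabet: List[str]) -> List[int]:
    """Map each symbol to its position in `alphabet` (hypothetical indexing scheme)."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    ids = []
    for symbol in split_selfies(selfies):
        if symbol not in index:
            raise ValueError(f"symbol {symbol} is not in the alphabet")
        ids.append(index[symbol])
    return ids


# Toy alphabet for illustration; it is NOT the contents of zinc.npy.
alphabet = ["[C]", "[=O]", "[O]", "[N]", "[Ring1]", "[Branch1]"]

print(split_selfies("[C][C][O]"))     # ['[C]', '[C]', '[O]']  (ethanol, SMILES "CCO")
print(encode("[C][C][O]", alphabet))  # [0, 0, 2]
```

To try it against the real alphabet, you could load `moldata/vocab_list/zinc.npy` with `numpy.load`. Whether that call needs `allow_pickle=True` depends on how the file was saved, and that is not documented here.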

🚀 How to run

🥽 Experiments

We conduct experiments on well-known benchmarks to evaluate MolGen's optimization capabilities on penalized logP (p-logP), QED, and molecular docking properties. For detailed experimental settings and analysis, please refer to our paper.
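Two of these targets have simple closed forms once the underlying descriptors are known. Prior benchmark work commonly defines penalized logP as logP minus the synthetic accessibility (SA) score minus a penalty for rings with more than six atoms. QED (quantitative estimate of drug-likeness) is a weighted geometric mean of desirability values. The sketch below assumes the raw descriptors are already computed (in practice with a toolkit such as RDKit) and uses the unnormalized p-logP. Some implementations standardize each term, so MolGen's exact evaluation code may differ. Function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def penalized_logp(logp: float, sa_score: float, largest_ring_size: int) -> float:
    """Unnormalized penalized logP = logP - SA score - ring penalty."""
    # Only rings larger than six atoms are penalized, by their excess size.
    ring_penalty = max(largest_ring_size - 6, 0)
    return logp - sa_score - ring_penalty


def qed_from_desirabilities(
    desirabilities: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Weighted geometric mean of desirability values, each in (0, 1]."""
    if weights is None:
        weights = [1.0] * len(desirabilities)
    if not desirabilities or len(desirabilities) != len(weights):
        raise ValueError("need one weight per desirability value")
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / sum(weights))


# Made-up descriptor values, for illustration only.
print(penalized_logp(logp=2.5, sa_score=3.0, largest_ring_size=8))  # -2.5
print(qed_from_desirabilities([1.0, 0.25]))  # ~0.5 (unweighted geometric mean)
```

Working in log space turns the geometric mean into an ordinary weighted average, which is why a single very low desirability drags QED down sharply.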


Constrained molecular optimization
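In the standard constrained-optimization setup from prior work, the goal is to find a molecule with a better property score (e.g. p-logP) whose Tanimoto similarity to the source molecule stays at or above a threshold δ. The sketch below shows only that selection rule. It assumes fingerprints are precomputed (for example Morgan fingerprints from RDKit) and represented as sets of on-bit indices. The helper names and toy data are hypothetical and do not reproduce MolGen's pipeline.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two bit-set fingerprints."""
    union = len(a | b)
    # Convention chosen here: two empty fingerprints have similarity 0.
    return len(a & b) / union if union else 0.0


def best_under_constraint(
    source_fp: Fingerprint,
    source_score: float,
    candidates: Iterable[Tuple[str, Fingerprint, float]],
    delta: float,
) -> Optional[Tuple[str, float]]:
    """Return (name, improvement) of the best candidate with similarity >= delta."""
    best: Optional[Tuple[str, float]] = None
    for name, fp, score in candidates:
        if tanimoto(source_fp, fp) < delta:
            continue  # too dissimilar from the source molecule
        improvement = score - source_score
        if best is None or improvement > best[1]:
            best = (name, improvement)
    return best


# Toy fingerprints and property scores (hypothetical values).
source = frozenset({1, 2, 3, 4})
candidates = [
    ("a", frozenset({1, 2, 3, 5}), 2.0),     # similarity 3/5 = 0.6
    ("b", frozenset({7, 8, 9}), 5.0),        # similarity 0.0
    ("c", frozenset({1, 2, 3, 4, 6}), 1.5),  # similarity 4/5 = 0.8
]
print(best_under_constraint(source, 1.0, candidates, delta=0.4))  # ('a', 1.0)
print(best_under_constraint(source, 1.0, candidates, delta=0.0))  # ('b', 4.0)
```

Raising δ trades achievable improvement for closeness to the source: with δ = 0 the dissimilar candidate "b" wins, while a stricter threshold rules it out.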


Citation

If you use or extend our work, please cite the paper as follows:

@inproceedings{fang2023domain,
  author       = {Yin Fang and
                  Ningyu Zhang and
                  Zhuo Chen and
                  Xiaohui Fan and
                  Huajun Chen},
  title        = {Domain-Agnostic Molecular Generation with Chemical Feedback},
  booktitle    = {{ICLR}},
  publisher    = {OpenReview.net},
  year         = {2024},
  url          = {https://openreview.net/pdf?id=9rPyHyjfwP}
}
