zjunlp / MolGen

[ICLR 2024] Domain-Agnostic Molecular Generation with Chemical Feedback
https://huggingface.co/spaces/zjunlp/MolGen
MIT License

About the molecule generation with optimized chemical properties #1

Closed: songyinys closed this issue 1 year ago

songyinys commented 1 year ago

Hi,

I recently came across your work "Molecular Language Model as Multi-task Generator" and found it very interesting. I have a question regarding molecule generation with optimized chemical properties.

I noticed that in your code, the QED and PlogP features are computed with RDKit, which takes SMILES strings as input. I was wondering whether it is possible to use other chemical features, ones that cannot be computed from SMILES strings, for the generation task.

For example, if I have a dataset of SMILES strings together with their experimentally measured chemical properties, would it be possible to use your model for this generation task?

I am very interested in exploring this idea further and would appreciate any insights or suggestions you may have. Thank you so much for your time.

Best regards, Song

ZJU-Fangyin commented 1 year ago

Dear Song,

Thank you for your attention to our work! I'm not sure I understand you correctly: do you want to input the chemical properties of molecules as features into our model, and use it to generate new molecules that are also represented by these chemical properties?

If so, I'm sorry to say that our model does not support this. It is built on a specific chemical language called SELFIES, and it only accepts input and produces output in SELFIES form. If you want to use chemical properties as molecular features, you would need to train a new model specifically on those features.

It is also worth noting that, at present, the mainstream approach is to compute molecular properties from SMILES with RDKit or to convert SMILES into 2D molecular graphs. If you represent molecules only by their chemical properties, it may be difficult to determine which molecule you have actually generated.

Sincerely, Yin Fang

songyinys commented 1 year ago

Dear Yin,

I apologize for any confusion my previous comment may have caused. I think you are correct: if I want to generate or optimize molecules based on certain chemical features, I must train a new model specifically for those features. But as you said, these additional features may also make it hard to tell which molecules are being generated. Nonetheless, I think your pre-trained model is a significant step toward generating and optimizing molecules that, judged by logP and QED scores, are easier to synthesize and more likely to be drug candidates. Thank you for your valuable insights.

Sincerely, Song

ZJU-Fangyin commented 1 year ago

Dear Song,

I am glad we are on the same page regarding this. Thank you again for your attention to our work.☺️ Please feel free to let me know if you have any further questions.

Best wishes, Yin Fang