microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

How to convert molecular graphs in SDF/JSON format into Graphormer input? #108

Status: Open (opened by mengmeng34 2 years ago)

mengmeng34 commented 2 years ago

Hello, the drug molecules I downloaded are represented as graphs in SDF or JSON format. What preprocessing do I need to apply to them so that they can be used as input to Graphormer? Looking forward to your reply.

zhengsx commented 2 years ago

One possible way is to convert the SDF files to SMILES and then use RDKit to generate the molecular graphs that Graphormer expects.

mengmeng34 commented 2 years ago

Thank you for your reply.

This is indeed a feasible approach, but converting SDF to SMILES will inevitably lose some edge information. To get better results, I am considering extracting the atom and edge information from the SDF files myself and then writing my own functions to compute the fields in batched_data, such as in-degree and out-degree. I am not sure which approach is more practical.
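A minimal sketch of the do-it-yourself route, assuming V2000 SDF files (the common fixed-column molfile layout; V3000 records and the JSON files are not handled). It reads the element symbols and bonds, including the bond type column, from each record and derives per-atom degree and all-pairs hop distances. Because a molecular graph is undirected, in-degree and out-degree are the same number here. The helper names (`split_sdf`, `parse_molfile`, `graph_features`) and the sample record are hypothetical, and the keys `in_degree`, `out_degree` and `spatial_pos` are only meant to mirror field names in Graphormer's `batched_data` (check them against the Graphormer version you use). The sketch stops at plain Python lists and does not build the padded tensors or the other fields Graphormer's collator expects.

```python
from collections import deque


def split_sdf(text):
    """Yield each molfile record of an SDF file (records end with a '$$$$' line)."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield "\n".join(record)
            record = []
        else:
            record.append(line)
    if any(ln.strip() for ln in record):  # the last record may lack a trailing '$$$$'
        yield "\n".join(record)


def parse_molfile(block):
    """Return (element symbols, bonds) from one V2000 record; bonds use 0-based indices."""
    lines = block.splitlines()
    counts = lines[3]  # lines 0-2 are the name, program and comment header lines
    if "V2000" not in counts:
        raise ValueError("this sketch only handles V2000 molfiles")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Atom line: x, y, z in three 10-character columns, symbol at characters 31-33 (0-based).
        atoms.append(line[31:34].strip())
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Bond line: first atom, second atom, bond type (1=single, 2=double, 3=triple, 4=aromatic).
        a, b, bond_type = int(line[0:3]), int(line[3:6]), int(line[6:9])
        bonds.append((a - 1, b - 1, bond_type))
    return atoms, bonds


def graph_features(n_atoms, bonds, unreachable=-1):
    """Degrees, hop distances and a directed edge list for an undirected molecular graph."""
    adj = [[] for _ in range(n_atoms)]
    edge_index, edge_type = [], []
    for a, b, bond_type in bonds:
        adj[a].append(b)
        adj[b].append(a)
        # Store each bond in both directions, as is usual for undirected graphs.
        edge_index += [(a, b), (b, a)]
        edge_type += [bond_type, bond_type]
    degree = [len(nbrs) for nbrs in adj]
    spatial_pos = []
    for src in range(n_atoms):
        # Breadth-first search gives shortest hop counts in an unweighted graph.
        dist = [unreachable] * n_atoms
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] == unreachable:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        spatial_pos.append(dist)
    return {
        "in_degree": degree,
        "out_degree": list(degree),  # identical to in_degree for undirected graphs
        "spatial_pos": spatial_pos,
        "edge_index": edge_index,
        "edge_type": edge_type,
    }


# Hypothetical sample record: ethanol heavy atoms only (C-C-O), illustrative coordinates.
ETHANOL_SDF = """ethanol
  sketch

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.4000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
$$$$
"""

if __name__ == "__main__":
    for record in split_sdf(ETHANOL_SDF):
        atoms, bonds = parse_molfile(record)
        feats = graph_features(len(atoms), bonds)
        print(atoms, feats["in_degree"], feats["spatial_pos"])
```

For an unweighted graph, running breadth-first search from every atom yields the same hop distances as an all-pairs shortest-path routine such as Floyd-Warshall, and it needs no dependencies. Atoms in disconnected fragments (for example, salts stored in one record) get the `unreachable` sentinel instead of a distance.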

I look forward to discussing this with you.

zhengsx commented 2 years ago

What atom/edge information do you need? RDKit can provide basic atom and bond features when processing the SMILES.

mengmeng34 commented 2 years ago

Sorry, I don't know much about RDKit; I need to look into it further to assess whether this approach is feasible.