liugangcode / Graph-DiT

The code for "Graph Diffusion Transformer for Multi-Conditional Molecular Generation"

Problems with raw data format #2

Open lian-xiao opened 1 month ago

lian-xiao commented 1 month ago

It's an impressive piece of work. But I found that the original data contains SMILES strings such as "*=CC1CCC(C1)C=*". What does the * stand for? How do you convert this to the normal SMILES format? Thank you!

liugangcode commented 1 month ago

Thank you for your interest in our work. The symbol "*" denotes the polymerization point in polymer data. In small molecule data, there is no "*".
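
As an illustration of the distinction above, here is a minimal sketch (not taken from the Graph-DiT repository) that tells polymer strings from small-molecule strings by counting "*" characters. The helper names `count_polymerization_points` and `is_polymer_smiles` are hypothetical, and the example string with "*" at both ends is an assumption about how the string in the question was meant to read. It is a plain string check, not a SMILES parser, so it only relies on "*" being the SMILES wildcard atom.

```python
def count_polymerization_points(smiles: str) -> int:
    """Count '*' wildcard atoms in a SMILES string.

    In polymer data these mark polymerization points; this is a plain
    character count, so bracketed forms such as '[*]' are counted too.
    """
    return smiles.count("*")


def is_polymer_smiles(smiles: str) -> bool:
    """Return True if the string contains at least one '*' (assumed polymer)."""
    return count_polymerization_points(smiles) > 0


# Hypothetical inputs: one polymer repeat unit and one small molecule (ethanol).
for smiles in ["*=CC1CCC(C1)C=*", "CCO"]:
    print(smiles, count_polymerization_points(smiles), is_polymer_smiles(smiles))
```

A string with no "*" can be treated as an ordinary small-molecule SMILES string. How the "*" atoms are handled downstream (kept as wildcard atoms, capped, or removed) is a separate modelling choice that this sketch does not make.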