plinder-org / plinder

Protein Ligand INteraction Dataset and Evaluation Resource
https://plinder.sh
Apache License 2.0

question about canonical smiles #83

Closed: AliSaadatV closed this issue 1 week ago

AliSaadatV commented 1 week ago

Hello

Thanks for the great resource.

I found some non-standard SMILES in the "ligand_rdkit_canonical_smiles" column (please see the attached image). I don't think canonical SMILES are supposed to look like this. Should I use "ligand_resolved_smiles" instead? Is there an annotation I can use to detect such non-standard SMILES?

Thank you in advance,
Ali

[Attached image: strange_smiles]

OleinikovasV commented 1 week ago

@AliSaadatV, the values in ligand_rdkit_canonical_smiles are exactly the SMILES that are canonical for the RDKit software; they are produced by Chem.CanonSmiles (see https://github.com/plinder-org/plinder/blob/main/src/plinder/data/utils/annotations/ligand_utils.py#L866).
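For illustration, here is a minimal sketch (assuming RDKit is installed; the molecules are made-up examples, not PLINDER entries) of what "canonical to RDKit" means in practice: different spellings of the same molecule are mapped by Chem.CanonSmiles to one RDKit-specific string. That string is canonical only with respect to RDKit's own algorithm, not a cross-toolkit standard.

```python
from rdkit import Chem

# Different input spellings of the same molecules (illustrative only).
inputs = {
    "ethanol": ["OCC", "C(O)C", "CCO"],
    "benzene": ["C1=CC=CC=C1", "c1ccccc1"],
}

for name, smiles_list in inputs.items():
    # Chem.CanonSmiles parses each SMILES and writes it back out
    # using RDKit's canonical atom ordering (and aromaticity model).
    canon = {Chem.CanonSmiles(s) for s in smiles_list}
    print(name, canon)
```

Each molecule collapses to a single string, and canonicalizing an already canonical SMILES returns it unchanged.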

Regarding what you expect canonical SMILES to look like, may I refer you to this comment by Greg Landrum: https://github.com/rdkit/rdkit/issues/2747#issuecomment-547827421

Unless there are strong reasons for the community to need multiple "canons" supported, we intend to leave it to users to convert the SMILES into their preferred standard. Feel free to join the P(L)INDER user group Discord server if you would like to discuss this or other ideas relating to the dataset further.
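As one hedged example of such a user-side conversion, the sketch below uses RDKit's rdMolStandardize module to keep the largest fragment, neutralize charges, and then write RDKit canonical SMILES. The helper name `standardize_smiles` and the chosen steps are hypothetical illustrations of one possible convention; they are not what PLINDER does, and other standards (e.g. tautomer canonicalization, InChI) may suit your use case better.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_smiles(smiles: str) -> str:
    """Hypothetical helper: one possible user-side convention, not PLINDER's.

    Keeps the largest fragment, neutralizes charges where possible,
    and returns RDKit canonical SMILES.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse {smiles!r}")
    # Runs RDKit's default cleanup, then keeps the parent (largest)
    # fragment, e.g. dropping a counter-ion.
    mol = rdMolStandardize.FragmentParent(mol)
    # Neutralize charges by adding/removing hydrogens where possible.
    mol = rdMolStandardize.Uncharger().uncharge(mol)
    return Chem.MolToSmiles(mol)


# Sodium acetate: the Na+ counter-ion is dropped and acetate is neutralized.
print(standardize_smiles("[Na+].CC(=O)[O-]"))
```

The order matters here: choosing the parent fragment first means the uncharger only sees the ligand itself, so it does not try to balance charges against a counter-ion that will be discarded anyway.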