neilswainston / FragGenie

MIT License

SMILES parsing errors #2

Closed adamoyoung closed 2 years ago

adamoyoung commented 2 years ago

I installed Maven 3.6.0 and ran test.sh. However, 3 of the 5 SMILES strings in test_input.csv fail to parse.

I've included the output of the program: test_output.csv

I don't think this is the desired outcome. Any tips? Am I perhaps using the wrong version of CDK?

neilswainston commented 2 years ago

Huge apologies for the delayed response.

This is actually a test to ensure that the script can handle invalid input without breaking. Please check the test_output.csv file to confirm that the first and last SMILES strings do have associated fragments, and then update the test_input.csv file with the SMILES of your choice.
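
For illustration only, the "handle invalid input without breaking" pattern can be sketched in Python as below. This is not FragGenie's code: `toy_parse` is a hypothetical stand-in that only checks parenthesis balance, where the real tool would call CDK's SMILES parser, and `process` is an assumed helper name. The point is that a bad row is recorded as an error and the run continues.

```python
def toy_parse(smiles):
    """Hypothetical stand-in for a real SMILES parser.

    It only checks that the string is non-empty and that branch
    parentheses are balanced; a real parser does far more.
    """
    if not smiles:
        raise ValueError("empty SMILES")
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
    if depth != 0:
        raise ValueError("unmatched '('")
    return smiles


def process(smiles_list):
    """Parse each input; record failures instead of aborting the run."""
    results = []
    for smiles in smiles_list:
        try:
            toy_parse(smiles)
            results.append((smiles, "ok"))
        except ValueError as err:
            # Invalid input is reported in the output, not raised.
            results.append((smiles, f"error: {err}"))
    return results


if __name__ == "__main__":
    for smiles, status in process(["CCO", "C(C", "CC(=O)O"]):
        print(smiles, status)
```

With this structure, every input row produces an output row, so valid and invalid entries can be told apart afterwards.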

adamoyoung commented 2 years ago

Thanks! Do you know how this code behaves with different SMILES canonicalizations? I am trying to use it with SMILES strings that were exported using RDKit (in Python), without any stereochemical information. However, I've noticed some inconsistencies in FragGenie's output depending on how the strings are canonicalized. I can get into specifics, but it might be more appropriate to open another issue.

Did you only test with CDK-generated SMILES strings?

neilswainston commented 2 years ago

I didn’t actively limit this to CDK-generated SMILES. I’ve seen some anomalies, but given that we were looking at thousands of chemicals, I was (kinda sloppily) not fazed by seeing some fail gracefully.

(Sent by email in reply to Adamo Young's comment of 22 Feb 2022, 20:19.)

adamoyoung commented 2 years ago

Thanks!