Open adamoyoung opened 2 years ago
I'm not sure. If these are true isomers, then yes. But my brief glance at `org.openscience.cdk.isomorphism.UniversalIsomorphismTester` suggests that this class also checks for sub-molecules, in which case the fragments will be different.
Thanks for the response! I was using `UniversalIsomorphismTester.isIsomorph()` to check if the two strings are the same. According to [the documentation](https://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/isomorphism/UniversalIsomorphismTester.html#UniversalIsomorphismTester()), I think this should tell me if the resulting molecules have the same atoms and bonds.
One of the example strings that you give in `test_input.csv` (line 2) is caffeine (`Cn1cnc2n(C)c(=O)n(C)c(=O)c12`). I tried re-canonicalizing caffeine in CDK using the following code:
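```java
// `smiles` holds the input string, e.g. the caffeine SMILES above.
final SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
final IAtomContainer molecule = parser.parseSmiles(smiles);
// Write the molecule back out in canonical atom order with aromatic (lowercase) symbols.
final SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Unique | SmiFlavor.UseAromaticSymbols);
final String newSmiles = smigen.create(molecule);
```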
This gave me the new string `O=c1c2c(ncn2C)n(c(=O)n1C)C`.
When I tried this new string with FragGenie, I got significantly different results! I tried debugging myself but I was struggling a bit. It might be that this is how FragGenie is supposed to work, I'm not sure.
To reproduce the bug (?), try running `test.sh` where `test_input.csv` has the following lines:
```
smiles
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
O=c1c2c(ncn2C)n(c(=O)n1C)C
```
This should give you the following result:
```
smiles,METFRAG_MZ
Cn1cnc2n(C)c(=O)n(C)c(=O)c12,"[86.02366, 87.055305, 94.01616, 95.047806, 100.026726, 100.02673, 115.05022, 123.04272, 150.0172, 151.03763, 152.06927, 180.06418, 195.08766]"
O=c1c2c(ncn2C)n(c(=O)n1C)C,"[94.01615, 95.047806, 100.026726, 123.04271, 123.04272, 150.0172, 152.06926, 152.06927, 165.04068, 195.08765]"
```
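To make the difference between the two rows easier to see, here is a small sketch that lists the masses appearing for one string but not the other. It is plain Java with no CDK or FragGenie calls. The class name `PeakDiff`, the method `onlyIn`, and the 0.001 tolerance are my own assumptions for illustration; the tolerance is only there so that values differing in the last printed digit (e.g. 94.01616 vs 94.01615) are treated as the same peak.

```java
import java.util.ArrayList;
import java.util.List;

public class PeakDiff {
    // METFRAG_MZ values copied from the two output rows above.
    static final double[] FIRST = {86.02366, 87.055305, 94.01616, 95.047806, 100.026726,
            100.02673, 115.05022, 123.04272, 150.0172, 151.03763, 152.06927, 180.06418, 195.08766};
    static final double[] SECOND = {94.01615, 95.047806, 100.026726, 123.04271, 123.04272,
            150.0172, 152.06926, 152.06927, 165.04068, 195.08765};

    // Returns the values in `a` that have no value in `b` within `tol` (assumed tolerance).
    static List<Double> onlyIn(double[] a, double[] b, double tol) {
        List<Double> unmatched = new ArrayList<>();
        for (double x : a) {
            boolean matched = false;
            for (double y : b) {
                if (Math.abs(x - y) <= tol) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                unmatched.add(x);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        double tol = 1e-3; // assumption: anything closer than this is rounding noise
        System.out.println("Only for first string:  " + onlyIn(FIRST, SECOND, tol));
        System.out.println("Only for second string: " + onlyIn(SECOND, FIRST, tol));
    }
}
```

With the lists above, five masses show up only for the first string (86.02366, 87.055305, 115.05022, 151.03763, 180.06418) and one only for the second (165.04068), so the two outputs differ by more than rounding.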
Just as a sanity check, I used PubChem to confirm that these strings are indeed the same (so it's not just cdk being weird):
`Cn1cnc2n(C)c(=O)n(C)c(=O)c12`: https://pubchem.ncbi.nlm.nih.gov/#query=Cn1cnc2n(C)c(%3DO)n(C)c(%3DO)c12

`O=c1c2c(ncn2C)n(c(=O)n1C)C`: https://pubchem.ncbi.nlm.nih.gov/#query=O%3Dc1c2c(ncn2C)n(c(%3DO)n1C)C
For what it's worth, the PubChem Sketcher also thinks they are the same.
If I have two molecules that are isomers according to `org.openscience.cdk.isomorphism.UniversalIsomorphismTester`, should they produce the same fragments?