ncats / lychi

Layered Chemical Identifier
Apache License 2.0

lychi output #12

Open olegursu opened 10 years ago

olegursu commented 10 years ago

Hi all,

I have a question regarding the output, for example:

echo "C=CNCCC1=CC=CC=C1.Cl" | java -jar dist/lychi-all-v0.1.jar

produces 2 hash keys:

CC=NCCC1=CC=CC=C1 null N4765661Y-YGFUN3DUM8-Y8HGNSQ89JY-Y8Y5C3KXJ1U2
C\C=N\CCC1=CC=CC=C1 null N4765661Y-YGFUN3DUM8-Y8HGNSQ89JY-Y8Y6J8LJGS98

Which one is supposed to be the canonical one?
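To make the difference between the two keys easier to see, here is a minimal Python sketch that compares them layer by layer. It assumes only what is visible in the output above, namely that a key consists of dash-separated layers; the helper name `compare_layers` is hypothetical and not part of lychi.

```python
def compare_layers(key_a: str, key_b: str) -> list[bool]:
    """Return, for each dash-separated layer, whether the two keys agree."""
    layers_a = key_a.split("-")
    layers_b = key_b.split("-")
    if len(layers_a) != len(layers_b):
        raise ValueError("keys have different numbers of layers")
    return [a == b for a, b in zip(layers_a, layers_b)]


# The two hash keys from the output above.
key_1 = "N4765661Y-YGFUN3DUM8-Y8HGNSQ89JY-Y8Y5C3KXJ1U2"
key_2 = "N4765661Y-YGFUN3DUM8-Y8HGNSQ89JY-Y8Y6J8LJGS98"

print(compare_layers(key_1, key_2))  # [True, True, True, False]
```

The first three layers are identical and only the last one differs, which matches the fact that the two SMILES differ only in the double-bond stereo marking.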

caodac commented 10 years ago

Hi Oleg, this is a bug. The first one is the one that should be returned. I've just pushed the fix upstream (d9a1672). Please note that we've also decided to turn off keto-enol by default. To turn it on, please use the following:

echo "C=CNCCC1=CC=CC=C1.Cl" | java -Dketo-enol=true -jar dist/lychi-all-v0.1.jar
olegursu commented 10 years ago

Hi Trung,

When I run structure A (amlodipine) through Lychi with the keto-enol option on:

echo "CCOC(=O)C1=C(COCCN)NC(C)=C(C1C1=C(Cl)C=CC=C1)C(=O)OC" | java -Dketo-enol=true -jar dist/lychi-all-v0.1.jar

I get structure (B) in SMILES, which may be a hypothetical tautomer but is probably not a realistic example.

If I run both structures (A) and (B), I get the same hash key, which is not what I want. I guess what I am asking is: is it possible to have two separate levels of tautomerism handling, (1) simple tautomerism between neighboring atoms, such as keto-enol, amine-imine, lactam-lactim, etc., and (2) long-range tautomerism, and to leave the user the option to select either or both?

caodac commented 10 years ago

Ah yes, therein lies the danger of blindly allowing (long range) keto-enol. Perhaps the thing to do here, as in the case of salt/solvent removal, is to establish a list of "known" keto-enol tautomers and only allow tautomerism if it's known?
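As a rough illustration of the "known tautomers only" idea, the following Python sketch keeps a whitelist of tautomer pairs and allows a pair only if it is listed. Everything here is an assumption made for illustration: the names `KNOWN_TAUTOMER_PAIRS` and `is_known_tautomer_pair` are hypothetical, the two example pairs are textbook cases rather than anything taken from lychi, and plain string comparison only works if both SMILES have already been canonicalized the same way.

```python
# Hypothetical whitelist; each entry is an unordered pair of SMILES strings.
# Assumes the SMILES are already in one consistent canonical form.
KNOWN_TAUTOMER_PAIRS = {
    frozenset({"CC(C)=O", "CC(O)=C"}),            # acetone / propen-2-ol (keto-enol)
    frozenset({"O=C1C=CC=CN1", "OC1=CC=CC=N1"}),  # 2-pyridone / 2-hydroxypyridine (lactam-lactim)
}


def is_known_tautomer_pair(smiles_a: str, smiles_b: str) -> bool:
    """Return True only if the two structures are a whitelisted tautomer pair."""
    return frozenset({smiles_a, smiles_b}) in KNOWN_TAUTOMER_PAIRS


print(is_known_tautomer_pair("CC(O)=C", "CC(C)=O"))  # True
print(is_known_tautomer_pair("CC(C)=O", "CCO"))      # False
```

Storing each pair as a `frozenset` makes the lookup independent of argument order.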

olegursu commented 10 years ago

I favor a rule-based solution. What format do you prefer: SMIRKS, RXN, etc.?

caodac commented 10 years ago

Sure, SMIRKS would be great, thanks!
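A rule list of the kind discussed here, with the two levels Oleg asked for, might look like the following Python sketch. It is only an illustration under stated assumptions: the `TautomerRule` class, the level labels `"local"` and `"long-range"`, and the SMIRKS strings themselves are hypothetical, are written in explicit-hydrogen style, and have not been validated against lychi or any cheminformatics toolkit. The only check performed is that each rule uses the same atom-map numbers on both sides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TautomerRule:
    name: str
    level: str   # hypothetical labels: "local" or "long-range"
    smirks: str  # illustrative SMIRKS, not validated against a toolkit


RULES = [
    TautomerRule("keto-enol", "local",
                 "[O:1]=[C:2][C:3][H:4]>>[H:4][O:1][C:2]=[C:3]"),
    # Written here as an imine to enamine shift.
    TautomerRule("amine-imine", "local",
                 "[N:1]=[C:2][C:3][H:4]>>[H:4][N:1][C:2]=[C:3]"),
    TautomerRule("lactam-lactim", "local",
                 "[O:1]=[C:2][N:3][H:4]>>[H:4][O:1][C:2]=[N:3]"),
    TautomerRule("keto-enol (1,5 shift)", "long-range",
                 "[O:1]=[C:2][C:3]=[C:4][C:5][H:6]>>[H:6][O:1][C:2]=[C:3][C:4]=[C:5]"),
]


def atom_maps(fragment: str) -> list[int]:
    """Collect the atom-map numbers used in one side of a SMIRKS string."""
    return sorted(int(n) for n in re.findall(r":(\d+)\]", fragment))


def is_balanced(smirks: str) -> bool:
    """Check that reactant and product sides use the same atom-map numbers."""
    reactant, product = smirks.split(">>")
    return atom_maps(reactant) == atom_maps(product)


def select_rules(levels: set[str]) -> list[TautomerRule]:
    """Return the rules whose level the user has switched on."""
    return [rule for rule in RULES if rule.level in levels]


print([rule.name for rule in select_rules({"local"})])
print(all(is_balanced(rule.smirks) for rule in RULES))
```

Selecting `{"local"}` would enable only the neighboring-atom rules, while `{"local", "long-range"}` would enable both levels, which is the user-facing choice requested earlier in the thread.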