ideaconsult / apps-ambit

Applications using AMBIT and examples how to call AMBIT modules
http://ideaconsult.github.com/apps-ambit

feature request: consistent rank-formatting in .sdf by Ambit‐Tautomer #11

Open nbehrnd opened 4 years ago

nbehrnd commented 4 years ago

I'm starting to use ambit-tautomer (ambit-tautomers-2.0.0-SNAPSHOT.jar, downloaded today). With O=C(c1c2cccc1)N([C@H](CCC(N1)=O)C1=O)C2=O provided in a .smi file, I run

java -jar ambit-tautomers-2.0.0-SNAPSHOT.jar -f thalomid.smi -o thalo.sdf

and read the results in DataWarrior. While browsing the table, I noticed inconsistent formatting of the rank order:

[Screenshots: rank_order and rank_order_2]

While I'm surprised by the occurrence of negative rank orders, my first intent is to suggest consistent formatting of this type of result. If it is a floating-point number, perhaps it could consistently use five decimals and, where necessary, be padded with trailing zeroes.
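A minimal sketch of the suggested fixed-precision formatting, written in Java since the tool ships as a jar. RankFormat and formatRank are hypothetical names, not part of AMBIT's API, and five decimals is simply the precision proposed above:

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of AMBIT): formats a tautomer rank with a
 * fixed number of decimals so every SD-file record looks the same.
 */
final class RankFormat {
    // Assumed precision, following the five-decimal suggestion.
    static final int DECIMALS = 5;

    static String formatRank(double rankEv) {
        // Locale.ROOT forces '.' as the decimal separator regardless of the
        // JVM's default locale; "%.5f" rounds and pads with trailing zeroes.
        return String.format(Locale.ROOT, "%." + DECIMALS + "f", rankEv);
    }

    public static void main(String[] args) {
        double[] ranks = {0.0, 0.037, -0.432, 0.123456789012};
        for (double r : ranks) {
            System.out.println(formatRank(r));
        }
    }
}
```

Running main prints 0.00000, 0.03700, -0.43200 and 0.12346, one per line. Using Locale.ROOT keeps the separator a dot, which matters for SD-file readers on machines whose default locale uses a decimal comma.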

To ease replication of the observation, I attach both the input and the resulting file, each with an added .txt extension to pass GitHub's upload restrictions: thalomid.smi.txt, thalo.sdf.txt

ntk73 commented 4 years ago

The tautomer rank is an "energy-based" rank (in eV) that estimates the relative energy difference between tautomers without heavy quantum-mechanical calculations. The lower the rank, the "better" the tautomer. Negative rank values are allowed; in fact, these are the most stable tautomers according to our energy estimation rules.
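Because a lower rank means a lower estimated energy, ordering tautomers from most to least stable is an ascending sort on the rank, with negative values naturally coming first. A sketch in Java; RankedTautomer and TautomerOrdering are hypothetical classes for illustration, not AMBIT types, and the labels and values are placeholders:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one tautomer label and its energy-based rank (eV). */
final class RankedTautomer {
    final String label;
    final double rankEv;

    RankedTautomer(String label, double rankEv) {
        this.label = label;
        this.rankEv = rankEv;
    }
}

final class TautomerOrdering {
    // Lower rank = lower estimated relative energy = more stable,
    // so sort ascending; negative ranks are valid and end up first.
    static List<RankedTautomer> mostStableFirst(List<RankedTautomer> input) {
        List<RankedTautomer> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingDouble(t -> t.rankEv));
        return sorted;
    }

    public static void main(String[] args) {
        List<RankedTautomer> tautomers = List.of(
                new RankedTautomer("A", 0.037),
                new RankedTautomer("B", -0.432),
                new RankedTautomer("C", 0.0));
        for (RankedTautomer t : mostStableFirst(tautomers)) {
            System.out.println(t.label + " " + t.rankEv);
        }
    }
}
```

The sort copies the input first, so an immutable list (such as the one from List.of) can be passed in safely.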

nbehrnd commented 4 years ago

You are correct; my recall of section 4.6, Tautomer Ranking, in Mol. Inf. 2013, 32, 481-504 was incomplete when I posted the question about the negative values. The publication clearly states «The tautomer with the lowest rank is expected to be the most stable one. [...] The more stable state has always score 0.0 eV and the alternative one is with a higher energy. Additionally every atom which is part of an aromatic system gets an aromatic correction coefficient C_arom = - 0.1 eV.» (pp. 491-492, loc. cit.), right next to figure 13, which gives ranks of 0, 0.037 and -0.432 for tautomers of methimazole.
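A worked example of the aromatic correction, assuming (as the quoted wording suggests) that C_arom is applied once per aromatic atom and simply added to the rank. The benzene ring in the thalidomide input above contains six aromatic atoms:

$$
\Delta E_{\text{arom}} = n_{\text{arom}} \, C_{\text{arom}}, \qquad C_{\text{arom}} = -0.1\ \text{eV}
$$

$$
\Delta E_{\text{arom}} = 6 \times (-0.1\ \text{eV}) = -0.6\ \text{eV}
$$

Because the reference state scores 0.0 eV, a correction of this size can pull a tautomer's total rank below zero whenever its other contributions sum to less than 0.6 eV, which is one way negative ranks can arise.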

So my comment is only a suggestion to format the rank consistently with three to five decimals, instead of sometimes three and sometimes a dozen. The latter, an almost «Fortranesque» precision of 12 decimals, is probably rarely needed here. Thank you.
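One possible source of the long digit tails (an assumption, not confirmed for AMBIT's code) is that decimal contributions such as -0.1 eV are not exactly representable in binary floating point, so a sum of them, printed without rounding, can carry many extra digits. A small Java illustration; FloatTail and sumContributions are hypothetical names:

```java
/**
 * Illustration only: summing decimal contributions in binary floating point
 * can leave a long tail of digits when the result is printed unrounded.
 */
final class FloatTail {
    static double sumContributions(double[] contributionsEv) {
        double total = 0.0;
        for (double c : contributionsEv) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three hypothetical aromatic corrections of -0.1 eV each.
        double total = sumContributions(new double[] {-0.1, -0.1, -0.1});
        System.out.println(total);         // prints -0.30000000000000004
        System.out.println(total == -0.3); // prints false
    }
}
```

Rounding only at output time, as in a fixed-precision format, hides this tail without changing the underlying value.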