mojaie / MolecularGraph.jl

Graph-based molecule modeling toolkit for cheminformatics
MIT License

InChI layers missing from Mol object #107

Open · timoleistner opened 5 months ago

timoleistner commented 5 months ago

When I convert an InChI string to a Mol object and then convert the Mol object back to InChI, the last layers (the stereo layers /t, /m and /s) are missing.

using MolecularGraph

ascorbicacid_inchi = "InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1"
inchi(inchitomol(ascorbicacid_inchi))

InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2

Passing the corresponding molblock directly to inchi() gives the full InChI, including the stereo layers. This leads me to believe that inchi() itself works correctly but cannot extract the stereo layer information from the molecular graph object.

inchi("54670067
  -OEChem-02072408212D

 20 20  0     1  0  0  0  0  0999 V2000
    5.0298   -0.5357    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.3548    1.5521    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.4608   -0.2266    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.0868    2.5521    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.1330   -2.2957    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.3086   -2.2957    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.2208    0.0521    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    4.2208    1.0521    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    3.4118   -0.5357    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0868    1.5521    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7208   -1.4867    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.7208   -1.4867    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9782    0.4380    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2208    1.6721    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.2989    0.9695    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.6974    1.6598    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.3548    2.1721    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.6415    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.6238    2.8621    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.3852   -2.8621    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  7  1  0  0  0  0
  1 12  1  0  0  0  0
  8  2  1  1  0  0  0
  2 17  1  0  0  0  0
  3  9  1  0  0  0  0
  3 18  1  0  0  0  0
  4 10  1  0  0  0  0
  4 19  1  0  0  0  0
  5 11  1  0  0  0  0
  5 20  1  0  0  0  0
  6 12  2  0  0  0  0
  7  8  1  0  0  0  0
  7  9  1  0  0  0  0
  7 13  1  6  0  0  0
  8 10  1  0  0  0  0
  8 14  1  0  0  0  0
  9 11  2  0  0  0  0
 10 15  1  0  0  0  0
 10 16  1  0  0  0  0
 11 12  1  0  0  0  0
M  END
")

InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1

This can be dangerous when structures are only available as SDF files, because sdfilereader() yields only molecule objects. Running inchi.(sdfilereader(file)) then silently produces incorrect or incomplete InChIs.
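To see exactly which layers are lost in a round trip, one option is to compare the layer prefixes of the two InChI strings. The sketch below is a minimal, hypothetical helper (`inchi_layers` and `missing_layers` are not part of MolecularGraph.jl). It assumes that layers are separated by `/` and identified by their first character, which holds for every layer in this report; it skips the version prefix and the formula layer.

```julia
# Hypothetical helper, not part of MolecularGraph.jl.
# Returns the prefixes of the layers after the formula, e.g. ["c", "h", "t", "m", "s"].
function inchi_layers(s::AbstractString)
    parts = split(s, '/')
    # parts[1] is "InChI=1" or "InChI=1S"; parts[2] is the formula layer
    return [string(p[1]) for p in parts[3:end] if !isempty(p)]
end

# Layer prefixes present in `reference` but absent from `roundtrip`
missing_layers(reference::AbstractString, roundtrip::AbstractString) =
    setdiff(inchi_layers(reference), inchi_layers(roundtrip))

orig = "InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1"
rt   = "InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2"
println(missing_layers(orig, rt))  # ["t", "m", "s"]
```

With MolecularGraph.jl loaded, `missing_layers(ascorbicacid_inchi, inchi(inchitomol(ascorbicacid_inchi)))` would be expected to return the same three prefixes, given the output shown above.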

mojaie commented 4 months ago

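Thank you for the comment. inchitomol relies only on InChI library functions and does not go through MolGraph objects. Something may be wrong with inchitosdf, which inchitomol calls internally: it does not generate the coordinates needed to encode stereochemistry. In the output below, every atom sits at 0.0000, 0.0000, 0.0000 and no bond carries a stereo flag.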

ascorbicacid_inchi = "InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1"
println(inchitosdf(ascorbicacid_inchi))

Output:

Structure #1. 
  InChIV10                                     

 14 14  0  0  0  0  0  0  0  0  1 V2000
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0     0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  7  1  0  0  0  0
  2 13  1  0  0  0  0
  2  5  1  0  0  0  0
  2  8  1  0  0  0  0
  3  4  2  0  0  0  0
  3  5  1  0  0  0  0
  3  9  1  0  0  0  0
  4  6  1  0  0  0  0
  4 10  1  0  0  0  0
  5 14  1  0  0  0  0
  5 12  1  0  0  0  0
  6 11  2  0  0  0  0
  6 12  1  0  0  0  0
M  END
$$$$
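The same diagnosis can be checked on any V2000 molblock by reading the atom and bond blocks directly. The sketch below is a hypothetical helper (`stereo_summary` is not part of MolecularGraph.jl). It assumes the molblock is available as plain text with a standard V2000 counts line, and it reads only the coordinate columns and the bond stereo column (0 = no stereo, 1 = wedge, 6 = hash).

```julia
# Hypothetical helper, not part of MolecularGraph.jl: summarize whether a
# V2000 molblock carries the coordinates/wedges a stereo layer could be built from.
function stereo_summary(molblock::AbstractString)
    lines = split(molblock, '\n')
    i = findfirst(l -> occursin("V2000", l), lines)
    i === nothing && error("no V2000 counts line found")
    # Counts line: columns 1-3 = number of atoms, columns 4-6 = number of bonds
    natoms = parse(Int, strip(lines[i][1:3]))
    nbonds = parse(Int, strip(lines[i][4:6]))
    atomlines = lines[i+1:i+natoms]
    bondlines = lines[i+natoms+1:i+natoms+nbonds]
    # The first three fields of an atom line are the x, y, z coordinates
    coords_zero = all(atomlines) do l
        all(parse(Float64, f) == 0.0 for f in split(l)[1:3])
    end
    # The fourth field of a bond line is the stereo flag
    stereo_bonds = count(l -> parse(Int, split(l)[4]) != 0, bondlines)
    return (all_coords_zero = coords_zero, stereo_bonds = stereo_bonds)
end

# Tiny zero-coordinate example in the same shape as the output above
mb = join([
    "zero-coordinate example",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
], "\n")
println(stereo_summary(mb))  # (all_coords_zero = true, stereo_bonds = 0)
```

Assuming inchitosdf returns the molblock as a String (as the println call above suggests), stereo_summary(inchitosdf(ascorbicacid_inchi)) should likewise report all-zero coordinates and no stereo bonds. The OEChem molblock from the original report has non-zero coordinates and two flagged bonds (8-2 with flag 1 and 7-13 with flag 6).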