ncats / lychi

Layered Chemical Identifier
Apache License 2.0

Round-trip Disagreement #3

Open tylerperyea opened 10 years ago

tylerperyea commented 10 years ago

Some structures (usually with several fused rings that contain stereo annotations) don't return the same hash after a round-trip. I'm not sure why this happens.

Example:

[H][C@@]12[C@@H]3SC[C@]4(NCCC5=C4C=C(OC)C(O)=C5)C(=O)OC[C@H](N1[C@@H](O)[C@@H]6CC7=C([C@H]2N6C)C(O)=C(OC)C(C)=C7)C8=C9OCOC9=C(C)C(OC(C)=O)=C38

Yields:

DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCUZ42LBF8VB

But, if the output file is fed through again, I get:

DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCUZ3C1UCNTD

Each further pass then reproduces this second hash. This may be due to parity conflict resolution, which seems to be done arbitrarily. If there is ambiguity or conflict, it would probably be better to err on the side of no annotation. However, I think this example does contain enough information to work.

Similarly, this happens with the following (theoretically equivalent) molfile:


  Symyx   02191314562D 1   1.00000     0.00000     0

 55 63  0     1  0            999 V2000
    5.2579  -10.3796    0.0000 N   0  0  3  0  0  0           0  0  0
    5.2643  -11.3463    0.0000 C   0  0  2  0  0  0           0  0  0
    6.9059  -11.3355    0.0000 C   0  0  0  0  0  0           0  0  0
    6.8995  -10.3688    0.0000 C   0  0  0  0  0  0           0  0  0
    3.7293  -11.1063    0.0000 N   0  0  3  0  0  0           0  0  0
    4.4671  -12.0064    0.0000 C   0  0  2  0  0  0           0  0  0
    6.3356   -7.6267    0.0000 C   0  0  1  0  0  0           0  0  0
    3.2045  -12.4015    0.0000 C   0  0  0  0  0  0           0  0  0
    4.4093   -9.9560    0.0000 C   0  0  2  0  0  0           0  0  0
    6.0714  -11.7993    0.0000 C   0  0  1  0  0  0           0  0  0
    3.4158  -10.1363    0.0000 C   0  0  2  0  0  0           0  0  0
    6.0592   -9.9452    0.0000 C   0  0  2  0  0  0           0  0  0
    7.7339  -11.7884    0.0000 C   0  0  0  0  0  0           0  0  0
    7.7217   -9.9342    0.0000 C   0  0  0  0  0  0           0  0  0
    3.2108  -13.3557    0.0000 C   0  0  0  0  0  0           0  0  0
    7.0154   -7.0763    0.0000 C   0  0  0  0  0  0           0  0  0
    8.5767  -11.3245    0.0000 C   0  0  0  0  0  0           0  0  0
    8.5703  -10.3578    0.0000 C   0  0  0  0  0  0           0  0  0
    2.3766  -11.9653    0.0000 C   0  0  0  0  0  0           0  0  0
    5.4048   -8.0161    0.0000 C   0  0  0  0  0  0           0  0  0
    2.3695  -10.8862    0.0000 C   0  0  0  0  0  0           0  0  0
    2.4719  -13.7855    0.0000 C   0  0  0  0  0  0           0  0  0
    7.1803   -8.7253    0.0000 C   0  0  0  0  0  0           0  0  0
    7.7977   -7.5504    0.0000 C   0  0  0  0  0  0           0  0  0
    6.1550   -8.6695    0.0000 O   0  0  0  0  0  0           0  0  0
    7.0097   -6.2138    0.0000 C   0  0  0  0  0  0           0  0  0
    5.7261   -9.3390    0.0000 C   0  0  0  0  0  0           0  0  0
    5.5820   -7.0857    0.0000 N   0  0  0  0  0  0           0  0  0
    1.6316  -13.3661    0.0000 C   0  0  0  0  0  0           0  0  0
    7.7492  -12.8549    0.0000 O   0  0  0  0  0  0           0  0  0
    1.6254  -12.4119    0.0000 C   0  0  0  0  0  0           0  0  0
    8.5362   -7.0663    0.0000 C   0  0  0  0  0  0           0  0  0
    7.9612   -8.9785    0.0000 O   0  0  0  0  0  0           0  0  0
    7.7858   -5.7420    0.0000 C   0  0  0  0  0  0           0  0  0
    8.5302   -6.1580    0.0000 C   0  0  0  0  0  0           0  0  0
    9.4749   -9.7810    0.0000 O   0  0  0  0  0  0           0  0  0
    8.6620  -13.5281    0.0000 C   0  0  0  0  0  0           0  0  0
    8.9890   -8.7634    0.0000 C   0  0  0  0  0  0           0  0  0
    4.5516   -8.1550    0.0000 O   0  0  0  0  0  0           0  0  0
    8.6687  -14.5489    0.0000 O   0  0  0  0  0  0           0  0  0
    4.0768  -13.8833    0.0000 O   0  0  0  0  0  0           0  0  0
    3.1995  -11.6390    0.0000 C   0  0  0  0  0  0           0  0  0
    2.4776  -14.6439    0.0000 O   0  0  0  0  0  0           0  0  0
    5.5763   -6.2233    0.0000 C   0  0  0  0  0  0           0  0  0
    9.4200   -5.8813    0.0000 O   0  0  0  0  0  0           0  0  0
    9.6512  -11.8924    0.0000 C   0  0  0  0  0  0           0  0  0
    9.0177   -7.4215    0.0000 O   0  0  0  0  0  0           0  0  0
    6.2861   -5.8061    0.0000 C   0  0  0  0  0  0           0  0  0
    0.9270  -13.9374    0.0000 C   0  0  0  0  0  0           0  0  0
    9.5507  -13.0639    0.0000 C   0  0  0  0  0  0           0  0  0
    1.6559  -15.1576    0.0000 C   0  0  0  0  0  0           0  0  0
    9.7288   -7.2168    0.0000 C   0  0  0  0  0  0           0  0  0
    6.0443  -12.7495    0.0000 S   0  0  0  0  0  0           0  0  0
    5.2569  -12.1255    0.0000 H   0  0  0  0  0  0           0  0  0
    4.4042   -9.0125    0.0000 O   0  0  0  0  0  0           0  0  0
  2  1  1  0     0  0
  3  4  2  0     0  0
  4 12  1  0     0  0
 11  5  1  6     0  0
  6  2  1  0     0  0
  7 20  1  6     0  0
  8  6  1  0     0  0
  9  1  1  0     0  0
 10  2  1  0     0  0
 11  9  1  0     0  0
 12  1  1  0     0  0
 13  3  1  0     0  0
 14  4  1  0     0  0
 15  8  2  0     0  0
 16  7  1  0     0  0
 17 18  1  0     0  0
 18 14  2  0     0  0
 19 21  1  0     0  0
 20 25  1  0     0  0
 21 11  1  0     0  0
 22 15  1  0     0  0
 24 16  2  0     0  0
 25 27  1  0     0  0
 26 16  1  0     0  0
 12 27  1  1     0  0
 28  7  1  0     0  0
 29 31  1  0     0  0
 30 13  1  0     0  0
 31 19  2  0     0  0
 32 24  1  0     0  0
 33 14  1  0     0  0
 34 26  2  0     0  0
 35 34  1  0     0  0
 36 18  1  0     0  0
 37 30  1  0     0  0
 38 33  1  0     0  0
 39 20  2  0     0  0
 40 37  2  0     0  0
 41 15  1  0     0  0
 42  5  1  0     0  0
 43 22  1  0     0  0
 44 28  1  0     0  0
 45 35  1  0     0  0
 46 17  1  0     0  0
 47 32  1  0     0  0
 48 44  1  0     0  0
 49 29  1  0     0  0
 50 37  1  0     0  0
 51 43  1  0     0  0
 52 47  1  0     0  0
 10 53  1  1     0  0
  2 54  1  6     0  0
 10  3  1  0     0  0
  6  5  1  6     0  0
 19  8  1  0     0  0
  7 23  1  1     0  0
 38 36  1  0     0  0
 17 13  2  0     0  0
 29 22  2  0     0  0
 48 26  1  0     0  0
 35 32  2  0     0  0
 53 23  1  0     0  0
  9 55  1  6     0  0
M  END

Which gets:

java -jar lychi-all-v0.1.jar test.mol

DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCUS96TNY5ZD

java -jar lychi-all-v0.1.jar test.mol | java -jar lychi-all-v0.1.jar

DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCUZ3C1UCNTD

tylerperyea commented 10 years ago

The simple, poor man's resolution to this, of course, is to feed the output of the standardizer back into the standardizer until it stops changing. I'm not yet aware of infinitely oscillating hashes, but if they exist, such a procedure could bail out and notify the user of an error...
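
A minimal sketch of that loop, assuming the standardizer can be wrapped as a plain string-to-string function. The class name `RoundTripStabilizer`, the `stabilize` method, and the `maxPasses` cap are hypothetical names for illustration and are not part of LyChI's API. The loop stops at the first output that equals its input, and bails out with an error if an earlier output reappears (an oscillation) or the pass limit is reached.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Sketch only: repeat a standardization step until its output stops changing. */
class RoundTripStabilizer {

    /**
     * Applies {@code step} repeatedly until two consecutive outputs are equal.
     * {@code step} stands in for one pass through the standardizer (an assumption;
     * how that pass is invoked is up to the caller).
     *
     * @throws IllegalStateException if an earlier output reappears (oscillation)
     *                               or no fixed point is found within maxPasses
     */
    static String stabilize(String input, UnaryOperator<String> step, int maxPasses) {
        Set<String> seen = new HashSet<>();
        String current = input;
        seen.add(current);
        for (int pass = 0; pass < maxPasses; pass++) {
            String next = step.apply(current);
            if (next.equals(current)) {
                return current; // fixed point: another pass changes nothing
            }
            if (!seen.add(next)) {
                // We have produced this output before, so the sequence is cycling.
                throw new IllegalStateException(
                        "oscillation detected after " + (pass + 1) + " passes");
            }
            current = next;
        }
        throw new IllegalStateException("no fixed point within " + maxPasses + " passes");
    }
}
```

Tracking every output seen so far, rather than only the previous one, lets the loop catch cycles of any length instead of just two-state flip-flops.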

caodac commented 10 years ago

This is fixed as of commit 9b38dbdd97da53e3a4a5e3ba458f7d6784df07e2. I've also reworked how we handle stereocenters with explicit Hs.