ncats / lychi

Layered Chemical Identifier
Apache License 2.0
14 stars 10 forks source link

fix with some tests for issue #4, #3, #7, needs evaluation #17

Closed tylerperyea closed 5 years ago

tylerperyea commented 5 years ago

This needs evaluation but is mostly a fix for issue #4 concerning meaningless stereo. This is quite similar to the strategy proposed in the issue. Specifically, if there is a non-R/S assignable center which has a wedge/dash bond, it will do the following:

  1. Collect all small rings which contain the atom(s) in question
  2. Find all single bonds "leaving" the ring (which are not already R/S assigned)
  3. Generate all possible stereoisomers for each wedge/dash configuration of the exo-cyclic bond
  4. For each, calculate a lower-cost hash
  5. If that lower-cost hash was generated with a configuration compatible with the assigned wedge/dashes, mark it as possible.
  6. If there are any lower-cost hashes generated which are not marked as possible, the bonds should be kept with the wedge/dash configuration given. If all hashes are possible with the specified configuration, mark the bonds as standard non-stereo single bonds.

To put it more simply, the code is looking for what's possible, abstractly, if the stereocenters were not given at all. It then does the same thing while "locking" the specified centers as they are specified. If there is no change in the set of possibilities, the specified centers are meaningless.

There are likely edge cases where this doesn't work. In particular, there may be cases with 0-dimensional input or intra-cyclic bonds in a molfile which may confuse it. The JUnit tests only add cases for those mentioned in the GitHub issue and a few other obvious analogues.

The rest of the tests are mostly confirming that the other issues mentioned above are actually fixed (which they appear to be).

I'd also like some better test sets to confirm that major parts of the hash are still working as expected.