schrodinger / coordgenlibs

Schrodinger-developed 2D Coordinate Generation
BSD 3-Clause "New" or "Revised" License
42 stars 28 forks source link

How do we properly convey chirality & stereochemistry? #120

Open shubbey opened 1 year ago

shubbey commented 1 year ago

Hello!

I am using coordgenlibs in order to draw a known 3D structure in 2D. Prior to converting to 2D, I establish the R/S and Z/E markers on the molecule. However, it is unclear how to pass this information on to coordgen. I have tried two approaches. First, the obvious, which is setting, for chiral atoms, hasStereochemistrySet = true and isR = true/false; and for stereo bonds, isZEActive/isZ. When this didn't do as I expected, I tried using the lower-level routines setStereoChemistry(my chirality info)/setAbsoluteStereoChemsitryFromChiralityInfo() and corresponding for bonds. The latter reasoning being that perhaps just saying something is R/S/Z/E isn't enough given that we would have to have a matching CIP rule system. At any rate, neither of these methods did what I expected.

As an example, I'm having trouble with this molecule: VS248

which can be visualized in 3D here: https://chemapps.stolaf.edu/jmol/jsmol/cip.htm?load=https://cipvalidationsuite.github.io/ValidationSuite/mol3d/VS248.sdf

When I convert this to 2D using coordgenlibs, it expects a dash, not a wedge, to be present in the C4-C6 bond, in order to preserve the S chirality. The reason this happens is that it switches the C1 and C2 ring atoms, which have a different CIP priority order. It is not clear to me how to properly convey this information so that it draws the atoms in the same relative positions.

I am probably not explaining this very well, but essentially I am looking for some instruction on how one is supposed to pass on the chirality information of a molecule, eg if you were reading from full smiles or a 3D coordinate structure. Thanks for any help you can give me!

ricrogz commented 1 year ago

Hi @shubbey. I'm sorry, the problem you are seeing is a known issue in coordgen, and we don't have a workaround for it.

The problem is happening because coordgen internally uses a very simple stereochemistry algorithm, based on absolute stereochemistry information. This algorithm is much too simple to recognize the co-dependency between the two stereo centers in your structure, and any indication about the stereochemistry around these centres is just ignored.

Fixing this issue would require an important rework of coordgenlibs. Since we are currently working on contributing a improved template-based coordinate generation algorithm to RDKit, and are planning to deprecate coordgenlibs, we don't have plans to fix this issue in coordgenlibs.

d-b-w commented 1 year ago

@shubbey - Where are you using coordgen?

shubbey commented 1 year ago

Ok, thanks. That's disappointing, but I understand. Perhaps I will dig under the hood to see if I can come up with a workaround. @d-b-w This is part of an exploratory side project to improve the 2D display for Spartan (I work for Wavefunction in Irvine). Many years ago I wrote the method we currently use, which is based off reorienting and "flattening" the 3D version of the structure using various force fields, but it's a little bit clunky so I wanted to survey what was out there. I have also experimented with RDKit which is what led me to CoordGen, although currently we are not using RDKit inside Spartan (cool toolkit, though!)

d-b-w commented 1 year ago

You are welcome to submit PRs! I think the solution to this is to mostly remove coordgen's internal detection of stereochemistry and rely on the caller's detection.

We weren't able to do that before because of how this is used internally, but it is something I'd be open to now.

ricrogz commented 1 year ago

Yeah, I also think the best solution would be patching coordgenlibs to use relative stereo information (atom parities around chiral atoms, and cis/trans specifications for double bonds) instead of absolute info (R/S and Z/E). This would allow the removal of all the stereochemistry detection code, as @d-b-w suggested, so that the input's stereo information can be honored without further checks.

shubbey commented 1 year ago

I will play with this a bit and see what I can learn. Thanks all.

shubbey commented 1 year ago

Hey all, I finally had some time to look into this, but I could use some guidance. I noticed that there are a bunch of variables and functions related to stereo that don't seem to be actually called or used in the public code. I'm assuming this stuff is used internally. For example, it's not clear to me what "isR" actually does, since can be set and retrieved but isn't actually used (AFAIK) in any coordinate assignment. I feel like I'm missing the big picture here. Is there any method that the external caller can use to reorient the R/S (or whatever we want to call the priority order) that will actually affect how the atoms are laid out in 2D? And if not, can you point me in the right direction? Thanks for any help you can provide.

[Edit] I now see that isR is used to set the isWedge parameter which presumably indicates to the end user to draw a wedge/dash. However, as you noted above, this won't work when there are stereocenters with dependencies on one another. In that case, the atom positions may need to be swapped (as in the example above). I'm seeing if there is some simple way I can pass that info through.

shubbey commented 1 year ago

Just FYI, I believe I was able to get this working, at least for the simple case I posted earlier. I realized I was focusing on the wrong issue by being concerned with R/S (since that can easily be toggled by the wedge). The problem here was that coordgenlibs didn't realize that the double bond had stereo (or pseudo-stereo). So by explicitly setting that stereo marker and overriding sketcherMinimizerAtom::CIPPriority() to use my ranks, it reversed the orientation of the Br vs C branch and the chirality matched the 3D depiction.

Which is basically what you told me to do. I'll do some more testing when time permits. Thanks for the help.

ricrogz commented 1 year ago

That sounds great, @shubbey. Would you mind posting a PR with your changes?

shubbey commented 1 year ago

I will see what I can do to make a cleanup of this in cases where one knows the stereochemistry. The only changes I made were:

sketcherMinimizer.cpp: // remove call to writeStereoChemistry() in runGenerateCoordinates()

sketcherMinimizerAtom.h: // add m_cipRank. default 0. The high-ranking neighbor in a Z/E situation will be set to 1

sketcherMinimizerAtom.cpp // replace CIPPriority() with the below

sketcherMinimizerAtom*
sketcherMinimizerAtom::CIPPriority(sketcherMinimizerAtom* at1,
                                   sketcherMinimizerAtom* at2,
                                   sketcherMinimizerAtom* center)
{
    if (at1->m_cipRank > at2->m_cipRank)
        return at1;
    if (at2->m_cipRank > at1->m_cipRank)
        return at2;
    return nullptr;

sketcherMinimizerBond.cpp: replace isStereo() to assume user has provided the info:

bool sketcherMinimizerBond::isStereo() const
{
    return !m_ignoreZE;
}

And then in calling all I do is pass in the ranks on the ends of the Z/E bonds:

for (bonds):
    ChiralFlag ez =  {my e/z status}
    if ez != NONE:
        pSketchBond->m_ignoreZE = false
    pSketchBond->isZ = {ez is Z}
    // set ranks of two ends of the bond, as determined by our external routine
        pSketchAtom = {get top ranked atom on end 1 of pSketchBond}
        pSketchAtom->m_cipRank = 1
    pSketchAtom = {get top ranked atom on end 2 of pSketchBond}
        pSketchAtom->m_cipRank = 1

This has the effect of properly orienting the Br/C in the above example since it knows that the structure should have a "Z" orientation.

Note that I am not doing anything similar for R/S although it would probably more elegant to do that on the coordgen side (that is, pass in the known chirality and the bond to wedge, and have coordgen decide whether it should be a dash or a wedge). Instead, I handle that externally, temporarily projecting the wedged bond out of the plane and computing the 2D R/S for the chiral center. If that doesn't match the 3D R/S, I simply flip the wedge to a dash which will by definition reverse the chirality. Note that this doesn't always result in the best picture (although in the cases I've seen it is mostly ok). It is paramount that the 2D and 3D chirality remain the same so that when moving from the 2D picture back to 3D we don't break chirality.

I will play with this more as time permits, and if possible will try to submit a proper PR in case I am able to make any real improvements. It is conceivable that my efforts here are just hacks and that there are still outstanding cases as I haven't had the opportunity to test this much. I do think that it is superior to our current in-house model and we will probably use it moving forward. Thanks!