schrodinger / coordgenlibs

Schrodinger-developed 2D Coordinate Generation
BSD 3-Clause "New" or "Revised" License

CRDGEN-255: Do not allow constrained fragments to flip #93

Closed rachelnwalker closed 3 years ago

rachelnwalker commented 3 years ago

This is my attempt (inspired by Paolo's PR) to prohibit flipping fragments that are in a given scaffold/template unless the flip would not affect constrained atoms. The problem I ran into when testing the previous changes was that some 1-2 atom fragments with exactly 1 constrained atom would not be considered constrained, allowing them to flip. Now, these specific fragments can only flip if they have no constrained children. Here is an example of where this seemed to help:

Original molecule and the scaffold:

Screen Shot 2021-04-22 at 12 23 15 PM Screen Shot 2021-04-22 at 12 25 35 PM

Generating coordinates with these changes:

Screen Shot 2021-04-22 at 12 26 41 PM

Now when generating coordinates, the 13-14 fragment is considered constrained because it has constrained children:

Screen Shot 2021-04-23 at 4 34 39 PM
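For readers who want the flip rule from the description in compact form, here is a minimal Python sketch. It is not the coordgenlibs implementation: the `Fragment` class, its field names, and the "two or more constrained atoms means constrained" threshold are assumptions made for illustration. The description does not cover fragments with no constrained atoms, so the sketch simply lets them flip.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    """Hypothetical stand-in for a coordgen fragment (not the library's class)."""
    atoms: List[int]
    constrained_atoms: List[int] = field(default_factory=list)
    children: List["Fragment"] = field(default_factory=list)


def has_constrained_children(frag: Fragment) -> bool:
    # A child counts as constrained here if any of its atoms is constrained.
    return any(child.constrained_atoms for child in frag.children)


def may_flip(frag: Fragment) -> bool:
    n_constrained = len(frag.constrained_atoms)
    if n_constrained >= 2:
        # Assumption: fragments with several constrained atoms were already
        # treated as constrained before this change, so they never flip.
        return False
    if n_constrained == 1:
        # The case this PR targets: a small fragment with exactly one
        # constrained atom may flip only if no child is constrained.
        return not has_constrained_children(frag)
    # Not covered by the description; allowed in this sketch.
    return True
```

With this rule, a two-atom fragment like 13-14 that has one constrained atom and a constrained child is not allowed to flip, while the same fragment with only unconstrained children still is.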

I ran rdCoordGen.AddCoords on ~18k molecule/scaffold pairs (from the dataset Greg provided, ignoring instances where there was more than one scaffold match in the mol). After these changes, the distribution of the RMSD between the core atoms in the resulting molecule and the coordinates of the given scaffold seems to improve. Before, there were 712 instances where the RMSD > 0.5, and now there are 362 instances. Here is a more thorough comparison:

Screen Shot 2021-04-23 at 3 45 28 PM
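To make the metric concrete, the sketch below shows one way the core-atom RMSD and the count above the 0.5 threshold could be computed. It is an illustration under assumptions, not the script used for the numbers above: it takes already-matched lists of 2D coordinates (the substructure match from scaffold atoms to molecule atoms is assumed to be done elsewhere), and it does no alignment, on the assumption that constrained atoms should land exactly on the scaffold coordinates.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def core_rmsd(generated: Sequence[Point], scaffold: Sequence[Point]) -> float:
    """RMSD between matched core-atom positions, with no alignment step."""
    if not generated or len(generated) != len(scaffold):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (gx - sx) ** 2 + (gy - sy) ** 2
        for (gx, gy), (sx, sy) in zip(generated, scaffold)
    )
    return math.sqrt(squared / len(generated))


def count_above(rmsds: Iterable[float], threshold: float = 0.5) -> int:
    """Number of molecule/scaffold pairs whose RMSD exceeds the threshold."""
    return sum(1 for r in rmsds if r > threshold)
```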

I found that the majority of molecules with an RMSD larger than 0.5 included a ring with more than 6 atoms. The other instances mostly occurred when the given scaffold contained part of a ring (but not the entire ring). @d-b-w or @ZontaNicola, do you know where the flips/distortions on the larger rings are occurring, or what type of CoordgenFragmentDOF this change would come from? Here is an example:

Original molecule:

Screen Shot 2021-04-23 at 4 37 44 PM

Scaffold:

Screen Shot 2021-04-23 at 1 39 09 PM

Coordinates generated (same before and after this commit -- is this at all related to issue #81?):

Screen Shot 2021-04-23 at 4 37 56 PM

Finally, I think that @ZontaNicola's idea of assigning a large penalty to flipping these constrained fragments would be a better alternative to these changes once we can determine what needs to be penalized in the macrocycle cases. Let me know if you have suggestions on what sort of penalty we should give.

rachelnwalker commented 3 years ago

I added a constrainedFlip field to all fragments, and used this to determine whether to penalize a flip in CoordgenFlipFragmentDOF::getCurrentPenalty. I tested this on the molecules above and the molecule mentioned here, and got the expected output each time (the examples above still hold). Here is an updated RMSD distribution:

rmsd_hists
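The idea behind the penalty can be sketched as follows. This is a Python illustration only; the actual change lives in the C++ method CoordgenFlipFragmentDOF::getCurrentPenalty, and the class name, the `flipped` state flag, and the penalty value below are hypothetical and not taken from the code.

```python
from dataclasses import dataclass

# Hypothetical value; the penalty actually used in coordgenlibs is not stated here.
CONSTRAINED_FLIP_PENALTY = 1000.0


@dataclass
class FlipDOF:
    """Hypothetical stand-in for a flip degree of freedom on one fragment."""
    constrained_flip: bool   # mirrors the new constrainedFlip field
    flipped: bool = False    # whether the DOF is currently in the flipped state

    def current_penalty(self) -> float:
        # Only a flip of a fragment marked constrainedFlip is penalized;
        # the unflipped state and unconstrained fragments cost nothing extra.
        if self.flipped and self.constrained_flip:
            return CONSTRAINED_FLIP_PENALTY
        return 0.0
```

Compared with forbidding the flip outright, a penalty leaves the flip available to the minimizer when every alternative scores worse.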

I think that these changes address part of CRDGEN-255, but not all of it. As I mentioned in my original description, there are still potential issues when there is more than one substructure match or when there are very large rings.

rachelnwalker commented 3 years ago

Added a test that checks whether fragments have the constrained or constrainedFlip flag set as expected.