protocaller / ProtoCaller

Full automation of relative protein-ligand binding free energy calculations in GROMACS
http://protocaller.readthedocs.io
GNU General Public License v3.0
43 stars 15 forks source link

The merge has opened/closed a ring #25

Closed kexul closed 3 years ago

kexul commented 3 years ago

Hi, my PDB ID was 4R1V, with a reference ligand ID 1401. Here were my target molecules:

mol1: Fc1cc(F)cc(c1)C2=NN(Cc3cccc(NC(=O)OCCCN4CCOCC4)c3)C(=O)C=C2 mol2: CN(C)CCCOC(=O)Nc1cccc(CN2N=C(C=CC2=O)c3ccc(F)cc3)c1 They looked almost the same in most part of the molecule, but there is a ring breaking in the side chain. image

When using ProtoCaller in my project, all of the perturbation containing ring open/close failed in mdrun: the system blow up with core dumped.

I tried to set allow_ring_breaking=False in the merge function of align molecule, the system generation stopped at BioSimSpace._Exceptions._exceptions.IncompatibleError: The merge has changed opened/closed a ring! If you want to allow this perturbation, then set the 'allow_ring_breaking' option to 'True'..

Are there some tricks to handle this kind of perturbation?

msuruzhon commented 3 years ago

Ring opening/closing can be highly problematic in GROMACS, as far as I am aware. To my knowledge, it can't handle well the change in exclusions and/or the constraints associated with that, meaning that even if the simulations don't crash, I am not sure I'd trust the result. A way to do it propertly using single topology can be found in this paper: https://doi.org/10.1021/acs.jctc.6b00991, and you can see that this hasn't been developed in GROMACS. What I would do to handle your system is to annihilate the whole ring, including the nitrogen, so that there are no problems with bond breaking. This means that you will have to run two perturbations to this smaller intermediate. While the convergence will probably not be ideal (it might be a good idea to invest some computational time in more intermediate lambda windows), I think this is the best practical solution for GROMACS nevertheless.

As a side note, I would be very careful with assigning all but the simplest ligands using 1D representations, since ProtoCaller parametrises before aligning, meaning that your charges will be dependent on the starting conformation. If you don't supply one, it is effectively random (albeit minimised), resulting in irreproducible charges. The reason why ProtoCaller parametrises before alignment is because the implemented alignment procedure can introduce strain, which may be problematic for parametrisation. That's why I would either create a a custom manually aligned SDF file, opt for a preliminary docking procedure, or manually call the RDKit routines to perform the alignment myself before Ligand initialisation (I am particularly talking about the alignTwoMolecules function in ProtoCaller.Wrappers.rdkitwrapper, which is the function called by the Perturbation routines anyway).

kexul commented 3 years ago

What I would do to handle your system is to annihilate the whole ring, including the nitrogen, so that there are no problems with bond breaking. This means that you will have to run two perturbations to this smaller intermediate.

If I understand correctly, you mean that I can do the following transformation, and then calculate the overall ?

perturbation

It sounds good, thanks for your suggestion.

As a side note, I would be very careful with assigning all but the simplest ligands using 1D representations.

Yeah, I used a conformer generated by docking. Thank you for your reminder.

msuruzhon commented 3 years ago

Yes, this is something along the lines of what I had in mind. I would just double check that the alcohol hydrogen is mapped to the carbon hydrogen and not the ring nitrogen, because otherwise we will run into the previous problem. The maximum common substructure algorithm should in most cases map a hydrogen to a hydrogen, but as far as I know there are no guarantees for that.

kexul commented 3 years ago

Hi, @msuruzhon, it seems like the thing you've worried about did happen. Here is my morph.gro generate by protocaller, the H was in the position which N atom used to be.

image

Here is my aligned molecule and morph.gro. attach.zip

Is there any way to avoid this? Such as use predefined atom mapping?

msuruzhon commented 3 years ago

Interesting, I was under the impression that RDKit handled that most of the time... Yes, you can use predefined atom mapping, by passing an mcs argument (a list of tuple pairs, indicating the atom indices starting from 0 according to the RDKit objects) to the alignToEachOther() function in Perturbation. Since doing this manually would be quite tedious, you can simply call the getMCSMap() function from ProtoCaller.rdkitwrapper and substitute the problematic tuple with a correct one. By doing this procedure, you are effectively short-circuiting ProtoCaller without breaking the workflow. You can read more about these functions on the relevant documentation pages here and here.

kexul commented 3 years ago

I checked the return value of getMCSMap(), plotted the mapping with rdkit , where the atom with the same color indicate the mapped atom pair, the H atom was indeed mapped to N atom. image I think that should be a bug of getMCSMap() ?

msuruzhon commented 3 years ago

For some reason I thought that the intermediate ligand had an amine at the end, but seeing now that it has a methyl group, then this is definitely intended behaviour, because the MCS algorithm will try to map as many atoms as possible, and this will include the hydrogen to nitrogen map. Otherwise you get a scenario where you have dummy atoms on both topologies and it isn't clear how to decouple this properly in GROMACS without the simulations crashing. You might have to cut the intermediate by another methyl group or work with a lower valency group instead.

kexul commented 3 years ago

Thanks for your clarification. I've tried cut another methyl group, unfortunately, my simulation fails in energy minimization step with:

step 11: One or more water molecules can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.
Wrote pdb files with previous and current coordinates
Segmentation fault

There might be something else I did not set correctly... I don't whether it's was related to the ring open/close issue. Anyway, thanks for your help so far.

kexul commented 3 years ago

I've found some warning in my grompp.err:


NOTE 1 [file morph.top, line 716]:
  State B has non-zero total charge: 1.000000
  Total charge should normally be an integer. See
  http://www.gromacs.org/Documentation/Floating_Point_Arithmetic
  for discussion on how close it should be to an integer.

WARNING 1 [file morph.top, line 716]:
  You are using Ewald electrostatics in a system with net charge. This can
  lead to severe artifacts, such as ions moving into regions with low
  dielectric, due to the uniform background charge. We suggest to
  neutralize your system with counter ions, possibly in combination with a
  physiological salt concentration.

As you can see in above image , the state B molecular has a 'N+' atom, would that be a big problem? I think the system should be neutralized after adding NaCl.

msuruzhon commented 3 years ago

ProtoCaller doesn't currently support charge perturbations, so you need to be very careful when doing that and in any case you will need to manually edit your topology to keep the charges neutral in both states. There have been some recently published papers comparing methods on how to do this, so there is more than one way to do it. In general, I would advise against perturbing a charge if possible. But as it stands right now, your current topology files will always give you wrong results.

kexul commented 3 years ago

it might be a good idea to invest some computational time in more intermediate lambda windows

Hi, @msuruzhon , Are there best practices for number of lambda windows to use for more challenging perturbations? i.e. Add or remove a ring.

I've done some research these days, people use different lambda windows in different softwares, in this paper, they use 5, 9, 14, 19 or 24 lambda windows for different perturbations. In your example, you had 40 lambda windows.

I've noticed that there are some protocols for minimization, equilibration and production. How about the lambda windows?

msuruzhon commented 3 years ago

There is no general recipe for this, and the more lambda windows you have, the better your chances of convergence are. However, by doing so, you are sacrificing the simulation length and therefore long-timescale dynamics, so there is always a tradeoff. The reason I use 40 lambda windows is because my HPC cluster has 40 cores per node, and I can run one lambda window per core in parallel.