michellab / EspalomaFFComparison

0 stars 0 forks source link

Scripts used for espaloma parameterization and export to gromacs format #1

Open amin-sagar opened 3 weeks ago

amin-sagar commented 3 weeks ago

Hello.

This is awesome work. I have also been working on this. I have run some tests with the espaloma approach using the method described here. https://github.com/choderalab/espaloma/issues/192 Me and others were not sure about how to save the espaloma parameterized files to amber and gromacs formats https://github.com/choderalab/espaloma/issues/145 and https://github.com/openforcefield/openff-toolkit/issues/1824

Can you please share how you parameterize macrocycles and export them to gromacs format?

Also, your macrocycle seems to have non-canonical amino acids which is also my case. What approach have you used to generate amber parameters for these to compare with espaloma?

Thanks, Amin.

akalpokas commented 3 weeks ago

Hi there,

We used interchange to convert an OpenMM system into GROMACS input files, and then used BioSimSpace to load those input files and do the rest of the work (such as preparing an Amber system and setting up the simulations). So the strategy here is essentially to create an OpenMM system and prepare it to be exported. In our case, the code was as follows:

from rdkit import Chem
from openff.toolkit.topology import Molecule
import openmm.unit as unit
import openmm.app as app
from openmm import Vec3

from openmm.app import *
from openmm import *

Create an espaloma forcefield generator object

from openmmforcefields.generators import EspalomaTemplateGenerator
from openmm.app import ForceField
espaloma = EspalomaTemplateGenerator(molecules=molecule, forcefield='espaloma-0.3.2')

Load input file and create an OpenMM system

topology = molecule.to_topology()
omm_topology = topology.to_openmm()
suppl = Chem.SDMolSupplier(compound_path)
mols = [x for x in suppl]
mol = mols[0]
mol.SetProp("_Name", "MOL")
offmol = Molecule.from_rdkit(mol, allow_undefined_stereo=True)
ligand_positions = offmol.conformers[0]
lig_pos = ligand_positions.to_openmm()

forcefield = ForceField('amber/protein.ff14SB.xml', 'amber/tip3p_standard.xml')
forcefield = ForceField('tip3p.xml')
# Register the SMIRNOFF template generator
forcefield.registerTemplateGenerator(espaloma.generator)

modeller = Modeller(omm_topology, lig_pos.in_units_of(unit.nanometer))

Invoke Espaloma to parametrise the ligand

modeller.topology.setUnitCellDimensions(Vec3(3.0, 3.0, 3.0))
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME)

Create an interchange object and export the OpenMM system

%env INTERCHANGE_EXPERIMENTAL=1
from openff.interchange import Interchange

# doesn't work
#interchange = Interchange.from_openmm(topology=modeller.topology, system=system, positions=modeller.positions, box_vectors=modeller.topology.getPeriodicBoxVectors())

# works
interchange = Interchange.from_openmm(topology=omm_topology, system=system, positions=lig_pos.in_units_of(unit.nanometer), box_vectors=modeller.topology.getPeriodicBoxVectors())
interchange.to_gro("compound.gro")
interchange.to_top("compound.top")

For parametrisation with Amber, we followed this tutorial: https://ambermd.org/tutorials/basic/tutorial5/

Hope this helps!

amin-sagar commented 2 weeks ago

Thanks @akalpokas This is really helpful.

If I understand correctly, this means you add the receptor protein, water box and ions using BioSimSpace. This seems like a great approach as the major problem I had was handling rigid water when exporting using Parmed.

I have a couple of questions: 1) Are there any tutorials on how to do this with BioSimSpace? I could only find 3 tutorials on the BioSimSpace website.

2) Did you ever need to convert your top file to itp file for providing the parameters of the macrocycle to BioSimSpace?

3) Do you think the differences between Amber and Espaloma could originate from the Amber parameterization process? And doing a more detailed RESP fit, etc could move the Amber results towards Espaloma?

I really appreciate your help and suggestions. Best, Amin.

akalpokas commented 2 weeks ago
  1. Yes, the full BioSimSpace tutorial suite is available here and comes with a published version here too. You can also use the demo server to test out the software without having to install it here.
  2. If the top file contains all of the information needed to describe the macrocycle (which was the case for us) then BioSimSpace will be able to load it properly and the itp file won't be needed.
  3. We are currently investigating the source of the discrepancy between the two parametrisation approaches, so it's hard to say for sure at this point. We're actually aiming to move Espaloma results to the results obtained with Amber forcefield, rather than the other way around, because in our case it was molecule parametrised by Espaloma which was not being modelled accurately.

I hope this helps with your work!

amin-sagar commented 1 week ago

Thanks @akalpokas

I understand that the major source of discrepancy is very likely to be espaloma. I was just wondering if amber could also move slightly towards espaloma so that the truth lies quite close to Amber but slightly shifted in the espaloma direction.

I used your files to run simulations in openMM to see if I could reproduce your results without the conversion to gromacs format. Here are the results for RMSD of heavy atoms Phe from three 1 microsecond simulations. Phe_RMSD_espaloma

I don't see the bimodal distribution but the results are qualitatively similar to yours. The RMSD values that I have calculated are after superimposing the receptor protein. Did you also do the same or did you superimpose the macrocycle or the whole system?

Thanks again for your help. I intend to run the optimized benchmark on 25-50 protein-macrocycle complexes with both espaloma and Amber, with and without non-canonical amino acids to understand the origin and extent of discrepancy between the two.

Best, Amin.

akalpokas commented 1 week ago

Hi Amin,

I believe only the macrocycle was superimposed on the crystal structure macrocycle which was then used as the reference for the RMSD calculation.

Best of luck with the benchmark!

amin-sagar commented 1 week ago

Thanks @akalpokas If I align just the macrocycle, I get the following plot which seems a bit better in terms of RMSD. Phe_RMSD_espaloma_superMOL0 Still, it seems to be higher than what you observe with Amber. I will try to run the exact same protocol with the amber files and report the results.

Best, Amin.

akalpokas commented 1 week ago

Hi @amin-sagar,

The amber files uploaded in this repository are actually the earlier version of the input files that we ended up using. In those files one of the macrocycle links is not properly parametrised and the macrocycle starts to fall apart after short amount of simulation. The input files here compound_19_final.tar.gz are the ones that were properly parametrised and the ones used for the analysis. I believe the Amber protein+ligand complex also would need to be rebuilt just to be sure. Once I sort out the write access for the repository I will upload the correct input files for Amber simulations.