Original version by Emanuele Scalone, Cristina Paissoni, and Carlo Camilloni, Computational Structural Biology Lab, Department of Biosciences, University of Milano, Italy.
Multi-eGO force fields and tools are intended to be used with GROMACS, currently suggested versions are 2023 and 2024. You will need to know how to compile GROMACS from source, as some multi-eGO tools require GROMACS to be recompiled.
Use conda
and the environment file provided as
conda env create -f conda/environment.yml
conda activate meGO
It is also possible to use pip install -r requirements.txt
.
To install the cmdata
see here
The first step in running a multi-eGO simulation is to create a GROMACS topology file (.top)
.
Copy your PDB file and the multi-ego-basic.ff/
included here into a folder, then run
gmx pdb2gmx -f file.pdb -ignh
and select the multi-ego-basic forcefield. This should give you a (.gro)
file for your structure and a (.top)
topology file. In the multi-eGO/inputs
folder, add a folder for your system and a reference/
subfolder. Copy your GROMACS topology into this reference/
subfolder so that the final structure looks like this:
└── input
└── system_name
└── reference
├── topol.top
└── multi-eGO_basic.ff
Assuming that a training simulation has already been run, two steps are required to learn the interactions from that simulation. First, one need to extract the contact data from the simulation. To do this you can use the cmdata
tool. The tool has to be installed by recompiling GROMACS, see Installation.
cmdata -f $YOUR_TRAJECTORY.xtc -s $YOUR_TOPOLOGY.tpr
cmdata
reads a trajectory and a GROMACS run input file. The output will be a collection of histograms in the form of .dat
text files. These files then need to be processed to obtain contact distances and probabilities. To do this one can use tools/make_mat/make_mat.py
as follows, assuming that the histograms are located in the md simulation directory in a subdirectory called histo/
:
python tools/make_mat/make_mat.py --histo $MD_DIRECTORY/histo --target_top $MD_DIRECTORY/topol.top --mego_top inputs/$SYSTEM_NAME/reference/topol.top --out inputs/$SYSTEM_NAME/md_ensemble
Finally, you need to copy the topology, force field and contact files into an appropriate folder, such as
└── input
└── system_name
├── reference
│ ├── topol.top
│ └── multi-eGO_basic.ff
└── md_ensemble
├── topol.top
├── intramat_1_1.ndx
└── all-atom.ff
Create a folder in which you want to run the random coil simulation. Copy the multi-ego-basic.ff/
folder and the .gro
file generated in the first step into this folder. To generate a random coil force field and associated topology run
python multiego.py --system $SYSTEM_NAME --egos rc
multiego.py
will then create an output directory in multi-eGO/outputs/${SYSTEM_NAME}_rc
which provides the inputs for the random coil simulation.
The contents of the output folder are ffnonbonded.itp
and topol_mego.top
. The former is the non-bonded interaction file and needs to be copied into the multi-ego-basic.ff/
folder. The latter needs to be placed in the simulation root directory. We provide mdps
simulation setup files tested with various multi-eGO setups in the multi-eGO/mdps
folder. The order in which the simulations are run is as follows:
1. ff_em.mdp
2. ff_cg.mdp
3. ff_aa-posre.mdp
4. ff_rc.mdp
Once the random coil simulation is done, you need to analyse it using cmdata
and make_mat.py
as before:
cmdata -f $YOUR_TRAJECTORY.xtc -s $YOUR_TOPOLOGY.tpr
python tools/make_mat/make_mat.py --histo $RC_DIRECTORY/histo --target_top $RC_DIRECTORY/topol.top --mego_top inputs/$SYSTEM_NAME/reference/topol.top --out inputs/$SYSTEM_NAME/reference
This is the final structure of the input folders:
└── input
└── system_name
├── reference
│ ├── topol.top
│ ├── intramat_1_1.ndx
│ └── multi-eGO_basic.ff
└── md_ensemble
├── topol.top
├── intramat_1_1.ndx
└── all-atom.ff
To setup a multi-eGO production simulation, you need to run multiego.py
again. Before running the code, make sure that the topologies of your systems all have the same moleculetype name. If they do not, you need to change the name in the topol.top
file or the program will crash.
python multiego.py --system $SYSTEM_NAME --egos production --epsilon 0.3 --train md_ensemble
Here one sets the energy scale ε to 0.3 kJ/mol and trains the model from the md_ensemble
data. The output directory will be multi-eGO/outputs/${SYSTEM_NAME}_production_e0.3_0.3
and will contain the inputs for the production simulation. Again, the contents of the output directory are ffnonbonded.itp
and topol_mego.top
and need to be copied to the multi-ego-basic.ff/
folder and the simulation root directory. The mdps
files are the same except for the last step which is now ff_aa.mdp
.
Happy simulating :)