marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
95 stars 43 forks source link

Histidine mapping with "elnedyn22p" force-field #389

Closed biomolsim closed 2 years ago

biomolsim commented 2 years ago

Hello everyone,

Is it possible to define charged and neutral histidine using elnedyn22p force-field? While I get the error below for HSP:

WARNING - unmapped-atom - These atoms are not covered by a mapping. Either your mappings don't describe all atoms (bad idea), or, there's no mapping available for all residues.

Replacing HSP with HISH results in the following error:

WARNING - unknown-residue - Can't add bonds based on atom names for residue X-HISH because 'Residue HISH is not known to force field universal'. Falling back to distance criteria.

Please let me know which histidine names are acceptable for this force-field. Thank you very much for reading!

pckroon commented 2 years ago

elnedyn22p defines HIS and HISH residues at the moment (https://github.com/marrink-lab/vermouth-martinize/blob/master/vermouth/data/force_fields/elnedyn22p/aminoacids.ff#L444) There is however, only a mapping for HIS->HIS (https://github.com/marrink-lab/vermouth-martinize/blob/master/vermouth/data/mappings/elnedyn/his.charmm36.map) Universal defines HIS, HSP, HSE, and HSD (https://github.com/marrink-lab/vermouth-martinize/blob/master/vermouth/data/force_fields/universal/aminoacids.rtp)

To be honest, Histidines are a bit of a mess at the moment. Ideally, I'd define only HIS, and all other forms (HSD, HSE, HSP) as modifications rather than residues. This would also require an automagic mapping of less-standard histidine names (basically anything that's not HIS) to become HIS+. And it would require quite a lot of work on the data files :( It would get more complicated since not all forcefields (explicitly) define/distinguish all HIS tautomers, IIRC...

For now though, you could define a HSP->HSP mapping for elnedyn22p, and rename the HISH residue there to HSP (mind the protein_resnames variable/macro at the top of the .ff files). PR welcome :)

biomolsim commented 2 years ago

Thank you very much @pckroon for the fast response. However, I could not understand the procedure to be followed, unfortunately.

I found out about Vermouth just today and had previously used the earlier version of Martinize. For my project, I have to include both the HIS (say HSE as per CHARMM36 topology) and HISH (HSP) residues in the output PDB file. I was able to model the HSE residues by changing their name to HIS in the input PDB file. However, I am not sure how to address the protonated histidine (HISH/HSP) residues in the input PDB file.

Therefore, I have the following doubts regarding the HISH/HSP inclusion with the elnedyn22p force field:

  1. What should be the name of the HISH/HSP residues in the input fully-atomistic PDB file?
  2. What modifications or mutations should be passed to martinize2?
  3. Furthermore, would it be feasible for you to kindly provide a short example of the command to be supplied to martinize2?

I sincerely thank you for your generous assistance and hope I was able to explain my issue. Please let me know if any further information is required from my side.

pckroon commented 2 years ago

What should be the name of the HISH/HSP residues in the input fully-atomistic PDB file?

HSP is probably best, but it won't work out of the box. The universal FF knows HSP, so that bit will work. Elnedyn22p only has a mapping for HIS -> HIS, so you'll need to add a mapping HSP -> HSP (or HSP->HISH, but that's more complicated). Lastly, Elnedyn22p knows blocks/residues HIS and HISH. Assuming you'll work with the mapping HSP->HSP, you need to add a block HSP, either by renaming HISH, or adding something new. The files for this will live somewhere on your computer (see pip show vermouth for the folder, probably /usr/local/lib/python3/site-packages/vermouth or something along those lines. At least on Linux). In that vermouth folder, there is a data folder containing the force fields and mappings you'll need to adapt. If you manage I'd very much welcome a pull request with your improvements :)

What modifications or mutations should be passed to martinize2?

You can use the -mutate flag to change specific residues to other residues, for example from HIS to HSP or the other way around. But it should not be needed otherwise.

What modifications or mutations should be passed to martinize2?

From memory (can't test at the moment): martinize2 -f protein.pdb -x protein-cg.pdb -o topol.top -ff elnedyn22p should do the trick. See also martinize2 -h.

lenn24 commented 2 years ago

Hello @pckroon and others, I have a similar issue trying to use ASPP (and GLUP) from the universal forcefield and am failing to correctly map this to martini3001.

I made the following modifications (based largely on what's inside the elnedyn22 files), but still get "unmapped-atom" WARNINGs.

vermouth/data/force_fields/universal/modifications.ff:

[ modification ]
ASPP
[ atoms ]
HD2 {"resname": "ASP", "element": "H", "PTM_atom": true}
OD2 {"resname": "ASP"}
[ edges ]
OD2 HD2

vermouth/data/force_fields/martini3001/modifications.ff:

[ modification ]
ASPP
[ atoms ]
SC1 {"resname": "ASP", "replace": {"atype": "P3", "charge": 0}}

vermouth/data/mappings/martini30/martini3001/modifications.mapping:

[ modification ]
[ from ]
universal
[ to ]
martini3001
[ from blocks ]
ASPP
[ to blocks ]
ASPP
[ from nodes ]
CB
CG
OD1
[ from edges ]
CB CG
CG OD1
CG OD2
[ mapping ]
CB SC1
CG SC1
OD1 SC1
OD2 SC1
HD2 SC1

Am I taking the right approach or completely off?

Also, where should I refer to for which bead to replace with in modifications.ff? Martini_v2.2_aminoacids.itp uses P3, but is this still appropriate for Martini3?

Any guidance would be most appreciated. Thank you.

pckroon commented 2 years ago

Hello hello,

Your modifications and mapping look correct to me! Does m2 print that it identifies the appropriate modifications on the atomistic structure? Does it print it's using the appropriate modification mapping?

Also, where should I refer to for which bead to replace with in modifications.ff? Martini_v2.2_aminoacids.itp uses P3, but is this still appropriate for Martini3?

Excellent question. I'm not sure this is already established. @paulocts?

lenn24 commented 2 years ago

Thank you so much for your reply, @pckroon.

Does m2 print that it identifies the appropriate modifications on the atomistic structure? Does it print it's using the appropriate modification mapping?

No, it doesn't. It applies the N-ter and C-ter modifications when repairing the graph, and identifies these when dealing with modifications. When it reads the input, however, it applies the N-ter and C-ter modification mappings but then prints unmapped-atom WARNINGs for ASPP and GLUP atoms (please see below for an excerpt of the print).

    INFO - step - Read input.
    INFO - step - Creating the graph at the target resolution.
    INFO - general - Applying modification mapping ('N-ter',)
    INFO - general - Applying modification mapping ('C-ter',)
 WARNING - unmapped-atom - These atoms are not covered by a mapping. Either your mappings don't describe all atoms (bad idea), or, there's no mapping available for all residues. ['1042A-GLUP72:N', '1044A-GLUP72:CA', '1046A-GLUP72:CB', '1049A-GLUP72:CG', '1052A-GLUP72:CD', '1053A-GLUP72:OE1', '1054A-GLUP72:OE2', '1056A-GLUP72:C', '1057A-GLUP72:O', '66A-GLUP5:N', '68A-GLUP5:CA', '70A-GLUP5:CB', '73A-GLUP5:CG', '76A-GLUP5:CD', '77A-GLUP5:OE1', '78A-GLUP5:OE2', '80A-GLUP5:C', '81A-GLUP5:O', '598A-ASPP40:N', '600A-ASPP40:CA', '602A-ASPP40:CB', '605A-ASPP40:CG', '606A-ASPP40:OD1', '607A-ASPP40:OD2', '609A-ASPP40:C', '610A-ASPP40:O', '1264A-GLUP85:N', '1266A-GLUP85:CA', '1268A-GLUP85:CB',

Please let me know if including the entire print or GLUP modifications could be helpful. For the latter, I used a P1 replacement (again based on martini_v2.2_aminoacids.itp). Thanks again.

pckroon commented 2 years ago

If it doesn't identify the modifications, then it won't map them later on. Taking your ASPP modification as example, that will be identified if there's a HD2 atom connected to an OD2 atom (and /only/ an OD2 atom...) That means if that hydrogen is not in your input pdb, it won't find the modification. Alternatively, you can specify where it should be applied with the -modify flag (double check the termini if you go down this route).

lenn24 commented 2 years ago

Hello @pckroon,

Taking your ASPP modification as example, that will be identified if there's a HD2 atom connected to an OD2 atom (and /only/ an OD2 atom...) That means if that hydrogen is not in your input pdb, it won't find the modification.

The PDB files do have the HD2 atoms. Perhaps they were leading to unidentified modifications because they were not only connected to OD2 atoms?

Alternatively, you can specify where it should be applied with the -modify flag (double check the termini if you go down this route).

I did end up using the -modify flag for each ASPP/GLUP residue, and it seemed to work. The print did not report any warnings or errors, and the resulting itp file has the sQ5n/Q5n --> P3/P1 modifications. Thank you so much for your help - I appreciate it immensely!

Tsjerk commented 2 years ago

@pckroon Can this issue be closed or reopend as more to the point?

biomolsim commented 2 years ago

@pckroon I have posted an additional file for the HSP -> HSP mapping with the elnedyn22p force-field. I created it as per your suggestions, and the output looks fine to me. Please have a look as feasible.

biomolsim commented 2 years ago

Hey @lenn24, did these modification mappings mentioned by you work? Please let me know the command you passed to martinize2 to implement the aspartate patch. I am considering adapting them for elnedyn22p too. Thank you.

lenn24 commented 2 years ago

Hello @biomolsim, yes they worked per @pckroon's suggestion above re: -modify flag.

For modifying Asp6 on Chain A, for example, I ran:

martinize2 -f protein.pdb -o protein_only.top -x protein_cg.pdb -dssp /path/to/dssp -p backbone -ff martini3001 -scfix -cys auto -modify A-ASP6:ASPP