marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
89 stars 40 forks source link

Could not identify modifications #424

Closed majumderS closed 1 year ago

majumderS commented 2 years ago

I have defined modifications for 3-ter and 5-ter termini of a RNA Molecule. The terminal Hydrogens are written as H3T and H5T respectively.

The definitions are done in the .ff files and mapping files. My modifications look like this:

[ modification ]
5-ter
[ atoms ]
C5' {"element": "C"}
O5' {"element": "O"}
H5' {"element": "H"}
H5'' {"element": "H"}
H5T {"element": "H"}
[ edges ]
C5' H5'
C5' H5''
C5' O5'
O5' H5T

[ modification ]
3-ter 
[ atoms ]
O3' {"element" : "O"}
H3T {"element" : "H"}
C3' {"element" : "C"}
H3' {"element" :"H"}
[ edges ]
O3' H3T
O3' C3'
C3' H3'

Yet martini is unable to read the modifications and gives the following warning:

    INFO - step - vermouth - Dealing with modifications.
 WARNING - unknown-input - vermouth.processors.canonicalize_modifications - Could not identify the modifications for residues ['C401'], involving atoms ['29-H5T']
 WARNING - unknown-input - vermouth.processors.canonicalize_modifications - Could not identify the modifications for residues ['U409'], involving atoms ['282-H3T']
    INFO - step - vermouth - Read input.

Cant make out where am I going wrong., probably in the modifications definition. Please help.

pckroon commented 2 years ago

You're super close :) You still need to tell which atoms are not in the Blocks yet. In this case, e.g., H3T {"element": "H", "PTM_atom": true}. See also https://github.com/marrink-lab/vermouth-martinize/blob/master/vermouth/data/force_fields/universal/modifications.ff

majumderS commented 2 years ago

sure that makes sense. I also forgot to add modifications.mapping for the C-ter and N-ter modifications. I did that and the error did go. But I am still not sure what purpose do [from nodes] and [from edges] directives in modifications.mapping serve ? As far as I understand, this file maps the modifications to beads. for example below for C-ter termini in amino acid.

[ modification ]
[ from ]
universal
[ to ]
martini3001

[ from blocks ]
C-ter
[ to blocks ]
C-ter

[ from nodes ]
N
HN

[ from edges ]
HN N
N CA

[ mapping ]
CA BB
C  BB
O  BB
OXT BB
pckroon commented 2 years ago

You're correct. The mapping you have above (C-ter) maps the atoms described by the modification 'C-ter' ([ from blocks ]) of the forcefield 'universal' ([ from ]) to the modification 'C-ter' ([ to blocks ]) of the forcefield 'martini3001' ([ to ]). But there's a complication, since the martini3001/C-ter modification describes/represents a larger molecular fragment than the universal/C-ter modification, since the BB bead describes the complete protein backbone.

So we introduce the [ from nodes ] section, which contains all the atoms that have to be taken into account into the mapping, but are not described by the origin/from modification. On top of that, we work with graphs exclusively, so we need the edges as well, hence the [ from edges ] section.

HTH!

majumderS commented 2 years ago

Thanks that helps.

majumderS commented 2 years ago

Hi, sorry opening this issue again. But I have noticed a strange behavior in the code with the repair graph option. It seems to add missing atoms even if there is no missing atoms in the residue because the residue is a termini. I understand this happened because martinize reads the residue and matches with block atoms. If atoms are not present it will add. But how do we distinguish between a normal residue [as defined in the blocks] vs a modified residue [defined in the modifications]. For example, I am currently working with nucleobases. So the 5-ter termini has no phosphate group but a H5T attached to the C5' end. The residue block doesn't have H5T defined in it but rather this definition is in the modifications. Now everytime I run martinize it repairs the residue and adds a phosphate group as shown below:

martinize2 -v -f 1si3.hadded.pdb -ff martini3001 -from universal -x CG.pdb -o topol.top
    INFO - general - vermouth.pdb.pdb - Read 1 molecules from PDB file 1si3.hadded.pdb
    INFO - step - vermouth - Guessing the bonds.
   DEBUG - general - vermouth.processors.make_bonds - Guessed bond between 1B-C401:O5' and 29B-C401:H5T based on distance.
    INFO - general - vermouth.processors.make_bonds - 1 molecules after guessing bonds
   DEBUG - step - vermouth - Annotating required mutations and modifications.
    INFO - step - vermouth - Repairing the graph.
   DEBUG - step - vermouth.processors.repair_graph - Making reference graph
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-C401 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-G402 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-U403 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-G404 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-A405 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-C406 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-U407 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-C408 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-U409 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue C401
    INFO - missing-atom - vermouth.processors.repair_graph - Missing atom C401:P
    INFO - missing-atom - vermouth.processors.repair_graph - Missing atom C401:O1P
    INFO - missing-atom - vermouth.processors.repair_graph - Missing atom C401:O2P
   DEBUG - missing-atom - vermouth.processors.repair_graph - Adding 284B-C401:P
   DEBUG - missing-atom - vermouth.processors.repair_graph - Adding 285B-C401:O2P
   DEBUG - missing-atom - vermouth.processors.repair_graph - Adding 286B-C401:O1P
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue G402
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue U403
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue G404
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue A405
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue C406
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue U407
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue C408
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue U409
    INFO - step - vermouth - Dealing with modifications.
    INFO - general - vermouth.processors.canonicalize_modifications - Identified the modifications ['5-ter'] on residues ['C401']
    INFO - general - vermouth.processors.canonicalize_modifications - Identified the modifications ['3-ter'] on residues ['U409']

I am not sure if I am able to make myself clear here, but my basic question is how do we distinguish between residues that are modified vs 'regular' residues and why is repair-graph repairing the modified residue with the atoms present in the 'regular' residue block?

pckroon commented 2 years ago

But how do we distinguish between a normal residue [as defined in the blocks] vs a modified residue [defined in the modifications].

You don't (-: The intended solution is to have the modification describe the atoms that should be removed again. You can do this by adding, e.g. O2P {"replace": {"atomname": none}} to the modification. Replacing the atomname to 'none' indicates the node should be removed.

majumderS commented 2 years ago

Ok tried this O2P {"replace": {"atomname": none}} in modifications.ff under universal section. So first the repair residue module do replace the missing atoms but then due to the modifications those atoms are not considered and thus the bead coordinates (the bead corresponding to the removed atoms) are written as 'nan' in the final pdb file. That can serve the purpose, I can simply remove the bead with nan coordinate at the end, as that particular bead shouldn't exist whenever the modified residue is present.

pckroon commented 2 years ago

but then due to the modifications those atoms are not considered and thus the bead coordinates (the bead corresponding to the removed atoms) are written as 'nan' in the final pdb file.

Hmmn. Replacing the atomname to none /should/ remove those atoms again. Do you have the screenoutput of Martinize2 for such a run? I'll dive through the code in the meantime.

majumderS commented 2 years ago

Sorry for late reply. Busy week it is! Thanks for all the help :) .

Here is what I wrote in universal/modifications.ff to facilitate removal of phosphate group at 5-ter end as you suggested.

[ modification ]
5-ter
[ atoms ]
C5' {"element": "C"}
O5' {"element": "O"}
H5' {"element": "H"}
H5'' {"element": "H"}
H5T {"element": "H","PTM_atom":true,"replace":{"atomname":"H5T"}}
P {"replace":{"atomname":"none"}}
O1P {"replace":{"atomname":"none"}}
O2P {"replace":{"atomname":"none"}}

Below is a screenshot of the output. Apparently it cannot identify the 5-ter modification, and hence it is giving a nan for BB1 coordinates where the phosphate group is mapped in a regular residue (non-terminal residue). image

pckroon commented 2 years ago

Is that your complete modification? If so, you're missing the edges. And try: O2P {"replace":{"atomname":none}}, so without the quotes around none

majumderS commented 2 years ago

I missed putting the edges here in the comment. I tried removing the quotes from none. I get an error: image

pckroon commented 2 years ago

Apologies. It should be null (still no quotes), not none.

majumderS commented 2 years ago

yeah that worked. but the modification is not yet identified for 5-ter. image May be I am defining the modifications wrong. so my complete modifications are: cg_modification:

[ modification ]
5-ter
[ atoms ]
BB2 {"replace":{"atype": "SN3a", "charge":0}}

universal/modification:

[ modification ]
5-ter
[ atoms ]
C5' {"element": "C"}
O5' {"element": "O"}
H5' {"element": "H"}
H5'' {"element": "H"}
H5T {"element": "H","PTM_atom":true,"replace":{"atomname":"H5T"}}
P {"replace":{"atomname":null}}
O1P {"replace":{"atomname":null}}
O2P {"replace":{"atomname":null}}

[ edges ]
C5' H5'
C5' H5''
C5' O5'
O5' H5T

the modification.mapping:

[ modification ]
[ from ]
universal
[ to ]
martini3001

[ from blocks ]
5-ter
[ to blocks ]
5-ter

[ from nodes ]
C4'
H4'
O4'

[ from edges ]
C4' H4'
C4' O4'

[ mapping ]
C5'  BB2
O5'  BB2
H5'  BB2
H5'' BB2
H5T  BB2
pckroon commented 2 years ago

You're missing the edges between O1P, O2P and P (and P and O5?) in the atomistic modification. Make sure your modification is fully connected, because your input molecule is also fully connected. And edges are important/critical/used for matching.

majumderS commented 2 years ago

ok that worked . NO error in output i.e. identified modifications image

I missed this because I thought the modification edges should reflect the modified residue and hence should not have the edges between O1P, O2P , P , O5'. But I guess the point of defining the modification edges as it is, and then the "atomname" = null option removes the atoms. Thanks!

However the coordinate for BB1 which earlier had the P, O1P, O2P is nan. ATOM 1 BB1 C B 401 nan nan nan 1.00 0.00
It would be nice if the bead is removed altogether.

pckroon commented 2 years ago

If the BB1 bead should be removed your martini3001 modification should specify that --- by setting its atomname to null as well.

majumderS commented 2 years ago

I tried : BB1 {"replace":{"atype":null}} and BB1 {"replace":{"atomtype":null}} in martini3001 modificattions as well as BB1 {"replace":{"atomname":null}}

all gives keyerror. image

pckroon commented 2 years ago

Could you copy/paste (not screenshot) th efull output of martinize2 -v? I'll see if I can reproduce this locally

majumderS commented 2 years ago

ok. Traceback (most recent call last): File "/home/subhasree/.local/bin/martinize2", line 802, in entry() File "/home/subhasree/.local/bin/martinize2", line 685, in entry delete_unknown=True, File "/home/subhasree/.local/bin/martinize2", line 154, in martinize attribute_must=('resname')).run_system(system) File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 766, in run_system new_molecule = self.run_molecule(molecule) File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 759, in run_molecule attribute_must=self.attribute_must File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 596, in do_mapping mol_to_out, out_to_mol) File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 432, in apply_mod_mapping mol_idxs = mod_to_mol[mod_idx] KeyError: 1

pckroon commented 2 years ago

No, I need the /complete/ output. Not just the stacktrace

majumderS commented 2 years ago

Sure.

martinize2 -v -f 1si3.hadded.pdb -ff martini3001 -from universal -x CG.pdb -o topol.top 
    INFO - general - vermouth.pdb.pdb - Read 1 molecules from PDB file 1si3.hadded.pdb
    INFO - step - vermouth - Guessing the bonds.
   DEBUG - general - vermouth.processors.make_bonds - Guessed bond between 1B-C401:O5' and 29B-C401:H5T based on distance.
    INFO - general - vermouth.processors.make_bonds - 1 molecules after guessing bonds
   DEBUG - step - vermouth - Annotating required mutations and modifications.
    INFO - step - vermouth - Repairing the graph.
   DEBUG - step - vermouth.processors.repair_graph - Making reference graph
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-C401 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-G402 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-U403 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-G404 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-A405 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-C406 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-U407 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-C408 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Matching residue B-U409 to its reference
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue C401
    INFO - missing-atom - vermouth.processors.repair_graph - Missing atom C401:P
    INFO - missing-atom - vermouth.processors.repair_graph - Missing atom C401:O1P
    INFO - missing-atom - vermouth.processors.repair_graph - Missing atom C401:O2P
   DEBUG - missing-atom - vermouth.processors.repair_graph - Adding 284B-C401:P
   DEBUG - missing-atom - vermouth.processors.repair_graph - Adding 285B-C401:O2P
   DEBUG - missing-atom - vermouth.processors.repair_graph - Adding 286B-C401:O1P
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue G402
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue U403
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue G404
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue A405
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue C406
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue U407
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue C408
   DEBUG - step - vermouth.processors.repair_graph - Repairing residue U409
    INFO - step - vermouth - Dealing with modifications.
    INFO - general - vermouth.processors.canonicalize_modifications - Identified the modifications ['5-ter'] on residues ['C401']
   DEBUG - change-atom - vermouth.processors.canonicalize_modifications - Changing attribute atomname from P to None for atom 284B-C401:P
   DEBUG - change-atom - vermouth.processors.canonicalize_modifications - Changing attribute atomname from O1P to None for atom 286B-C401:O1P
   DEBUG - change-atom - vermouth.processors.canonicalize_modifications - Changing attribute atomname from O2P to None for atom 285B-C401:O2P
    INFO - general - vermouth.processors.canonicalize_modifications - Identified the modifications ['3-ter'] on residues ['U409']
    INFO - step - vermouth - Read input.
   DEBUG - step - vermouth - Read molecule molecule with 286 atoms, and 306 edges.
    INFO - step - vermouth - Creating the graph at the target resolution.
    INFO - general - vermouth.processors.do_mapping - Applying modification mapping ('5-ter',)
Traceback (most recent call last):
  File "/home/subhasree/.local/bin/martinize2", line 802, in <module>
    entry()
  File "/home/subhasree/.local/bin/martinize2", line 685, in entry
    delete_unknown=True,
  File "/home/subhasree/.local/bin/martinize2", line 154, in martinize
    attribute_must=('resname')).run_system(system)
  File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 766, in run_system
    new_molecule = self.run_molecule(molecule)
  File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 759, in run_molecule
    attribute_must=self.attribute_must
  File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 596, in do_mapping
    mol_to_out, out_to_mol)
  File "/home/subhasree/.local/lib/python3.6/site-packages/vermouth/processors/do_mapping.py", line 432, in apply_mod_mapping
    mol_idxs = mod_to_mol[mod_idx]
KeyError: 1
pckroon commented 2 years ago

Thanks! The good news is that I can reproduce this error message. The bad news is that you found a bug. The worse news is that I can't find an easy workaround. The worst news is that this bug might prove hard to solve since it's entangled in quite a few things.

I'll do some digging and open a separate issue to track this specifically (hopefully later today, but definitely by the end of the week). I'll ping you (and this issue) there. With a bit of luck my digging will also outline a solution...

majumderS commented 2 years ago

ok sure. Thanks.

fgrunewald commented 2 years ago

the bug has been solved; so now it should work. @majumderS can you try again?

fgrunewald commented 1 year ago

closing the issue due to inactivity and that it was presumably fix