CG modification mappings cannot remove beads

This is a follow-up from #424, opened by @majumderS.

Modifications can remove atoms/beads from molecules by setting their atomname to None. However, if you have a modification that aims to remove a bead the mapping of that modification breaks. There's a realistic example in #424, here I'll continue with a contrived modification that aims to remove the SC1 bead of alanine. Let's set up the data files first:

; data/force_fields/universal/modifications.ff
[ modification ]
ALA-derp
[ atoms ]
CB {"resname": "ALA", "element": "C", "replace": {"atomname": null}}
HB1 {"resname": "ALA", "element": "H", "replace": {"atomname": null}}
HB2 {"resname": "ALA", "element": "H", "replace": {"atomname": null}}
HB3 {"resname": "ALA", "element": "H", "replace": {"atomname": null}}
[ edges ]
CB HB1
CB HB2
CB HB3

; data/force_fields/martini3001/modifications.ff
[ modification ]
ALA-derp
[ atoms ]
SC1 {"resname": "ALA", "replace": {"atomname": null}}

; data/mappings/martini30/martini3001/modifications.mapping
[ modification ]
[ from ]
universal
[ to ]
martini3001
[ from blocks ]
ALA-derp
[ to blocks ]
ALA-derp
[ mapping ]
CB SC1
HB1 SC1
HB2 SC1
HB3 SC1

I'm using the following invocation for testing: martinize2 -f vermouth/tests/data/1bta.pdb -x derp.pdb -modify ALA3:ALA-derp.

Right off the bat this errors, since whoever wrote the mapping parser (vermouth/map_parser.py:291) decided that it's a good idea to immediately apply any "replace" directives on nodes. (I wonder who had that idea...). This is an issue, since this means all nodes in the atomistic ALA-derp modification will have "atomname": None. Which means you can't use them to construct mappings any more since atom names will no longer be unique. So let's comment that out, and see what happens. Now martinize2 will actually run, and produce the following lines in its output:

    INFO - general - Applying modification ALA-derp to residue A-ALA3
    INFO - general - Identified the modifications ['ALA-derp'] on residues ['ALA3']

So far, so good. It also finds the ("ALA-derp",) mapping, which is also good. However, it doesn't apply it, and ALA3 still has an SC1 bead. So, we turn to vermouth/processors/do_mapping.py :'( do_mapping calls modification_matches to figure out which modification mappings it needs to apply. This ultimately finds two modification mappings that need to be applied (N-ter and C-ter). modification_matches does recognize our ALA3 as being modified, and ALA-derp as needed mod mapping (line 279). It then starts doing subgraph isomorphisms with all the needed mod mappings, and there it finds that the mod mapping ('ALA-derp',) does not fit the modified residue. Ultimately, it is map_parser.Mapping._graph_map that does the actual isomorphism and finds no matches.

Time to investigate the block_from for the mod mapping, and the node equality criterion. Nodes:

(0, {'PTM_atom': False, 'resname': 'ALA', 'element': 'C', 'atomname': 'CB', 'modifications': [<Modification "ALA-derp" at 0x7f7274bb11c0>], 'resid': 1, 'charge_group': 1}),
(1, {'PTM_atom': False, 'resname': 'ALA', 'element': 'C', 'atomname': 'HB1', 'modifications': [<Modification "ALA-derp" at 0x7f7274bb11c0>], 'resid': 1, 'charge_group': 1}),
(2, {'PTM_atom': False, 'resname': 'ALA', 'element': 'C', 'atomname': 'HB2', 'modifications': [<Modification "ALA-derp" at 0x7f7274bb11c0>], 'resid': 1, 'charge_group': 1}),
(3, {'PTM_atom': False, 'resname': 'ALA', 'element': 'C', 'atomname': 'HB3', 'modifications': [<Modification "ALA-derp" at 0x7f7274bb11c0>], 'resid': 1, 'charge_group': 1})

And do_mapping.ptm_resname_match is used for node equality, which in turn also calls do_mapping.node_matcher after accounting for the resname, PTM_atom, and modifications attributes. node_matcher compares all node attributes, except atype, charge, charge_group, mass, resid, replace and _old_atomname. Digging a bit deeper, it tells me that the following molecule node: {'atomid': 51, 'atomname': None, 'altloc': '', 'resname': 'ALA', 'chain': 'A', 'resid': 3, 'insertion_code': '', 'occupancy': 1.0, 'temp_factor': 0.34, 'element': 'C', 'charge': -0.27, 'position': array([-0.1491, 0.2834, 0.5486]), 'mol_idx': 0, '_res_serial': 2, 'modification': ['ALA-derp'], 'atype': 'CT3', 'charge_group': 4, 'modifications': [<Modification "ALA-derp" at 0x7f73b327fb80>], '_old_atomname': 'CB', 'mass': 12} is not equal to mapping node {'resname': 'ALA', 'element': 'C', 'atomname': 'CB', 'resid': 1, 'charge_group': 1}. If I strip out all attributes that node_matcher ignores anyway we're left with: mol_node: {'atomname': None, 'resname': 'ALA', 'element': 'C'} map_node: {'resname': 'ALA', 'element': 'C', 'atomname': 'CB'} And here I have to agree with martinize2: None != 'CB'. Note however, that the complete mol_node has an _old_atomname we should exploit. We already do this for block mappings, so it's relatively straightforward: in ptm_resname_match swap out the call to node_matcher to call _old_atomname_match instead.

PR open soon!

marrink-lab / vermouth-martinize

CG modification mappings cannot remove beads #439