marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
89 stars 40 forks source link

modification identification too few anchors #432

Closed fgrunewald closed 2 years ago

fgrunewald commented 2 years ago

Modifications that are "too large" are never identified correctly (unless there's no work to be done?). Too large in this context means more than one layer of anchors.

Workaround for now, probably stick to 1 anchor layer or take entire residue.

pckroon commented 2 years ago

Initial code trace summary so far: 1) annotating modifications results in a 'modification' (singular) attribute on all nodes in the relevant residue. 2) repair graph takes this 'modification' attribute and patches the block, repairs it accordingly. It also sets a 'modifications' (plural, I hate myself) attribute on the nodes defined by the modification, including anchors. 3) Canonicalize modifications starts by taking all nodes that have 'PTM_atom'=True (not identified by blocks in repair graph), or that have a non-empty 'modifications' (plural) attribute. These is expanded (across residue boundaries if needed) until all anchors are found. All nodes identified this way are to be covered by modifications in the FF. 4) The PTM atoms are then grouped by the resids of the anchors. 5) For all these resids, collect the corresponding atoms from the molecule. note: this doesn't use the residue_graph, instead it looks at just resid number... 6) These residues are used as basis on which all PTMs from the FF are matched, where atomnames must match for atoms where PTM_atom = False, and elements must match for atoms where PTM_atom = True. 7) Then, all these matches are used to solve a set covering: which minimal set of matching PTMs describes all PTM atoms identified in step 3.

I'm now thinking a solution would be to modify step 7, so that it only aims to cover atoms for which PTM_atom = True, so don't include the anchors in the covering goal. I'm not quite convinced this would fix the issue though. Do you have the files for me to play around with @fgrunewald?

fgrunewald commented 2 years ago

@pckroon PDB code is 1mj5 and the files are now part of the draft branch. So running the PDB structure together with the modify flag on residue 277 using HIS_PROT should be the same test-case we discussed

pckroon commented 2 years ago

Running on 1bta: martinize2 -f vermouth/tests/data/1bta.pdb -from amber -x derp.pdb -o derp.pdb works martinize2 -f vermouth/tests/data/1bta.pdb -from amber -x derp.pdb -o derp.pdb -modify HIS17:HIS_PROT breaks. Bug hunting season has been opened.

pckroon commented 2 years ago

Disconnected groups of PTM atoms (as detected by step 3) are treated separately. identify_ptms loops over these connected groups of PTM atoms, and has special logic for atoms annotated with modifications. This explains the difference in behaviour when you use the -modify flag. It then tries to do a graph isomorphism (not subgraph!) on the subgraph spanned by the group of connected PTM atoms, and the specified modification, such that atomnames must match (they should be corrected by repair graph already). This will not work in the current scenario, since the group of PTM atoms is smaller than the modification, since the modification is not connected.

I don't see an easy way out, other than starting to enforce the assumption that modifications are connected. In the actual example, if you add CE1 with edges to NE2 and ND1 to the modification it works. @fgrunewald I suggest adding a check to the modification parser to ensure all modifications are fully connected. I don't think this would prevent any use cases. Opinion? If agreed I'll open the PR.