marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
86 stars 38 forks source link

implement resid handling #444

Closed fgrunewald closed 2 years ago

fgrunewald commented 2 years ago

This PR implements a new way of resid handling, which fixes issue #421, #154, #366 and associated issues. This also fixes the cysteine disulphide bridge issue.

Instead of assuming that resids in the input make sense and have logic, we now assume they are garbage and we don't use them any further. This is the safest assumption. All resids will be running from 1 -> n on a per-molecule basis. This means if two fragments get merged their resids will be clean and run from 1-> nx2. This automatically fixes the cysteine bridge issue because as soon as the two protein fragments become one molecule due to the cysteine bridge, there will be no duplicate resids. All resids in molecules are unique. This also fixes Bart's insulin dimer issue, because merge doesn't act on half-baked resids anymore.

However, we are not cruel. We store the resid as provided in the input file in a new attribute called resid_old. If the user sets the '-resid input' flag, all resids are converted back to the ones found in the input file. If they make sense or not we don't judge. What you provide is what you get, not hidden renaming. This fixes issue #366 for example, because if there are gaps in the molecule they are preserved.

There are few more things to consider:

I should add I tested the PR on all real-life test cases that were mentioned in the issues and they all work. And the test fail due to the rubber-band include region thing.

pckroon commented 2 years ago

I like this PR, I think the attribute_stash is elegant. Still missing a few tests, but that shouldn't be too hard. The only other complaint I have is that canonicalize modifications stashes atomname to _old_atomname (see also do_mapping._old_atomname_match). I suggest using the same naming scheme for stashing. Do we also want to implement a similar mechanism for using a similar mechanism for using a stashed or current attribute? I'm also fine with leaving that for later.

pckroon commented 2 years ago

Oh, and there's a few docstrings that still need updating

fgrunewald commented 2 years ago

@pckroon glad to hear that you like it! I think it solves many problems at really low coding cost. Anyways I will make this an active PR, because it has to come before the dict stuff. I'll update doc-strings and edit tests. Could you point out what else we need to test?

Regarding, the naming scheme. I agree and can adjust it.

Do we also want to implement a similar mechanism for using a similar mechanism for using a stashed or current attribute? 

I'm not sure what you mean by this.

pckroon commented 2 years ago

Do we also want to implement a similar mechanism for using a stashed or current attribute?

Typing hard, words harder.

My point was that the function do_mapping._old_atomname_match has a bit of logic in it to use either a stashed attribute ('_old_atomname'), or a current one ('atomname'). Do we want to reuse/formalize such logic? My guess is "maybe later".

fgrunewald commented 2 years ago

@pckroon All I see is that it sets as default atomname if _old_atomname is not available. Do you suggest to use this for resid-handling in for example the EN code and other bits and modfication annotators?

Do we need to actually mess with the modification annotator or does that work on the AA graph?

pckroon commented 2 years ago

@pckroon All I see is that it sets as default atomname if _old_atomname is not available. Do you suggest to use this for resid-handling in for example the EN code and other bits and modfication annotators?

Exactly. But I guess the functionality is still too much in flux to formalize this.

Do we need to actually mess with the modification annotator or does that work on the AA graph?

That works on the AA graph, so we can leave it for now.

pckroon commented 2 years ago

I would still like a unittest for the functionality as well. I expect it shouldn't be too hard to adapt one of the mapping tests to also test this

fgrunewald commented 2 years ago

I added 1 unit test for stashing attributes, 2 integration tests which cover gaps in resids and the symmetric disulfide bridge issue.

fgrunewald commented 2 years ago

@pckroon if you tell me why 639 is not triggered I'll add a test

pckroon commented 2 years ago

That has to do with the [ reference ] section of the .mapping file format. Mostly relevant for when beads span across residue boundaries. It provides a way of specifying "take CG bead attributes from this atom". So it's fine to leave it as is I think.

LGTM!

fgrunewald commented 2 years ago

@pckroon I will merge this PR then?