marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
86 stars 38 forks source link

Fix/links dict #443

Open fgrunewald opened 2 years ago

fgrunewald commented 2 years ago

This PR implements a temporary interactions dict instead of using the replace_interaction function in DoLinks. The benefit is that do-links does not iterate over the complete interactions list every time an interaction is added. In addition it now automatically takes care of symmetric interactions. I also refactored the code a little bit to make it a little more readable and added doc-strings.

The cysteine problem is fixed in this PR as well. In the end the most simple fix is to allow the oder match to be true whenever a '*' wildcard is involved irrespective of the reisds. That might seem counterintuitive, however, the graph matching basically makes sure that we never find a node connected to itself, which is the only thing this check checks. I've tested in on the relent PDB as well and it produces the correct number of disulphide bridges. I see how this could be considered dangerous. In that case I think it would be best to use the mol_idx for making sure the nodes are different rather than the chain ID, which is really only used for proteins.

fgrunewald commented 2 years ago

Based on SARS-COV2 Spike protein it gives about 40% speedup

fgrunewald commented 2 years ago

@pckroon How about making interactions a custom dict that essentially emulates defaultdict(list) in interface except when __getitem__ etc. methods get an additional keyword? In general making interactions dict custom would also be a way to implement the custom type based atom-key selection. For example, for angles (A, B, C) would return the same value as (C, B, A) and something like (C, B, A) in interactions_dict would evaluate to true as well as using (A, B, C). However, for other types like VS or exclusions a different evaluator would be used. It's kinda complicated but could make the API quite simple to use and robust. Of course it kinda limits interactions again to GROMACS style, but we could have a class creator method that potentially sets types with special key considerations??

pckroon commented 2 years ago

would you be open to doing this in this PR?

Sure, but don't underestimate how much work it's going to be.

I think the cleanest way is not to mess with dict methods, but rather to make an actual Interaction(tuple) (and SymmetricInteraction(Interaction)) class, to be subclassed as needed. Then give those classes the appropriate __eq__ and __hash__ methods. Different interactions can have more intelligent methods, think of dihedrals with versions, or vsites. For the hash, I think the best way would be e.g.

def __hash__(self):
    return hash((self.atoms, reversed(self.atoms))

This means that in that dict I can ask for Bond(1, 2) or Bond(2, 1) and get back the same item. Example in code:

current_bond = mol.interactions['bonds'][Bond(1, 2)]

Thinking about this I don't really like this yet. Currently you do mol.interactions['bonds'][(1, 2)], and then get back an Interaction object. As I propose you'd use an Interaction object as key, which begs the question what the value should be. (InteractionParameters?) Implementation-wise I do think this is cleaner and easier to maintain then subclassing (default)dict with an intelligent __getitem__. Subclassing dict results in a nicer API though. How would this dict decide which comparison to use?

What's your take?

As for maintaining compatibility with the currently existing libraries I suggest turning Molecule.interactions into a property, which simply yields from the appropriate things. For future compat with interaction types we don't know about (yet), you could either make thin subclasses of Interaction, or just use Interaction itself.

fgrunewald commented 2 years ago

@pckroon I think the problem with the organisation of interactions is the following: We don't actually know, how they will be most useful outside of the martinize2 context, which is in fact very limited. We only add or remove interactions, but don't actually do anything else with them. In lieu of having to finish the paper, I propose we stick with the local dict and implement a special comparison function that handles the edge cases for now. Not so pretty but pretty safe ;)

We have some other project related to polyply development and other vermouth based applications that actually do operations on the interactions, select and filter them. In #329 I've raised this issue before to some extend even though I wasn't sure what would be the best thing to do. I think we can move the discussion in this PR to a issue and take the time to thoroughly think this through.

Good ideas in my opinion are:

pckroon commented 2 years ago

I propose we stick with the local dict and implement a special comparison function that handles the edge cases for now.

Agreed.

If you could collect required/desired features from a polyply point of view that would be great. In the meantime I'll reopen #329 with some more ideas I had on the bike