marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
97 stars 43 forks source link

"order" attribute in links is fragile #57

Open jbarnoud opened 6 years ago

jbarnoud commented 6 years ago

Link depending on the position of two residues in a sequence rely exclusively in the resid to figure out the position in the sequence. This will surely break with branched and cyclic molecules.

One solution would be to consider the "order" attribute a distance in the residue graph, and only use the resid to decide the direction of the sequence. It would solve the issue for branched molecules, but the sequence orientation will break a the junction in cyclic molecules. Also, this would require to cache the residue graph as it would be consulted quite a lot in the linking process.

pckroon commented 6 years ago

The directionality we might be able to capture in a directed graph (nx.DiGraph). But fixing cyclic structures is still a challenge then. We could argue something about the maximum in/out degree of residues, and take it from there. But that's also fragile.

pckroon commented 5 years ago

Together with #170 and #161 this is solved by using ISMAGS for the link isomorphisms. Meaning every link will only get applied asymmetrically. If a link is supposed to be asymmetric, it should cover more atoms until it is unambiguous.

jbarnoud commented 5 years ago

My brain has shown to be in a less than optimal state today, though I do not see how ISMAGS is relevant here. The issue is that we use the resid to deal with the "order" keyword. This means that an edge such as "BB +BB" has to rely on the residue sequence being continuous; it will break for instance with PTM being their own residue.

pckroon commented 5 years ago

That was the result of a discussion we had some time ago :) I'll try to summarize it here.

The "order" attribute exists solely to break symmetry and not apply the same link to the same atoms multiple times. ISMAGS also breaks symmetry and can thus be used to serve the same purpose. It creates the following new edge cases: 1) I want to apply the same link to the same atoms multiple times. A: Don't. How often do you want it? Instead, add multiple interactions to the same link. 2) My interaction is asymmetric and needs to be applied the right way round. A: You link is underdefined if the atoms the link specifies are symmetric, but the interactions are not. Add more atoms to the link until the graph is no longer symmetric

jbarnoud commented 5 years ago

OK I remember. Getting rid of the order would change quite a lot the way of writing links, though; we need to assess if it is worth the trouble.

pckroon commented 5 years ago

I think it would enable just removing the order attribute from links, and makes writing them much much easier. For failsafes and helping the user it might be nice if we had a list of interactions we know are asymmetric, and warn if they can get applied the wrong way round.

jbarnoud commented 5 years ago

There is a matter of user's logic to consider. Take a peptide, for instance, it is easier to think of a link between a residue and the next one, than to think about a link between 2 residues without order. In other words, it is easier to think about "BB +BB" than "BB BB". Also, "BB +BB" will be the ISMAGS default, but what if I want to have a "-BB BB" instead? It may not matter for the link itself, but it would in the way I think about it. Also, the order may matter: a "-BB BB +BB ++BB" dihedral will not be the same is its symmetric; though in the case of a tetra-alanine in Martini, I cannot define the link any more.

One other issue is how we reference nodes. How do I write the following without order?

[ link ]
; BB -- +BB
; |      |
; SC1    XX
[ edges ]
BB +BB
BB SC1
+BB +XX
pckroon commented 5 years ago

It's fine to write e.g. BB +BB, but it shouldn't matter whether that's applied "forward" or "backward". I agree that it may be annoying/confusing to users and a different file format syntax may be needed.

You are right that for CG force fields you may lose too much details to preserve the directionality in any other way than in resid. That is something we need to think long and hard about. Maybe we want to encode that into an edge attribute between two CG beads, referring to nodes in underlying "graph" attributes of those beads? Gets complicated fast.

Last point: ISMAGS can not be used with > and < orders, because bead equality has to be transitive. It -, can, in principle be used with -, +, ++ etc. But those should then only be used where necessary. They become ambiguous in case of e.g. cyclic peptides anyway.

pckroon commented 5 years ago

We had an extensive discussion, here's a summary: