marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
95 stars 43 forks source link

Fix/424 #440

Closed pckroon closed 2 years ago

pckroon commented 2 years ago

Fix #439 Note that the tests currently fail, since the behaviour of the parser regarding the replace attribute has changed. The pytest traceback is not pretty, but instead of using the "new" atomnames (after modification) we'd need to use the "old" ones. An example of this comes from one of the test forcefields (vermouth/tests/data/force_fields) pepplane. Instead of:

[ modification ]
[ from ]
universal-test
[ to ]
pepplane
[ from blocks ]
C-ter
[ to blocks ]
C-ter
[ mapping ]
C   BB
O1  BB
O2  BB
[ reference atoms ]
BB C

we'd have to write

[ modification ]
[ from ]
universal-test
[ to ]
pepplane
[ from blocks ]
C-ter
[ to blocks ]
C-ter
[ mapping ]
C   BB
O  BB
OXT  BB
[ reference atoms ]
BB C

See also: vermouth/tests/data/force_fields/universal-test/modifications.ff and vermouth/tests/data/mappings/universal-test/pepplane/modifications.mapping. @fgrunewald Is this change in behaviour acceptable/desirable? Or should it be augmented so you could use either name?

fgrunewald commented 2 years ago

@pckroon I don't quite get why there was a replace attribute in the mapping file in the first place?

pckroon commented 2 years ago

I don't quite get why there was a replace attribute in the mapping file in the first place?

Not in the mapping file, in the modification. My reasoning for using the "replaced" attributes for the creating mappings was that it would be easier to define them, since you can use the final, canonical, names.

fgrunewald commented 2 years ago

@pckroon I kinda understand. But this would mean the atom-name in modifications.mapping would be the same as in modifications.ff from the universal force-field right? I think that actually seems more intuitive than using the other. So we would have:

Universal-FF
modfifications.ff == modifications.mapping == PDB names
pckroon commented 2 years ago

I'm not completely sure what you mean. To build/construct the Mapping object you need to mention atoms by their unique atom names. However, the discussion here is because we have multiple possible sources of atom names:

  1. PDB (Terrible idea)
  2. Block (Doesn't include atoms added by modifications)
  3. Modification before replacement (Contains a mix of canonical and non-canonical atom names, such as O and OXT vs O1 and O2)
  4. Modification after replacement (May contain duplicate atom names iff atoms get removed)

Before this PR (i.e. in master) the behaviour is to use source 4 as truth, but that causes issues. So the question for the discussion is whether just using source 3 is acceptable, or whether we should do something more intelligent, such as "try 4, but if that doesn't work fall back to 3"

fgrunewald commented 2 years ago

what I try to say is that in the ideal word the name in the PDB file would be the same as the name in the modifications directive '[atoms]' and the same in the mapping file (i.e. source 3). That makes intuitively for me the most sense. So if 3 works would even be an improvement.

pckroon commented 2 years ago

We assume PDB atom names to be nonsensical (in the worst case). So we can't suddenly start assuming those make sense.

I'll adapt the tests to use source 3

fgrunewald commented 2 years ago

you miss my point but that's fine. For me 3 makes intuitively sense.

pckroon commented 2 years ago

you miss my point but that's fine. For me 3 makes intuitively sense.

Completely. Could you try wording it again? I do appreciate your take on this...

fgrunewald commented 2 years ago

Sure: I guess what I try to say is that for option three the atom names in modifications.ff is the same as in the mapping file right? Regardless of the 1-3. In addition in a canonical PDB file there should also be the same atom-name. I view the replace as something that comes after these three files agree. So for me option 3 makes even more sense than 4. Only issue is I don't quite understand in the actual flow of things modifications are applied sort of with the mapping processor, right?

Quick-side bar: modifications can add and remove interactions right? That would also happen in the mapping processor?

pckroon commented 2 years ago

Sure: I guess what I try to say is that for option three the atom names in modifications.ff is the same as in the mapping file right?

Yes.

In addition in a canonical PDB file there should also be the same atom-name.

Not quite. The canonical PDB will use the replaced atom attributes. IMO that is the canonical naming scheme. It doesn't come up often. The cases I can come up with is if a modification adds an atom, which causes existing atoms to get renumbered (NH -> NH1 + NH2 + NH3; O -> O1 + O2; ...). Option 3 makes a lot of sense to me as well, but the downside is you'll have mappings using canonical atom names for half the atoms, and close-to-but-not-quite canonical atom names for the others.

Only issue is I don't quite understand in the actual flow of things modifications are applied sort of with the mapping processor, right?

Eeeuuuh. Hard question. They're used in canonicalize_modifications (right before mapping) to make correct the atom names for previously unidentified atoms (and tag modified atoms). The mapping processor comes after, and uses modification mapping. If a modification mapping applies, it will apply the modification that mapping specifies. That's also where any change in modifications should happen.

The canonicalize_modification and do_mapping processors are annoyingly interwoven: because the modifications can change atom names, do_mapping uses the _old_atomname attribute when mapping Blocks, for example (vermouth/processors/do_mapping.py:106). We could do something similar for modification mappings, but that risks causing a lot of confusion, since it would use always use source 4, except for removed atoms, for which it would use source 3, but only if everything is still non-ambiguous.

So let's stick to source 3. We can consider changing the behaviour of canonicalize_modifications in a later PR?

pckroon commented 2 years ago

Overall looks good to me. However, the old-atomname matcher function could use a second look. Is it really a good strategy to copy the complete node-dict and feed that to node-matcher? If yes then that's ok for me.

Does it hurt though? Either way, separate PR. I didn't change it for this one.

pckroon commented 2 years ago

Also all C/N-ter files in martini need updating now or am I mistaken?

Don't think so. The universal ff uses O/OXT and NH/NH2/NH3, so it's a non-issue.