Open mattwthompson opened 1 month ago
Something might be going on here, can't tell if this is related
ipdb> protein.atom(0).metadata["residue_number"]
'14'
ipdb> protein.residues[0].residue_number
'14'
ipdb> protein.to_topology().atom(0).metadata["residue_number"]
'14'
ipdb> protein.to_topology().residues[0].residue_number
'1'
Thanks for the great writeup. It's not a bug, just a poorly documented design.
The core issue is that Atom and Molecule/Topology-level metadata aren't kept in sync by default.
Changes to Atom-level metadata can be propagated to relevant HierarchyElements by running offmol.percieve_hierarchy
.
There's currently no way to propagate HierarchyElement metadata changes back to the underlying atoms.
Some options I could see are:
__setattr__
on a HierarchyElementperceive_hierarchy
call, have an attribute like _hierarchy_perceived
be set on it. Then if someone updates the atom's metadata after that, print a warning saying that they'll need to call perceive_hierarchy
again for those changes to be reflected in Molecule/Topology iteratorsoffmol.residues[0].residue_name = "foo"
). That's because this could easily land a molecule in a state where it has two residues with identical metadata, which violates the workings of HierarchyScheme (atoms with identical metadata all go into a single residue).Happy to take a PR for any of these or discuss further.
My low-effort before-coffee opinion on this now is that we should quickly document most of this behavior (~few days) before we make any decisions about major design changes (~weeks to several months)
FWIW, this is documented:
HierarchyScheme
class and Molecule.add_hierarchy_scheme()
method both say it explicitly with the wording quoted belowMolecule.add_default_hierarchy_schemes()
mentions itHierarchy schemes are not updated dynamically; if a Molecule with hierarchy schemes changes, Molecule.update_hierarchy_schemes() must be called before the scheme is iterated over again or else the grouping may be incorrect.
When I was writing the documentation for this feature, I did find it a bit difficult to decide where to put this information, because a lot of users will only interact with, say, Molecule.residues
, which can't be documented because it's not really a method. I kind of ended up just trying to put it everywhere. I'm a bit miffed that I didn't put it in Topology.hierarchy_iterator()
, because that seems like an obvious place that people might run into it. If you have any other suggestions for more places to mention this, or clearer ways to phrase it, I'm all ears!
I also agree that dynamic iterators would be much less surprising.
FWIW, I searched mostly for "metadata" since that's the way I was interacting with the API. I appreciate that this behavior is mentioned several times in those methods, but they're also methods I didn't need to interact with to get this behavior.
My only suggestion at the documentation level is to include it when Atom
s are described, i.e. here or a probably-copy-pasted-bit in Molecule.add_atom
:
A "Beware! This other thing ... see here" might have helped in this case.
Anyway, I think making the actual behavior less squishy would be more impactful than documenting it a fifth time over. The power and flexibility of this functionality has huge upside but I seem to keep shooting myself in the foot when I try to hook into it for tasks I naively think are straightforward
Describe the bug
Updating an atom's residue-relevant metadata does not update the same data on the corresponding residue object. Editing the residue also doesn't update the atom. I'm not sure if this is the intended behavior - if it is, there needs to be safeguards or at least documentation, and if it's not, this is a bug.
To Reproduce
Output
Computing environment (please complete the following information):
conda list
: https://gist.github.com/mattwthompson/087a81f4eb1a6bdb1900e1e287615f8bAdditional context
This behavior makes sense given what I know about the design, but I think it's reasonable for a user to expect that updating residue data on an atom might correspondingly update the residue itself. The
Atom.metadata
field is implied to do this sort of thing:If this is the intended behavior, it'd be nice to have it documented in some sort of "PDB loading cookbook" https://github.com/openforcefield/openff-toolkit/issues/1554