openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
311 stars 91 forks source link

Stronger typing for residue-related atom metadata in to_openmm_topology #1722

Open timbernat opened 1 year ago

timbernat commented 1 year ago

Description Explicit type checking / conversion of residue fields (such as "residue_number", "chain_id", etc) would be useful for avoiding issues when interfacing with OpenMM and OpenFF.

As a real-world example, the "residue_name" key in an Atom's metadata dict could likely be set to an int (sensibly enough), such as in openff.toolkit.topology.Topology.from_openmm Ln 1498-1500. This poses an issue when attempting to write a PDB file with unaltered residues using OpenMM's PDBFile object with the "keepIds" keyword. OpenMM expects the "id" attribute of Residue objects to be a string, raising a TypeError which prevents residue numbers from being faithfully preserved. There may be other instances where this type specificity may become important, this is just the one I've encountered which has caused much difficulty and debugging frustration.

j-wags commented 1 year ago

Thanks for the feedback @timbernat. I've migrated this to openff-toolkit since I think it should be resolved in Topology.to_openmm

Since we need to be ready to interoperate with more than OpenMM, I'd prefer to keep the OpenFF Atom metadata dict open to ints and strings. However, I'd be open to casting the residue identifiers to strings and ints as appropriate in the to_openmm method itself.

I'm very busy right now but would be happy to accept a PR with a fix+test for this!

mattwthompson commented 1 year ago

A bit of context, which maybe you've figured out already: Interchange.topology is just a openff.toolkit.Topology object, so whatever changes would indeed need to go into the toolkit.

IIUC, your case involves openmm.app.Topology accepting ints for a field that openmm.app.PDBFile expects to be str? Strictly speaking that sounds like a quirk of OpenMM, though I figure there's good reason for that inconsistency.

timbernat commented 1 year ago

Strictly speaking that sounds like a quirk of OpenMM, though I figure there's good reason for that inconsistency.

I agree, it seems like an odd choice but I'm sure there's good motivation behind it (per Chesterton's Fence), that's mainly why I filed as an OpenFF rather than an OpenMM issue.

However, I'd be open to casting the residue identifiers to strings and ints as appropriate in the to_openmm method itself. I'm very busy right now but would be happy to accept a PR with a fix+test for this!

That certainly seems like the least-invasive solution, I'm wrapping up some projects at the moment as well but I can circle back and put together a PR is the near-term, as labelled this isn't terribly high priority