Closed mattwthompson closed 2 years ago
A subtle detail to consider is that switching element packages would have a minor impact on atomic masses. Mendeleev cites (click through to ref 58) a 2017 report from IUPAC, but seems to truncate a few significant digits. OpenMM doesn't cite a specific source but seems to be using data from around the early 2000s. Not that it changes much or is so important, and I'm not interested in digging more into the revisions to find out.
This is certainly be a behavior change(DeepDiff
accurately catches this as a change before the sixth significant digit). Whether this qualifies as a regression or matters at all is maybe a matter of opinion. I don't care what we go with, but documenting the data source of the masses that end up in a users' objects might be worthwhile. If we went with our own light implementation we'd be able to control whether we want to match OpenMM's values or some year of IUPAC data.
import openmm.app
import mendeleev
print("Atomic number mendleev OpenMM")
for atomic_number in range(1, 20):
print(
atomic_number,
"\t",
f"{mendeleev.element(atomic_number).mass:15g}",
"\t",
openmm.app.element.Element.getByAtomicNumber(atomic_number).mass,
)
Atomic number mendleev OpenMM
1 1.008 1.007947 Da
2 4.0026 4.003 Da
3 6.94 6.9412 Da
4 9.01218 9.0121823 Da
5 10.81 10.8117 Da
6 12.011 12.01078 Da
7 14.007 14.00672 Da
8 15.999 15.99943 Da
9 18.9984 18.99840325 Da
10 20.1797 20.17976 Da
11 22.9898 22.989769282 Da
12 24.305 24.30506 Da
13 26.9815 26.98153868 Da
14 28.085 28.08553 Da
15 30.9738 30.9737622 Da
16 32.06 32.0655 Da
17 35.45 35.4532 Da
18 39.948 39.9481 Da
19 39.0983 39.09831 Da
I'm in support of having our own dicts of atomic_number
to mass
and symbol
. This will keep everything in our regression testing suite from blowing up if an external package changes some masses (even if the change is correct!). We could put it in openff-units
to keep our constants together.
For the values, it would be great if we could find an authoritative source that "coincidentally" agrees with OpenMM's current masses. This may seem silly, but the entire concept of doing atomistic simulations with "average by earth crust composition" masses is of questionable justification too, so I don't think there's a strong authority on this point anyway.
I think specific-isotope masses would be even better, but that would be a big deviation from the rest of the field, so I don't think it's a strictly infrastructure question, and we can discuss that in #974 separately.
Closed #1182, will land in v0.11.0 and the associated API breaks will be noted in the release notes and/or migration guide.
The current refactor switches out element handling from OpenMM to
mendeleev
, checks off a lot of our boxes - it has all of the features we need (and more), it's actively developed and well-documented, and it's available onconda-forge
. But it's also a little heavy at import time and with respect to simple conversions, i.e. atomic number -> element symbol. See https://github.com/openforcefield/openff-toolkit/pull/1177 for some timings after optimization. AFAICTThere's an easy tendency to want to pull in big libraries to do relatively simple things, which can be a poor match between maintainability/packaging/import times and features added. My counter-proposals, either of which I'd be happy to contribute
ele
. The author has recently left for industry, but the MoSDeF ecosystem uses it and I expect we'd be able to maintain it via contributions - or simply fork it and go from there.{atomic_number: symbol}
and{atomic_number: mass}
mappings. These would map between simple types (int, str, and int/quantity) and the toolkit would not explicitly have any objects representing elements. It would also allow us to keep the same values reported by OpenMM without importing it, whereas another unit package likely introduces small changes.My only strong preference is to get away from using
openmm.app.element
- it's a module that has worked great for our purposes but I would love our typical conda-based users to not need to install the CUDA toolkit to parse a SMILES string. Switching tomendeleev
satisfies this request, as does the other ideas above.Keep in mind we're already breaking the API of the
element
andmass
properties, so if we wanted to make a big change, now is the time to consider it.