openforcefield / standards

A repository of the standards employed across the Open Force Field Consortium.
https://openforcefield.github.io/standards
MIT License
1 stars 3 forks source link

Change SMIRNOFF spec to include FF name in file contents? #12

Closed j-wags closed 2 years ago

j-wags commented 4 years ago

Is your feature request related to a problem? Please describe. @cbayly13 points out that people will very likely report the wrong version/FF name when using Parsley, as openff-1.0.0.offxml begins with

<?xml version="1.0" encoding="utf-8"?>
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
    <!-- This is the initial "Parsley" force field released by the Open Force Field Initiative. -->
    <!-- The Parsley force field is available with and without bond constraints to hydrogen. -->
    <!-- This file contains bond constraints to hydrogen, and is suitable for long-timestep simulations. -->
    <Author>The Open Force Field Initiative</Author>
    <Date>2019-10-10</Date>
    <Constraints version="0.3">
        <!-- constrain all bonds to hydrogen to their equilibrium bond length -->
        <Constraint smirks="[#1:1]-[*:2]" id="c1"></Constraint>
    </Constraints>

So, we'll probably see some variants of "We used SMIRNOFF 0.3 for this simulation" in the future. To avoid this, it would be good to have some record of the FF name inside of the file contents.

One major constraint is that the ForceField object currently has no memory of the original file name, thus it couldn't return the FF name if it tried. And, moreso, it's perfectly valid to feed a large XML string containing an entire force field to the constructor, in which case the FF name definitely wouldn't be available.

Describe the solution you'd like

This will require some discussion. Some potential solutions are:

jchodera commented 4 years ago

This is a great idea! I like the first option, but calling it citable_name instead of name so that it's clear that we can use that name to refer to the force field. It should ideally be the first line in the force field file, and we should likely not include it for non-citable releases.

davidlmobley commented 4 years ago

I like John's idea.

Except I have a slight parsing problem with "citable_name" because my brain says, "What's a CI Table?". Maybe that's just me though. I feel like my brain would do better with "formal_name" or "forcefield_name" or "citation_name" or something.

mattwthompson commented 2 years ago

I transferred this over from the toolkit; even though it's two years old now, it's worth either considering it or shutting the door on it.

people will very likely report the wrong version/FF name when using Parsley, as openff-1.0.0.offxml begins with ... So, we'll probably see some variants of "We used SMIRNOFF 0.3 for this simulation" in the future.

Has this happened? I have only seen "OpenFF 1.0" at times squishier things like "the open force field version 1". As far as I've heard, this hasn't been a major issue in the force field releases that we've done. People presumably already know the version of the force field they used from the version of the openff-forcefields package they have installed or what they access in the ForceField constructor. People seem to be able to access to the version of the force field they use, and our force fields' names (or code names?) aren't really adding new information - do we need name attributes in the file and object?

There is also the problem of this information persisting in the ForceField object, which is (intentionally) malleable.

>>> from openff.toolkit.typing.engines.smirnoff import ForceField
>>> sage = ForceField("openff-2.0.0.offxml")
>>> sage['Bonds'].parameters[0].k
<Quantity(529.242972, 'kilocalorie / angstrom ** 2 / mole')>
>>> sage['Bonds'].parameters[0].k *= 2
>>> sage['Bonds'].parameters[0].k
<Quantity(1058.48594, 'kilocalorie / angstrom ** 2 / mole')>
>>> sage.citable_name

If this returned 2.0.0 or something, that would be bad.

It would also be bad if somebody wrote out this file to openff-2.0.0.offxml and distributed that file somewhere.

This seems like something that's trivial on the surface but runs into complexity when considering the details. If there isn't a huge need for this, I would rather this stay un-supported (as it has for however long now).

davidlmobley commented 2 years ago

This has not happened to my knowledge. I think since the file names are clear, people have not used the version in the XML tags as a force field version number. It also helps that these version numbers are rather distinct, e.g. we weren't using force field version 1.2 with spec version 1.1 or something similar. Anyone who MIGHT be tempted by this has probably noticed spec version 0.3 with force field version 1.x or 2.0 and interpreted it correctly.

I think I'm with you that this is complex to implement well AND not something which has caused problems so probably not a problem we need to fix, though I initially thought we should.

mattwthompson commented 2 years ago

I'm closing this as it seems there's not a compelling need to take action and nobody has taken up that mantle. (In my experience, a vanishingly small portion of users open up the OFFXML files directly, so the fact that our ForceField class sweeps the <?xml version="1.0" encoding="utf-8"?> line under the rug mostly sidesteps this issue.)

If this does become an issue in the future, people will likely come across it with the right search:

SMIRNOFF version vs. force field version vs. XML version vs. OFFXML version SMIRNOFF versioning vs. force field versioning vs. XML versioning vs. OFFXML versioning

jchodera commented 2 years ago

Won't this be a problem if we serialize force fields for transmission, or pickle their contents into objects? The filename is not guaranteed to be associated with them in any of these cases.