Closed j-wags closed 1 year ago
To make this more tangible, here's a table of what would happen if we applied the name-X.Y-Z
versioning scheme right now. The left column has a list of the current ffxml
and offxml
files, the middle column shows which SMIRNOFF spec version they use, and the right column shows what they'd be renamed to under this scheme.
(repo / branch or tag) OFFXML file | Spec version | Proposed new name |
---|---|---|
(off/master) smirnoff99Frosst_0_0_2 | 0.2 | smirnoff99Frosst-0.2-? |
(off/master) smirnoff99Frosst_0_0_4 | 0.2 | smirnoff99Frosst-0.2-? |
(off/master) smirnoff99Frosst_0_0_4-fixed | 0.2 | smirnoff99Frosst-0.2-8 AKA “latest” |
(off/master) smirnoff99Frosst | 0.2 | smirnoff99Frosst-0.2-8 AKA “latest” |
(off/master) Frosst_AlkEthOH | 0.2 | Frosst_AlkEthOH-0.2-0 |
(off/master) Frosst_AlkEthOH_ParmAtFrosst | 0.2 | Frosst_AlkEthOH_ParmAtFrosst-0.2-0 |
(smirnoff99Frosst/1.0.0) smirff99Frosst.ffxml | 0.1 or “0.0”/pre-spec | smirnoff99Frosst-0.{1,0}-0.offxml |
(smirnoff99Frosst/1.0.X) smir{no}ff99Frosst.{o}ffxml | 0.1 or “0.0”/pre-spec | smirnoff99Frosst-0.1-X.offxml |
(smirnoff99Frosst/1.0.8) smirnoff99Frosst.offxml | 0.1 | smirnoff99Frosst-0.1-8.offxml |
Tagging @jchodera @davidlmobley @leeping @andrrizzi @bannanc @slochower for their thoughts. I'm happy to iterate on this a bit.
I'm a bit confused: You mention name-X.Y.Z
, but your proposed new names are name-X.Y-Z
.
My mistake. I meant to put name-X.Y-Z
. Will edit the original post. Thanks for catching that.
Given the above possibilities, I propose we use the versioning scheme name-X.Y-Z, where X and Y are the SMIRNOFF spec version (so 0.2 for this previous Monday's release)
Isn't this already introducing ambiguity because today's release is named 0.2.0
not 0.2
, unless I'm misunderstanding something here. I think we want to minimize confusion, so if we tell people today's release is 0.2.0
and then say that the files will be named based on the SMIRNOFF specification and science release, then I would expect the exact string to be in the file name, not a truncated version 0.2
. Right?
Edit: Whoops, I've already made a mistake that this naming should help to clarify. Right. I conflated the toolkit with the SMIRNOFF specification. The reason I did this is because I was -- implicitly -- considering the linkage between toolkit and specification version. For example, old toolkits probably won't be able to read a new SMIRNOFF specification, right? So shouldn't there also be a "this file is supported with toolkits > X.Y.Z
"?
I was also thinking about this because I've been seeing all the tweets about the toolkit include code snippets referring to (unversioned) smirnoff99Frosst.offxml
. Is the plan to distribute only smirnoff99Frosst-latest.offxml
with the toolkit? Or no force field file at all?
So shouldn't there also be a "this file is supported with toolkits > X.Y.Z"?
We have the mirror of this solution implemented, which is "this toolkit supports spec versions up to X."
We could consider how to make this information more prominent, since that might make it clearer to users.
Is the plan to distribute only smirnoff99Frosst-latest.offxml with the toolkit? Or no force field file at all?
In the long run, we don't want to distribute "latest" forcefields with the toolkit. We want any offxml
s that ship with the toolkit to be very clearly marked as "for testing purposes only". The current plan for this is making them only accessible via paths that acknowledge their "non production" status, likeForceField('test_forcefields/smirnoff99Frosst-0.2.8.offxml')
Okay, great! So just checking that the specification will only be formatted as X.Y
and science can only go 0-9
unless you zero-pad Z
(not in your examples above).
I mean you don't have to zero pad, but it'll make sorting work better :)
In the long run, we don't want to distribute "latest" forcefields with the toolkit. We want any offxmls that ship with the toolkit to be very clearly marked as "for testing purposes only". The current plan for this is making them only accessible via paths that acknowledge their "non production" status, likeForceField('test_forcefields/smirnoff99Frosst-0.2.8.offxml')
To be clear: the goal is to have users clone the smirnoff99Frosst
repository somewhere, pulling in all force field versions/files, and then have them point ForceField()
to the directory where they've cloned the smirnoff99Frosst
repository?
(I feel like the tweets are already training users to use a force field distributed with the toolkit that they will just keep using as templates as they build up more and more complex examples. So when the path changes to test_forcefields
, they'll just update their string. I could be wrong though.)
So just checking that the specification will only be formatted as X.Y and science can only go 0-9 unless you zero-pad Z (not in your examples above).
Correct about X any Y. I could go either way on zero-padding Z. Right now I'd prefer not to, and just let sorting be a little difficult once we go past 9.
To be clear: the goal is to have users clone the smirnoff99Frosst repository somewhere, pulling in all force field versions/files, and then have them point ForceField() to the directory where they've cloned the smirnoff99Frosst repository?
Future plans are to makesmirnoff99Frosst
conda-installable, and add some features to grant programmatic access to its data. Specifically, we can expose a data directory or entry point, which will let us pull forcefield files from the conda environment.
Future plans are to makesmirnoff99Frosst conda-installable, and add some features to grant programmatic access to its data. Specifically, we can expose a data directory or entry point, which will let us pull forcefield files from the conda environment.
Is the entry point going to be forcefields/
?
Is the entry point going to be
forcefields/
?
It could be. I was thinking that, to the user, the command would just look like ff=ForceField("name-X.Y-Z")
(with no folder). The ForceField's parse_sources
function would include logic to see if the desired file is available through the entrypoint.
I don't have much input on the versioning other than that I agree with where this is going.
One side point though that this reminds me of:
Future plans are to makesmirnoff99Frosst conda-installable, and add some features to grant programmatic access to its data.
Semantic note: We'll also (as soon as we begin refitting) use a name other than smirnoff99Frosst
since it'll be moving away from the parm99 roots. We'll still have versions of smirnoff99Frosst
but they would be just bugfix/maintenance releases, not significant new improvements. Substantially improved force fields will be under a different name/repo (which would also be conda-installable, etc.).
I was thinking that, to the user, the command would just look like ff=ForceField("name-X.Y-Z") (with no folder).
So currently ForceField("smirnoff99Frosst.offxml")
points to an included force field that's basically -latest
and will be moved into test_forcefields/
but after you install the new smirnoff99Frosst
with entry_point
, ForceField("smirnoff99Frosst.offxml")
will point outside the repository. So the behavior of the command will change in the near future?
@slochower yes, in some sense of the word "behavior". From my perspective the command stays the same and it will just change how it searches.
We may need to rethink how the "latest" forcefield will be found. Letting there be static copies of a file called "latest" may be a recipe for trouble (For example, what if someone last updated their repo two years ago? Then their "latest" forcefield would be out of date)
We could resolve this by having no unversioned files. That would makeForceField("smirnoff99Frosst.offxml")
return a file-not-found error.
If we really want to be able to load a "latest" forcefield (and I'm not convinced that we do), we could have ForceField("smirnoff99Frosst-latest.offxml")
ping some source on the internet to figure out what number "latest" corresponds to. Then it could instruct the user to conda update
if they're behind.
Having said that, even this strategy comes with some problems. For example, if a user parameterized a system using latest
a while ago, with an environment that may have been updated or changed in the meantime, then they won't know how to report the FF version in a publication.
I don't think we want to include the SMIRNOFF spec version in the filenames. That's a technical detail---you can in principle have the SAME force field encoded in SMIRNOFF spec 0.1 or 0.2 and produce exactly the same energies. Or the same content could be encoded in 0.2 and 0.3 and not change (except for the spec number) because the force field does not use new features in SMIRNOFF spec 0.3.
What matters is the science versioning. I think we should just name files "smirnoff99Frosst-X.Y.Z"
where
X
denotes we've made some major change in which functional formY
is the parameterization epoch / generationZ
is a bugfix version---e.g. something we've caught and corrected via the force field strike teamThe encoding spec will be defined inside the file. You'll get an Exception if you try to use software that is too old to read it, and that is easily remedied by using a more recent version of the toolkit. There is no chance for accidental human cognitive error, so there is no need to try to avoid human cognitive errors by encoding the spec version in the filename. We're just trying to avoid the human cognitive errors of
test/
path)My original perspective was that we should just always have force field version numbers in the file name. I think I still agree with this; having latest
just seems too dangerous. I agree with the fact that it could therefore be painful to update code to use the latest version, but this seems preferable to accidentally using an unintended version or accidentally changing version mid-project.
I also agree with John that we probably don't need the spec version encoded in the path name.
A clarification, I think that old/smirnoff99Frosst_0_0_4_fixed.offxml
doesn't have the same "scientific version" of the current smirnoff99Frosst.offxml
. See for example parameters b85
and a14
.
Sorry for the slow reply, I'm trying to make sure I'm thinking through this fully.
you can in principle have the SAME force field encoded in SMIRNOFF spec 0.1 or 0.2 and produce exactly the same energies.
I'm trying to reconcile this with point 2)
in my original post on this issue. We do have two versions of smirnoff99Frosst version "9". The 0.1 spec version does not specify PME electrostatics. The 0.2 spec version does specify PME electrostatics. So they're derived from a common ancestor, and they're matching 1:1 on bugfixes, so we could say they're "kind of the same". However, one specifies a long-range electrostatics method, and the other one doesn't, so it's possible that they generate systems where energy does not match. Based on this, I'd think that the versioning scheme we settle on should assign them different filenames.
I could see two solutions to this.
One I'll call "serial" versioning, where we say "everything in the smirnoff99Frosst repo was version 1-9 of our forcefield. Everything in the current OFF toolkit repo is compatible with spec 0.2, and is therefore version 10+. If we go the "serial" route, we'd do it using the X.Y.Z
versioning John suggested.
The other I'll call "parallel" versioning. This is where the forcefield version would include some sort of reference to its SMIRNOFF spec version, similar to my original suggestion. Under this scheme, everything from the smirnoff99Frosst repo would become 0.1.0 - 0.1.8. Versions of smirnoff99Frosst in the toolkit repo would become 0.2.0 - 0.2.8 (probably with gaps, since we don't match 1:1)
========
We could also decide that reverse-compatibility of our versioning scheme isn't essential (in fact, "renaming" the smirnoff99Frosst FF's will be some trouble, and we can decide how to handle that separately)
However, I think we will have parallel versions of one FF in two different spec versions again. For example, it's likely that we'd have both versions around for testing during a spec increment.
Sorry for the slow reply, I didn't see the tag until this morning. Jeff and I have been talking about this a lot so I thought I would just add how I've been thinking about it.
Something I've been concerned about is that we specify which version of the SMIRNOFF spect to use inside the smirnoff99frosst file so if we have smirnoff99frosst in parameter version 1.0.8 there could be two copies of this file one that tells the toolkit to use spec version 0.1.0 and one that tells the toolkit to use 0.2.0. Is it possible that having a different spec inside the 1.0.8 parameter file would affect my scientific results?
As we have new versions of the spec are they always going to be backwards compatible? so I only have SMIRNOFF99Frosst-1.0.8 will I be able to use that file in any future version of the openforcefield toolkit?
I imagine we could have force field file names with the form [name]-[FF version]-[spec version] so smirnoff99frosst-1.0.8-0.2.0.offxml so that you know which one you're getting in terms of the spec and force field.
@jchodera This part doesn't really make sense to me:
What matters is the science versioning. I think we should just name files "smirnoff99Frosst-X.Y.Z" where
- X denotes we've made some major change in which functional form
- Y is the parameterization epoch / generation
- Z is a bugfix version---e.g. something we've caught and corrected via the force field strike team
As soon as we move away from parm99/parm@frosst centered parameters we plan to start a new force field name. Do you foresee a future where the same forcefield name has different versions with different functional forms?
My original perspective was that we should just always have force field version numbers in the file name. I think I still agree with this; having latest just seems too dangerous. I agree with the fact that it could therefore be painful to update code to use the latest version, but this seems preferable to accidentally using an unintended version or accidentally changing version mid-project.
We discussed several possible use cases before. If you're doing reproducible science, you want to specify a specific (science) version of the force field. But, let's face it, most of our partners are going to be using this force field to perform quick calculations in service of a project they are supporting, and only care about using the latest, most accurate force field. If we tell them "every time we update our force field to make it better, you have to rework all your code to use new version numbers", they will hate us. The latest
science version serves a legitimate and critical purpose for our major user community.
I imagine we could have force field file names with the form [name]-[FF version]-[spec version] so smirnoff99frosst-1.0.8-0.2.0.offxml so that you know which one you're getting in terms of the spec and force field.
I think this makes the problem even more confusing, and works in the direction opposite of avoiding cognitive errors.
We want to avoid errors in which the wrong science version is used if you want reproducible science results, hence we want to include the science version in the file name. The software can't guard against this because it doesn't know what the user intends, so we have to take an explicit step to make sure the user is aware of an explicit science version choice they are making.
If we want to also avoid errors in which an incompatible spec version is used, we don't need to do anything since the software will raise an Exception in this case. The user then simply has to upgrade to the appropriate software version for the file to work.
We do not want to include both version numbers in the filename because that would attempt to impose the same solution on two distinct problems, creating yet a new problem about "which version number is which thing"?
Regarding @j-wags point about compatibility of the SMIRNOFF spec versions:
For me, the major question is really whether we have global or local SMIRNOFF spec versions. OpenMM's serialized System.xml, for example, has per-Force object versions, allowing the format for each individual handler to evolve independently. This might make everything more modular, since each parameter handler could then have versions that evolve independently.
Is it possible that having a different spec inside the 1.0.8 parameter file would affect my scientific results?
I think the answer to this is yes, due to undefined behavior. For example, you could simulate the 1.0.8 parameter set with direct electrostatics handled using a cutoff, using specification 0.1, and get answers that are different from using the same parameter set 1.0.8 with PME, as specified in version 0.2. So in principle, I think this can happen. I'm not sure it is a big deal, but I think it can happen.
We want to avoid errors in which the wrong science version is used if you want reproducible science results, hence we want to include the science version in the file name.
I agree with this; we definitely want to include the science version in the filename. But, I think, given that the science version is not sufficient to understand how the results are calculated (like above, where PME may or may not have been used), I think it's also useful to have the specification in the filename. While this is unwieldy, I think it will help understanding code snippets and reduce the cognitive load, to my eyes. If I see ForceField("smirnoff99Frosst-0.2-8.offxml")
in a notebook, I know more about the system than if I see ForceField("smirnoff99Frosst-8.offxml")
. Right?
For example, if we add a new tag to SMIRNOFF 0.3, future toolkit versions can read both SMIRNOFF 0.2 and 0.3.
We probably all agree this is a good idea. My question: is the science versioning going to be tied to SMIRNOFF versioning, for example, could force field version 2.0.0 only be compatible with specification 0.3, because force field version 2.0.0 uses features that are undefined in specifications before 0.3? In other words, does there need to be a "don't use this force field before SMIRNOFF version X.Y" or "only use this force field with specification X.Y" string encoded somewhere (distinct from which SMIRNOFF versions the toolkit understands)?
For me, the major question is really whether we have global or local SMIRNOFF spec versions. OpenMM's serialized System.xml, for example, has per-Force object versions, allowing the format for each individual handler to evolve independently.
I don't know much about how OpenMM does it, but I think that could be fine. Although, then, I think it will be difficult to map between filenames (if we only include the science version in the file name) and simulation behavior unless there is a 1:1 mapping between the science version and the specification version.
I think the answer to this is yes, due to undefined behavior. For example, you could simulate the 1.0.8 parameter set with direct electrostatics handled using a cutoff, using specification 0.1, and get answers that are different from using the same parameter set 1.0.8 with PME, as specified in version 0.2. So in principle, I think this can happen. I'm not sure it is a big deal, but I think it can happen.
I don't think we should consider this possibility. Any time the spec changes the behavior of an existing parameter block, we will not be backwards compatible, so the code should throw an exception if it cannot support that version.
However, we can certainly build scientifically identical versions of the 1.0.8 parameter set for SMIRNOFF 0.2 and 0.1.
SMIRNOFF 0.1 was our first attempt and more of a proof of concept. Now that we have codified the idea that the SMIRNOFF 0.2 and onward formats encode everything needed to compute the energy unambiguously, the concerns about which SMIRNOFF spec you are using should be purely technical going forward.
My question: is the science versioning going to be tied to SMIRNOFF versioning, for example, could force field version 2.0.0 only be compatible with specification 0.3, because force field version 2.0.0 uses features that are undefined in specifications before 0.3? In other words, does there need to be a "don't use this force field before SMIRNOFF version X.Y" or "only use this force field with specification X.Y" string encoded somewhere (distinct from which SMIRNOFF versions the toolkit understands)?
In my proposal in https://github.com/openforcefield/openforcefield/issues/249#issuecomment-481463436, incrementing the major version of the science versioning could indicate that we need to use some new functional form in the SMIRNOFF spec, but we don't necessarily need to tie the numbers together between science and SMIRNOFF. We might even decide that dates are more useful for the force field releases, rather than version numbers.
I agree with John here. Conceivably we could have the same version of the FF (science) available for different toolkit versions, and we don't necessarily need the filename to indicate that, as long as older toolkit versions will refuse to run FFs using a newer spec.
Conceivably we could have the same version of the FF (science) available for different toolkit versions, and we don't necessarily need the filename to indicate that
On a practical level, I think we do want filename to indicate that. Generally, I want to avoid situations where we have identical filenames with different contents. If we permitted different files to have the same filename, that would constrain the structure of our central forcefield repo, since we couldn't put "all the forcefields" in a single folder (due to naming collisions). We could resolve this by having each forcefield live in a folder according to their SMIRNOFF spec, at which point they'd be accessed by spec-0.2/smirnoff99Frosst-1.0.8.offxml
. But that basically puts the spec in the filename (or, path) anyway.
Some times, we will have to break the spec, as we did between SMIRNOFF 0.1 and 0.2. That's fine, it will happen, especially early on.
This is the main area where I'm concerned. As we add new tags to the spec, each one comes with the chance of introducing ambiguity that is removed in a later spec update.
If we couple SMIRNOFF version to the filename, any time we catch a mistake, we can heal the provenance of work done by making the statement "smirnoff99Frosst 0.1.8 and 0.2.8 were not actually the same forcefield." and generating a DOI for it.
However, if it's not clear which spec version a forcefield may have contained, then clearing up the same issue becomes really difficult.
I see your point about accidentally leaving ambiguity that is resolved later (though this is presumably something we want to resolve within the spec version since that would be a bug, not a feature!). But picturing the evolution of the spec over a few iterations here---it seems like it will be incredibly exhausting to try to keep a spec version in the filenames. How do we update them? Do we upgrade them all every time we release a spec bugfix update? Or a minor version update to the spec? I can't even wrap my head around how the migration process might work.
it seems like it will be incredibly exhausting to try to keep a spec version in the filenames.
That's a good point. This would add a lot of friction.
How about we go forward with @jchodera 's idea of X.Y.Z, but only release the "official" version of each forcefield coupled to one SMIRNOFF spec version? I think that would address all of my concerns.
For example, forcefield 1.2.3 would "officially" use spec 1.7. That way, if there's a problem with the spec, we release forcefield 1.2.4 which uses spec 1.8.
For me, the major question is really whether we have global or local SMIRNOFF spec versions. OpenMM's serialized System.xml, for example, has per-Force object versions, allowing the format for each individual handler to evolve independently.
I'd also be open to per-parameter-block spec versions. In fact, that might be nice, as we might expand in a few years to support dozens of tags, and it would be cumbersome to keep updating the entire spec version due to updates/bugfixes in some rarely-used tag.
@j-wags that sounds like a good plan to me.
This sounds like a good idea to me.
And I do like the idea of per-parameter-block spec versions. This has worked well for OpenMM, so I think the idea is sound.
Ok, so here's my proposal for how to transition to this versioning scheme and a OFF FF repo as part of the 0.3.0 release. I'd appreciate any feedback, or 👍 if this plan looks like it's ready to implement.
name-X.Y.Z
X denotes we've made some major change in which functional form Y is the parameterization epoch / generation Z is a bugfix version---e.g. something we've caught and corrected via the force field strike team (Basically John's suggestion above)
Currently, we are working on the smirnoff99Frosst
line of force fields. The scope of this line of forcefields is to produce SMIRKS that replicate parm99/parm@frosst.
As soon as we move away from trying to faithfully reproduce parm99/parm@frosst, we'll need to come up with a new name
(ideas?)
SMIRNOFF data sources will no longer have a "global" SMIRNOFF spec version at the top of the file. Instead they will have "SMIRNOFF section" versions in each section (Bonds
version, Angles
version, etc).
At this moment, the "newest" SMIRNOFF section version for all existing sections is 0.2
If a SMIRNOFF spec version is found at the top of a file (the old way of versioning), all sections in that file will be assumed to NOT have a spec version in their header, and will be assumed to have the spec version at the top of the file
The only required top-level tag in a SMIRNOFF file will be aromaticity_model
, as that affects how all parameter sections do their SMIRKS matching.
The Open Force Field toolkit will be updated to have each of its ParameterHandler
s know the minimum and maximum SMIRNOFF section version it can read.
If a SMIRNOFF section is found to be "forward incompatible", and is then attempted to be read by a newer ParameterHandler
version, either
openforcefield-forcefields
repoA copy of each FF file in the table below will be migrated to openforcefield/openforcefield-forcefields
. This will be a flat repo (minimal directory structure) with all FFs in it.
X.Y.Z
versioning system. This isn't technically the right thing to do, but it clears up name collisions. (see table below)With the exception of a -latest
forcefield for each unique name, all forcefield files committed to this repo will have a version number
Once included in a tagged release in this repo, files are never renamed or modified, except the -latest
files (can we set up automation to enforce this?)
Official versions of this repo will be cut each time a new version of at least one forcefield is pushed.
openforcefield/openforcefield-forcefields
repo:To ensure that this "forcefield repo" idea makes sense, I started drafting a "release checklist" for how new FFs would be published here.
Ensure that the "SMIRNOFF section" version in any new files is the most current, or record the reason it isn't
If a new spec version is being committed, the toolkit must be tested with at least one production FF ported to the new spec (what would these tests entail?)
Update -latest
version of any updated forcefield series (should these be hard copies, or links?)
-latest
really be a local file, or should it always do some sort of internet query to determine the "true" latest version?)Get approval of at least two people, one of whom is an OFF GitHub Admin
Fill in release notes template, detailing which files were added, and the process by which they were created / migrated (we'll need to make a release notes template)
Generate Zenodo DOI for release (Or DOI for each new forcefield file? Could be accomplished by requiring each file addition to be a separate release)
openforcefield/openforcefield
repo to openforcefield/openforcefield-toolkit
0.3.0-branch
branch
openforcefield-toolkit
openforcefield-toolkit
openforcefield/openforcefield-toolkit
repo are given X.Y.Z
version numbers (we'll need to match these up to s99F
repo releases)test_data/smirnoff99Frosst-1.0.X
smirnoff99Frosst
FF with the "per section" versioning scheme for testing (called smirnoff99Frosst-1.2.9
)openforcefield/smirnoff99Frosst
repo gets a [DEPRECATED] warning put on it, but is kept open with a README update explaining the migration and pointing to openforcefield/openforcefield-forcefields
openforcefield/smirnoff99Frosst
repo is no longer actively updated, and all subsequent S99F
forcefield versions get added to openforcefeld/openforcefield-forcefields
(This isn't a hard requirement, but I just figure it'll be easier to not have multiple "golden" copies of files floating around)openforcefield/openforcefield-forcefields
repo and migrate files there, naming them according to the table below.openforcefield/openforcefield-forcefields
repo, and release it on omniaopenforcefield-forcefields
as dependency for openforcefield-toolkit
, working in 0.3.0-branch
branch. Tests to ensure forcefield files there are reachable.?
s must be mapped to S99F
releases)Current filename "([repo]/[branch or tag]) [filename]" | Proposed filename (in openforcefield-forcefields ) |
Whole-file SMIRNOFF | SMIRNOFF section version |
---|---|---|---|
(off/master) smirnoff99Frosst_0_0_2 | smirnoff99Frosst-1.1.?.offxml | 0.2 | N/A |
(off/master) smirnoff99Frosst_0_0_4 | smirnoff99Frosst-1.1.?.offxml | 0.2 | N/A |
(off/master) smirnoff99Frosst_0_0_4-fixed | smirnoff99Frosst-1.1.?.offxml | 0.2 | N/A |
(off/master) data/forcefield/smirnoff99Frosst | smirnoff99Frosst-1.1.8.offxml | 0.2 | N/A |
latest in 0.2 spec w/ H mass fix | smirnoff99Frosst-1.1.9.offxml | 0.2 | N/A |
latest in 0.3 spec w/ H mass fix | smirnoff99Frosst-1.2.9.offxml | N/A | 0.2 |
(off/master) utilities/convert_frosst/smirnoff99Frosst | smirnoff99Frosst-1.0.?.offxml | 0.2 | N/A |
(off/master) Frosst_AlkEthOH | Frosst_AlkEthOH-1.0.0.offxml | 0.2 | N/A |
(off/master) Frosst_AlkEthOH_ParmAtFrosst | Frosst_AlkEthOH_ParmAtFrosst-1.0.0.offxml | 0.2 | N/A |
(smirnoff99Frosst/1.0.0) smirff99Frosst.ffxml | smirnoff99Frosst-1.0.0.offxml | 0.1 or “0.0”/pre-spec | N/A |
(smirnoff99Frosst/1.0.X) smirff99Frosst.ffxml | smirnoff99Frosst-1.0.X.offxml | 0.1 or “0.0”/pre-spec | N/A |
(smirnoff99Frosst/1.0.8) smirnoff99Frosst.offxml | smirnoff99Frosst-1.0.8.offxml | 0.1 | N/A |
(smirnoff99Frosst/1.0.9) smirnoff99Frosst.offxml (H mass fix) | smirnoff99Frosst-1.0.9.offxml | 0.1 | N/A |
Seems like a great plan to me! A couple of comments
SMIRNOFF data sources will no longer have a "global" SMIRNOFF spec version at the top of the file.
We might need to keep this if we want to keep the door open to changes outside the parameter tags (e.g. add new options besides aromaticity_model
or new metadata tags like <Date>
).
should
-latest
be hard copies, or links? Should-latest
really be a local file, or should it always do some sort of internet query to determine the "true" latest version?
I like the link idea. We may run into problems if the file is stored only remotely in systems that don't have access to external networks. Also, if this file is the file of a repo, people may accidentally parametrize the system with a temporarily "bugged" force field that will need corrections before being released.
Since our first series of improved force field objectives is to simply improve the parameters associated with smirnoff99Frosst 1.0.z, I have no problems calling these force fields smirnoff99Frosst-x.y.z
for the foreseeable future.
Once we adopt a different strategy for coming up with SMARTS types, I think we can start to migrate away from the smirnoff99Frosst
name.
For example, in future, we could have families of force fields that share a prefix, such as openforcefield-fixedcharge-1.1.0
would match openforcefield-polarizable-1.1.0
in that they derived from the same data but featured different electrostatics models.
@jchodera @j-wags - relating to smirnoff99Frosst
naming, Chris's perspective (which I agree with) is that we ought to move away from the 99Frosst
naming as soon as we begin any significant refitting; I tend to agree with him. This was a nod to the "parm@Frosst" heritage but we've already diverged from that significantly and once we refit to any significant extent, the parm@Frosst heritage will essentially be gone. Also the 99
in the name will read like "1999" to people familiar with AMBER-family force fields which may be bad branding since that's 20 years ago. I don't have a great alternate name though, but perhaps we should throw that open on Slack for the acronym-savvy to have at it. :)
@j-wags I agree with most of your proposal, though on this:
Should -latest really be a local file, or should it always do some sort of internet query to determine the "true" latest version?)
I still think having a file that has latest
in its name is a terribly bad idea since people will assume it is latest even when it has gone out of date. I could see this triggering an internet look-up perhaps, or a link is also acceptable. I can see how having something like this would help with maintaining the examples, but we also don't want to have to devote much time to supporting it.
It also seems like you have too many force fields in your migration table, or perhaps the latest changes have essentially created too many files. Currently my view of what we OUGHT to have is:
smirnof99Frosst
repo, which SHOULD (but perhaps aren't??) be using latest spec, or should be updated to use itoff
repo under utilities/convert_frosst
. These should be the same!! If we catch a human error in a SMIRKS pattern or similar, this is still where we would make an update in the source files there and then re-generate the ffxml files from it, so we want to ensure these are current until we actually have done some refitting.We should actually be able to remove a number of the various AlkEthOH variants except those utilized in tests/examples. I had some additional ones (e.g. Frosst_AlkEthOH_ParmAtFrosst
I believe) there for testing/paper-writing purposes but these have no functionality utility anymore unless they are utilized in the tests.
Otherwise I think this looks good.
Instead of a latest
file, we can have the force field repo install an importable variable name or function that contains the path to the latest force field version. That would explicitly avoid having a file named latest
.
SMIRNOFF data sources will no longer have a "global" SMIRNOFF spec version at the top of the file.
We might need to keep this if we want to keep the door open to changes outside the parameter tags (e.g. add new options besides aromaticity_model or new metadata tags like
).
Good point, @andrrizzi. After a bit of thought, I think you're totally right. I'll keep a top-level SMIRNOFF version in these files as well.
@davidlmobley
It also seems like you have too many force fields in your migration table, or perhaps the latest changes have essentially created too many files. Currently my view of what we OUGHT to have is:
- All releases in smirnof99Frosst repo, which SHOULD (but perhaps aren't??) be using latest spec, or should be updated to use it
The files in the openforcefield/smirnoff99Frosst
repo aren't using the current spec. The newest one there uses the 0.1 spec, and some of the older ones use something else. We can update them, but then they will encode different science (since they'll have specified an electrostatics method). This is why I think we need to have the separate smirnoff99Frosst-1.0.X
and smirnoff99Frosst-1.1.X
series of OFFXMLs.
- Source files for these in the off repo under utilities/convert_frosst. These should be the same!! If we catch a human error in a SMIRKS pattern or similar, this is still where we would make an update in the source files there and then re-generate the ffxml files from it, so we want to ensure these are current until we actually have done some refitting.
Ah, I didn't previously understand this process. convert_frosst/convert_frcmod.py
has indeed fallen behind, and does not generate 0.2 spec files. I've made issue #293 for this and assigned it to the 0.3.1 milestone so it will stop falling through the cracks.
Since we'll have all FF versions present in the openforcefield/openforcefield-forcefields
, would we have a frcmod source file for each smirnoff99Frosst-X.Y.Z
file? This would make sense, up until we have a spec version change, at which point the conversion script needed to create the OFFXML would change, and that should somehow be recorded. Maybe that kind of detail could be in the OFFXML comments, or mentioned in the release notes when the new FF comes out.
- Various AlkEthOH variants
We should actually be able to remove a number of the various AlkEthOH variants except those utilized in tests/examples. I had some additional ones (e.g. Frosst_AlkEthOH_ParmAtFrosst I believe) there for testing/paper-writing purposes but these have no functionality utility anymore unless they are utilized in the tests.
Agreed
@j-wags :
The files in the openforcefield/smirnoff99Frosst repo aren't using the current spec. The newest one there uses the 0.1 spec, and some of the older ones use something else. We can update them, but then they will encode different science (since they'll have specified an electrostatics method). This is why I think we need to have the separate smirnoff99Frosst-1.0.X and smirnoff99Frosst-1.1.X series of OFFXMLs.
I don't understand, what's the difference between 1.0.x an 1.1.x at this point? And why would those force fields in the smirnoff99Frosst repo have a specified electrostatics method and others won't?
Since we'll have all FF versions present in the openforcefield/openforcefield-forcefields, would we have a frcmod source file for each smirnoff99Frosst-X.Y.Z file? This would make sense, up until we have a spec version change, at which point the conversion script needed to create the OFFXML would change, and that should somehow be recorded. Maybe that kind of detail could be in the OFFXML comments, or mentioned in the release notes when the new FF comes out.
I think as soon as we actually refit any parameters, we should stop using the frcmod
source file as a source file.
At this point, the ONLY changes between versions have been to the spec (at least, that's how I summarize them) or manual fixes to specific parameters in the frcmod file, so it's made sense to me to continue doing things this way.
I'm open to suggestions though.
what's the difference between 1.0.x and 1.1.x at this point?
For example, here's the file I will rename to smirnoff99Frosst-1.1.8.offxml
from the openforcefield/openforcefield
toolkit repo's 0.2.2 release. In that file, the long range electrostatics method is defined here.
Conversely, here's the file I will rename to smirnoff99Frosst-1.0.8.offxml
, from the openforcefield/smirnoff99Frosst
repo's 1.0.8 release. At the time this was created, the long range electrostatics method wasn't part of the spec.
I agree that's functionally somewhat important, especially once people start using that part of the spec. But... it doesn't actually involve differences in parameters or any functional differences at this point.
My general rule here for when force field files should be considered "different" is "could it produce a different energy?" In this case, someone could have taken the file I'm proposing to version as 1.0.8, and done a periodic simulation with an electrostatics cutoff, in which case they would get different energies than if they took 1.1.8 and did a periodic simulation, which would have automatically set them to use PME.
Is this resolved, or still under consideration? If it is, I will move this into a Discussion. If not, I will close it.
I have to ask since it'd take me about 20-30 minutes to read and understand everything and I'm not sure what has already been implemented. If it's unclear I would simply advocate for closing without investing that effort.
Let's close and then reopen if we need it later.
Relevant reading: https://github.com/openforcefield/openforcefield/issues/5
We're working on versioning all unversioned
smirnoff99Frosst
files from the repo. This requires putting a version on loose forcefield files. We should be careful with what we choose, because this will probably set a precedent that we'll stick to for a while.Here are some complicated versioning situations I anticipate we could run into: 1) Between one FF version and the next, only the SMIRNOFF spec version changes in the OFFXML text. This changes the science because the new spec adds some keywords with default values that the old one didn't have. 2) The same FF is represented in two different SMIRNOFF specs. This may happen during spec version increments, like right now (we have
smirnoff99Frosst.offxml
"science version" 9 in both the 0.1 and 0.2 spec, with indistinguishable filenames)Given the above possibilities, I propose we use the versioning scheme ~
name-X.Y.Z
~name-X.Y-Z
, whereX
andY
are the SMIRNOFF spec version (so 0.2 for this previous Monday's release), andZ
is the "science" version of the forcefield. Therefore,Z
will always be incremented when we come up with better bond parameters or otherwise make a change that affects the energy of the system.In several years, this versioning scheme means we could end up with filenames like
openFF-3.5-91
, which doesn't seem too weird.