openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
318 stars 92 forks source link

Versioning scheme for forcefield files #249

Closed j-wags closed 1 year ago

j-wags commented 5 years ago

Relevant reading: https://github.com/openforcefield/openforcefield/issues/5

We're working on versioning all unversioned smirnoff99Frosst files from the repo. This requires putting a version on loose forcefield files. We should be careful with what we choose, because this will probably set a precedent that we'll stick to for a while.

Here are some complicated versioning situations I anticipate we could run into: 1) Between one FF version and the next, only the SMIRNOFF spec version changes in the OFFXML text. This changes the science because the new spec adds some keywords with default values that the old one didn't have. 2) The same FF is represented in two different SMIRNOFF specs. This may happen during spec version increments, like right now (we have smirnoff99Frosst.offxml "science version" 9 in both the 0.1 and 0.2 spec, with indistinguishable filenames)

Given the above possibilities, I propose we use the versioning scheme ~name-X.Y.Z~ name-X.Y-Z, where X and Y are the SMIRNOFF spec version (so 0.2 for this previous Monday's release), and Z is the "science" version of the forcefield. Therefore, Z will always be incremented when we come up with better bond parameters or otherwise make a change that affects the energy of the system.

In several years, this versioning scheme means we could end up with filenames like openFF-3.5-91, which doesn't seem too weird.

j-wags commented 5 years ago

To make this more tangible, here's a table of what would happen if we applied the name-X.Y-Z versioning scheme right now. The left column has a list of the current ffxml and offxml files, the middle column shows which SMIRNOFF spec version they use, and the right column shows what they'd be renamed to under this scheme.

(repo / branch or tag) OFFXML file Spec version Proposed new name
(off/master) smirnoff99Frosst_0_0_2 0.2 smirnoff99Frosst-0.2-?
(off/master) smirnoff99Frosst_0_0_4 0.2 smirnoff99Frosst-0.2-?
(off/master) smirnoff99Frosst_0_0_4-fixed 0.2 smirnoff99Frosst-0.2-8 AKA “latest”
(off/master) smirnoff99Frosst 0.2 smirnoff99Frosst-0.2-8 AKA “latest”
(off/master) Frosst_AlkEthOH 0.2 Frosst_AlkEthOH-0.2-0
(off/master) Frosst_AlkEthOH_ParmAtFrosst 0.2 Frosst_AlkEthOH_ParmAtFrosst-0.2-0
(smirnoff99Frosst/1.0.0) smirff99Frosst.ffxml 0.1 or “0.0”/pre-spec smirnoff99Frosst-0.{1,0}-0.offxml
(smirnoff99Frosst/1.0.X) smir{no}ff99Frosst.{o}ffxml 0.1 or “0.0”/pre-spec smirnoff99Frosst-0.1-X.offxml
(smirnoff99Frosst/1.0.8) smirnoff99Frosst.offxml 0.1 smirnoff99Frosst-0.1-8.offxml
j-wags commented 5 years ago

Tagging @jchodera @davidlmobley @leeping @andrrizzi @bannanc @slochower for their thoughts. I'm happy to iterate on this a bit.

jchodera commented 5 years ago

I'm a bit confused: You mention name-X.Y.Z, but your proposed new names are name-X.Y-Z.

j-wags commented 5 years ago

My mistake. I meant to put name-X.Y-Z. Will edit the original post. Thanks for catching that.

slochower commented 5 years ago

Given the above possibilities, I propose we use the versioning scheme name-X.Y-Z, where X and Y are the SMIRNOFF spec version (so 0.2 for this previous Monday's release)

Isn't this already introducing ambiguity because today's release is named 0.2.0 not 0.2, unless I'm misunderstanding something here. I think we want to minimize confusion, so if we tell people today's release is 0.2.0 and then say that the files will be named based on the SMIRNOFF specification and science release, then I would expect the exact string to be in the file name, not a truncated version 0.2. Right?

Edit: Whoops, I've already made a mistake that this naming should help to clarify. Right. I conflated the toolkit with the SMIRNOFF specification. The reason I did this is because I was -- implicitly -- considering the linkage between toolkit and specification version. For example, old toolkits probably won't be able to read a new SMIRNOFF specification, right? So shouldn't there also be a "this file is supported with toolkits > X.Y.Z"?

I was also thinking about this because I've been seeing all the tweets about the toolkit include code snippets referring to (unversioned) smirnoff99Frosst.offxml. Is the plan to distribute only smirnoff99Frosst-latest.offxml with the toolkit? Or no force field file at all?

j-wags commented 5 years ago

So shouldn't there also be a "this file is supported with toolkits > X.Y.Z"?

We have the mirror of this solution implemented, which is "this toolkit supports spec versions up to X."

We could consider how to make this information more prominent, since that might make it clearer to users.

j-wags commented 5 years ago

Is the plan to distribute only smirnoff99Frosst-latest.offxml with the toolkit? Or no force field file at all?

In the long run, we don't want to distribute "latest" forcefields with the toolkit. We want any offxmls that ship with the toolkit to be very clearly marked as "for testing purposes only". The current plan for this is making them only accessible via paths that acknowledge their "non production" status, likeForceField('test_forcefields/smirnoff99Frosst-0.2.8.offxml')

slochower commented 5 years ago

Okay, great! So just checking that the specification will only be formatted as X.Y and science can only go 0-9 unless you zero-pad Z (not in your examples above).

I mean you don't have to zero pad, but it'll make sorting work better :)

slochower commented 5 years ago

In the long run, we don't want to distribute "latest" forcefields with the toolkit. We want any offxmls that ship with the toolkit to be very clearly marked as "for testing purposes only". The current plan for this is making them only accessible via paths that acknowledge their "non production" status, likeForceField('test_forcefields/smirnoff99Frosst-0.2.8.offxml')

To be clear: the goal is to have users clone the smirnoff99Frosst repository somewhere, pulling in all force field versions/files, and then have them point ForceField() to the directory where they've cloned the smirnoff99Frosst repository?

(I feel like the tweets are already training users to use a force field distributed with the toolkit that they will just keep using as templates as they build up more and more complex examples. So when the path changes to test_forcefields, they'll just update their string. I could be wrong though.)

j-wags commented 5 years ago

So just checking that the specification will only be formatted as X.Y and science can only go 0-9 unless you zero-pad Z (not in your examples above).

Correct about X any Y. I could go either way on zero-padding Z. Right now I'd prefer not to, and just let sorting be a little difficult once we go past 9.

To be clear: the goal is to have users clone the smirnoff99Frosst repository somewhere, pulling in all force field versions/files, and then have them point ForceField() to the directory where they've cloned the smirnoff99Frosst repository?

Future plans are to makesmirnoff99Frosst conda-installable, and add some features to grant programmatic access to its data. Specifically, we can expose a data directory or entry point, which will let us pull forcefield files from the conda environment.

slochower commented 5 years ago

Future plans are to makesmirnoff99Frosst conda-installable, and add some features to grant programmatic access to its data. Specifically, we can expose a data directory or entry point, which will let us pull forcefield files from the conda environment.

Is the entry point going to be forcefields/?

j-wags commented 5 years ago

Is the entry point going to be forcefields/?

It could be. I was thinking that, to the user, the command would just look like ff=ForceField("name-X.Y-Z") (with no folder). The ForceField's parse_sources function would include logic to see if the desired file is available through the entrypoint.

davidlmobley commented 5 years ago

I don't have much input on the versioning other than that I agree with where this is going.

One side point though that this reminds me of:

Future plans are to makesmirnoff99Frosst conda-installable, and add some features to grant programmatic access to its data.

Semantic note: We'll also (as soon as we begin refitting) use a name other than smirnoff99Frosst since it'll be moving away from the parm99 roots. We'll still have versions of smirnoff99Frosst but they would be just bugfix/maintenance releases, not significant new improvements. Substantially improved force fields will be under a different name/repo (which would also be conda-installable, etc.).

slochower commented 5 years ago

I was thinking that, to the user, the command would just look like ff=ForceField("name-X.Y-Z") (with no folder).

So currently ForceField("smirnoff99Frosst.offxml") points to an included force field that's basically -latest and will be moved into test_forcefields/ but after you install the new smirnoff99Frosst with entry_point, ForceField("smirnoff99Frosst.offxml") will point outside the repository. So the behavior of the command will change in the near future?

davidlmobley commented 5 years ago

@slochower yes, in some sense of the word "behavior". From my perspective the command stays the same and it will just change how it searches.

j-wags commented 5 years ago

We may need to rethink how the "latest" forcefield will be found. Letting there be static copies of a file called "latest" may be a recipe for trouble (For example, what if someone last updated their repo two years ago? Then their "latest" forcefield would be out of date)

We could resolve this by having no unversioned files. That would makeForceField("smirnoff99Frosst.offxml") return a file-not-found error.

If we really want to be able to load a "latest" forcefield (and I'm not convinced that we do), we could have ForceField("smirnoff99Frosst-latest.offxml") ping some source on the internet to figure out what number "latest" corresponds to. Then it could instruct the user to conda update if they're behind.

Having said that, even this strategy comes with some problems. For example, if a user parameterized a system using latest a while ago, with an environment that may have been updated or changed in the meantime, then they won't know how to report the FF version in a publication.

jchodera commented 5 years ago

I don't think we want to include the SMIRNOFF spec version in the filenames. That's a technical detail---you can in principle have the SAME force field encoded in SMIRNOFF spec 0.1 or 0.2 and produce exactly the same energies. Or the same content could be encoded in 0.2 and 0.3 and not change (except for the spec number) because the force field does not use new features in SMIRNOFF spec 0.3.

What matters is the science versioning. I think we should just name files "smirnoff99Frosst-X.Y.Z" where

The encoding spec will be defined inside the file. You'll get an Exception if you try to use software that is too old to read it, and that is easily remedied by using a more recent version of the toolkit. There is no chance for accidental human cognitive error, so there is no need to try to avoid human cognitive errors by encoding the spec version in the filename. We're just trying to avoid the human cognitive errors of

davidlmobley commented 5 years ago

My original perspective was that we should just always have force field version numbers in the file name. I think I still agree with this; having latest just seems too dangerous. I agree with the fact that it could therefore be painful to update code to use the latest version, but this seems preferable to accidentally using an unintended version or accidentally changing version mid-project.

I also agree with John that we probably don't need the spec version encoded in the path name.

andrrizzi commented 5 years ago

A clarification, I think that old/smirnoff99Frosst_0_0_4_fixed.offxml doesn't have the same "scientific version" of the current smirnoff99Frosst.offxml. See for example parameters b85 and a14.

j-wags commented 5 years ago

Sorry for the slow reply, I'm trying to make sure I'm thinking through this fully.

you can in principle have the SAME force field encoded in SMIRNOFF spec 0.1 or 0.2 and produce exactly the same energies.

I'm trying to reconcile this with point 2) in my original post on this issue. We do have two versions of smirnoff99Frosst version "9". The 0.1 spec version does not specify PME electrostatics. The 0.2 spec version does specify PME electrostatics. So they're derived from a common ancestor, and they're matching 1:1 on bugfixes, so we could say they're "kind of the same". However, one specifies a long-range electrostatics method, and the other one doesn't, so it's possible that they generate systems where energy does not match. Based on this, I'd think that the versioning scheme we settle on should assign them different filenames.

I could see two solutions to this.

One I'll call "serial" versioning, where we say "everything in the smirnoff99Frosst repo was version 1-9 of our forcefield. Everything in the current OFF toolkit repo is compatible with spec 0.2, and is therefore version 10+. If we go the "serial" route, we'd do it using the X.Y.Z versioning John suggested.

The other I'll call "parallel" versioning. This is where the forcefield version would include some sort of reference to its SMIRNOFF spec version, similar to my original suggestion. Under this scheme, everything from the smirnoff99Frosst repo would become 0.1.0 - 0.1.8. Versions of smirnoff99Frosst in the toolkit repo would become 0.2.0 - 0.2.8 (probably with gaps, since we don't match 1:1)

========

We could also decide that reverse-compatibility of our versioning scheme isn't essential (in fact, "renaming" the smirnoff99Frosst FF's will be some trouble, and we can decide how to handle that separately)

However, I think we will have parallel versions of one FF in two different spec versions again. For example, it's likely that we'd have both versions around for testing during a spec increment.

bannanc commented 5 years ago

Sorry for the slow reply, I didn't see the tag until this morning. Jeff and I have been talking about this a lot so I thought I would just add how I've been thinking about it.

Something I've been concerned about is that we specify which version of the SMIRNOFF spect to use inside the smirnoff99frosst file so if we have smirnoff99frosst in parameter version 1.0.8 there could be two copies of this file one that tells the toolkit to use spec version 0.1.0 and one that tells the toolkit to use 0.2.0. Is it possible that having a different spec inside the 1.0.8 parameter file would affect my scientific results?

As we have new versions of the spec are they always going to be backwards compatible? so I only have SMIRNOFF99Frosst-1.0.8 will I be able to use that file in any future version of the openforcefield toolkit?

I imagine we could have force field file names with the form [name]-[FF version]-[spec version] so smirnoff99frosst-1.0.8-0.2.0.offxml so that you know which one you're getting in terms of the spec and force field.

@jchodera This part doesn't really make sense to me:

What matters is the science versioning. I think we should just name files "smirnoff99Frosst-X.Y.Z" where

  • X denotes we've made some major change in which functional form
  • Y is the parameterization epoch / generation
  • Z is a bugfix version---e.g. something we've caught and corrected via the force field strike team

As soon as we move away from parm99/parm@frosst centered parameters we plan to start a new force field name. Do you foresee a future where the same forcefield name has different versions with different functional forms?

jchodera commented 5 years ago

My original perspective was that we should just always have force field version numbers in the file name. I think I still agree with this; having latest just seems too dangerous. I agree with the fact that it could therefore be painful to update code to use the latest version, but this seems preferable to accidentally using an unintended version or accidentally changing version mid-project.

We discussed several possible use cases before. If you're doing reproducible science, you want to specify a specific (science) version of the force field. But, let's face it, most of our partners are going to be using this force field to perform quick calculations in service of a project they are supporting, and only care about using the latest, most accurate force field. If we tell them "every time we update our force field to make it better, you have to rework all your code to use new version numbers", they will hate us. The latest science version serves a legitimate and critical purpose for our major user community.

I imagine we could have force field file names with the form [name]-[FF version]-[spec version] so smirnoff99frosst-1.0.8-0.2.0.offxml so that you know which one you're getting in terms of the spec and force field.

I think this makes the problem even more confusing, and works in the direction opposite of avoiding cognitive errors.

We want to avoid errors in which the wrong science version is used if you want reproducible science results, hence we want to include the science version in the file name. The software can't guard against this because it doesn't know what the user intends, so we have to take an explicit step to make sure the user is aware of an explicit science version choice they are making.

If we want to also avoid errors in which an incompatible spec version is used, we don't need to do anything since the software will raise an Exception in this case. The user then simply has to upgrade to the appropriate software version for the file to work.

We do not want to include both version numbers in the filename because that would attempt to impose the same solution on two distinct problems, creating yet a new problem about "which version number is which thing"?

jchodera commented 5 years ago

Regarding @j-wags point about compatibility of the SMIRNOFF spec versions:

For me, the major question is really whether we have global or local SMIRNOFF spec versions. OpenMM's serialized System.xml, for example, has per-Force object versions, allowing the format for each individual handler to evolve independently. This might make everything more modular, since each parameter handler could then have versions that evolve independently.

slochower commented 5 years ago

Is it possible that having a different spec inside the 1.0.8 parameter file would affect my scientific results?

I think the answer to this is yes, due to undefined behavior. For example, you could simulate the 1.0.8 parameter set with direct electrostatics handled using a cutoff, using specification 0.1, and get answers that are different from using the same parameter set 1.0.8 with PME, as specified in version 0.2. So in principle, I think this can happen. I'm not sure it is a big deal, but I think it can happen.

We want to avoid errors in which the wrong science version is used if you want reproducible science results, hence we want to include the science version in the file name.

I agree with this; we definitely want to include the science version in the filename. But, I think, given that the science version is not sufficient to understand how the results are calculated (like above, where PME may or may not have been used), I think it's also useful to have the specification in the filename. While this is unwieldy, I think it will help understanding code snippets and reduce the cognitive load, to my eyes. If I see ForceField("smirnoff99Frosst-0.2-8.offxml") in a notebook, I know more about the system than if I see ForceField("smirnoff99Frosst-8.offxml"). Right?

For example, if we add a new tag to SMIRNOFF 0.3, future toolkit versions can read both SMIRNOFF 0.2 and 0.3.

We probably all agree this is a good idea. My question: is the science versioning going to be tied to SMIRNOFF versioning, for example, could force field version 2.0.0 only be compatible with specification 0.3, because force field version 2.0.0 uses features that are undefined in specifications before 0.3? In other words, does there need to be a "don't use this force field before SMIRNOFF version X.Y" or "only use this force field with specification X.Y" string encoded somewhere (distinct from which SMIRNOFF versions the toolkit understands)?

For me, the major question is really whether we have global or local SMIRNOFF spec versions. OpenMM's serialized System.xml, for example, has per-Force object versions, allowing the format for each individual handler to evolve independently.

I don't know much about how OpenMM does it, but I think that could be fine. Although, then, I think it will be difficult to map between filenames (if we only include the science version in the file name) and simulation behavior unless there is a 1:1 mapping between the science version and the specification version.

jchodera commented 5 years ago

I think the answer to this is yes, due to undefined behavior. For example, you could simulate the 1.0.8 parameter set with direct electrostatics handled using a cutoff, using specification 0.1, and get answers that are different from using the same parameter set 1.0.8 with PME, as specified in version 0.2. So in principle, I think this can happen. I'm not sure it is a big deal, but I think it can happen.

I don't think we should consider this possibility. Any time the spec changes the behavior of an existing parameter block, we will not be backwards compatible, so the code should throw an exception if it cannot support that version.

However, we can certainly build scientifically identical versions of the 1.0.8 parameter set for SMIRNOFF 0.2 and 0.1.

SMIRNOFF 0.1 was our first attempt and more of a proof of concept. Now that we have codified the idea that the SMIRNOFF 0.2 and onward formats encode everything needed to compute the energy unambiguously, the concerns about which SMIRNOFF spec you are using should be purely technical going forward.

jchodera commented 5 years ago

My question: is the science versioning going to be tied to SMIRNOFF versioning, for example, could force field version 2.0.0 only be compatible with specification 0.3, because force field version 2.0.0 uses features that are undefined in specifications before 0.3? In other words, does there need to be a "don't use this force field before SMIRNOFF version X.Y" or "only use this force field with specification X.Y" string encoded somewhere (distinct from which SMIRNOFF versions the toolkit understands)?

In my proposal in https://github.com/openforcefield/openforcefield/issues/249#issuecomment-481463436, incrementing the major version of the science versioning could indicate that we need to use some new functional form in the SMIRNOFF spec, but we don't necessarily need to tie the numbers together between science and SMIRNOFF. We might even decide that dates are more useful for the force field releases, rather than version numbers.

davidlmobley commented 5 years ago

I agree with John here. Conceivably we could have the same version of the FF (science) available for different toolkit versions, and we don't necessarily need the filename to indicate that, as long as older toolkit versions will refuse to run FFs using a newer spec.

j-wags commented 5 years ago

Conceivably we could have the same version of the FF (science) available for different toolkit versions, and we don't necessarily need the filename to indicate that

On a practical level, I think we do want filename to indicate that. Generally, I want to avoid situations where we have identical filenames with different contents. If we permitted different files to have the same filename, that would constrain the structure of our central forcefield repo, since we couldn't put "all the forcefields" in a single folder (due to naming collisions). We could resolve this by having each forcefield live in a folder according to their SMIRNOFF spec, at which point they'd be accessed by spec-0.2/smirnoff99Frosst-1.0.8.offxml. But that basically puts the spec in the filename (or, path) anyway.

j-wags commented 5 years ago

Some times, we will have to break the spec, as we did between SMIRNOFF 0.1 and 0.2. That's fine, it will happen, especially early on.

This is the main area where I'm concerned. As we add new tags to the spec, each one comes with the chance of introducing ambiguity that is removed in a later spec update.

If we couple SMIRNOFF version to the filename, any time we catch a mistake, we can heal the provenance of work done by making the statement "smirnoff99Frosst 0.1.8 and 0.2.8 were not actually the same forcefield." and generating a DOI for it.

However, if it's not clear which spec version a forcefield may have contained, then clearing up the same issue becomes really difficult.

jchodera commented 5 years ago

I see your point about accidentally leaving ambiguity that is resolved later (though this is presumably something we want to resolve within the spec version since that would be a bug, not a feature!). But picturing the evolution of the spec over a few iterations here---it seems like it will be incredibly exhausting to try to keep a spec version in the filenames. How do we update them? Do we upgrade them all every time we release a spec bugfix update? Or a minor version update to the spec? I can't even wrap my head around how the migration process might work.

j-wags commented 5 years ago

it seems like it will be incredibly exhausting to try to keep a spec version in the filenames.

That's a good point. This would add a lot of friction.

How about we go forward with @jchodera 's idea of X.Y.Z, but only release the "official" version of each forcefield coupled to one SMIRNOFF spec version? I think that would address all of my concerns.

For example, forcefield 1.2.3 would "officially" use spec 1.7. That way, if there's a problem with the spec, we release forcefield 1.2.4 which uses spec 1.8.

For me, the major question is really whether we have global or local SMIRNOFF spec versions. OpenMM's serialized System.xml, for example, has per-Force object versions, allowing the format for each individual handler to evolve independently.

I'd also be open to per-parameter-block spec versions. In fact, that might be nice, as we might expand in a few years to support dozens of tags, and it would be cumbersome to keep updating the entire spec version due to updates/bugfixes in some rarely-used tag.

davidlmobley commented 5 years ago

@j-wags that sounds like a good plan to me.

jchodera commented 5 years ago

This sounds like a good idea to me.

And I do like the idea of per-parameter-block spec versions. This has worked well for OpenMM, so I think the idea is sound.

j-wags commented 5 years ago

Ok, so here's my proposal for how to transition to this versioning scheme and a OFF FF repo as part of the 0.3.0 release. I'd appreciate any feedback, or 👍 if this plan looks like it's ready to implement.

Force fields moving forward will be called name-X.Y.Z

X denotes we've made some major change in which functional form Y is the parameterization epoch / generation Z is a bugfix version---e.g. something we've caught and corrected via the force field strike team (Basically John's suggestion above) 

Force field file naming

Migration to-do list (for 0.3.0 release)

File migration table (?s must be mapped to S99F releases)

Current filename "([repo]/[branch or tag]) [filename]" Proposed filename (in openforcefield-forcefields) Whole-file SMIRNOFF SMIRNOFF section version
(off/master) smirnoff99Frosst_0_0_2 smirnoff99Frosst-1.1.?.offxml 0.2 N/A
(off/master) smirnoff99Frosst_0_0_4 smirnoff99Frosst-1.1.?.offxml 0.2 N/A
(off/master) smirnoff99Frosst_0_0_4-fixed smirnoff99Frosst-1.1.?.offxml 0.2 N/A
(off/master) data/forcefield/smirnoff99Frosst smirnoff99Frosst-1.1.8.offxml 0.2 N/A
latest in 0.2 spec w/ H mass fix smirnoff99Frosst-1.1.9.offxml 0.2 N/A
latest in 0.3 spec w/ H mass fix smirnoff99Frosst-1.2.9.offxml N/A 0.2
(off/master) utilities/convert_frosst/smirnoff99Frosst smirnoff99Frosst-1.0.?.offxml 0.2 N/A
(off/master) Frosst_AlkEthOH Frosst_AlkEthOH-1.0.0.offxml 0.2 N/A
(off/master) Frosst_AlkEthOH_ParmAtFrosst Frosst_AlkEthOH_ParmAtFrosst-1.0.0.offxml 0.2 N/A
(smirnoff99Frosst/1.0.0) smirff99Frosst.ffxml smirnoff99Frosst-1.0.0.offxml 0.1 or “0.0”/pre-spec N/A
(smirnoff99Frosst/1.0.X) smirff99Frosst.ffxml smirnoff99Frosst-1.0.X.offxml 0.1 or “0.0”/pre-spec N/A
(smirnoff99Frosst/1.0.8) smirnoff99Frosst.offxml smirnoff99Frosst-1.0.8.offxml 0.1 N/A
(smirnoff99Frosst/1.0.9) smirnoff99Frosst.offxml (H mass fix) smirnoff99Frosst-1.0.9.offxml 0.1 N/A
andrrizzi commented 5 years ago

Seems like a great plan to me! A couple of comments

SMIRNOFF data sources will no longer have a "global" SMIRNOFF spec version at the top of the file.

We might need to keep this if we want to keep the door open to changes outside the parameter tags (e.g. add new options besides aromaticity_model or new metadata tags like <Date>).

should -latest be hard copies, or links? Should -latest really be a local file, or should it always do some sort of internet query to determine the "true" latest version?

I like the link idea. We may run into problems if the file is stored only remotely in systems that don't have access to external networks. Also, if this file is the file of a repo, people may accidentally parametrize the system with a temporarily "bugged" force field that will need corrections before being released.

jchodera commented 5 years ago

Since our first series of improved force field objectives is to simply improve the parameters associated with smirnoff99Frosst 1.0.z, I have no problems calling these force fields smirnoff99Frosst-x.y.z for the foreseeable future.

Once we adopt a different strategy for coming up with SMARTS types, I think we can start to migrate away from the smirnoff99Frosst name.

For example, in future, we could have families of force fields that share a prefix, such as openforcefield-fixedcharge-1.1.0 would match openforcefield-polarizable-1.1.0 in that they derived from the same data but featured different electrostatics models.

davidlmobley commented 5 years ago

@jchodera @j-wags - relating to smirnoff99Frosst naming, Chris's perspective (which I agree with) is that we ought to move away from the 99Frosst naming as soon as we begin any significant refitting; I tend to agree with him. This was a nod to the "parm@Frosst" heritage but we've already diverged from that significantly and once we refit to any significant extent, the parm@Frosst heritage will essentially be gone. Also the 99 in the name will read like "1999" to people familiar with AMBER-family force fields which may be bad branding since that's 20 years ago. I don't have a great alternate name though, but perhaps we should throw that open on Slack for the acronym-savvy to have at it. :)

@j-wags I agree with most of your proposal, though on this:

Should -latest really be a local file, or should it always do some sort of internet query to determine the "true" latest version?)

I still think having a file that has latest in its name is a terribly bad idea since people will assume it is latest even when it has gone out of date. I could see this triggering an internet look-up perhaps, or a link is also acceptable. I can see how having something like this would help with maintaining the examples, but we also don't want to have to devote much time to supporting it.

It also seems like you have too many force fields in your migration table, or perhaps the latest changes have essentially created too many files. Currently my view of what we OUGHT to have is:

We should actually be able to remove a number of the various AlkEthOH variants except those utilized in tests/examples. I had some additional ones (e.g. Frosst_AlkEthOH_ParmAtFrosst I believe) there for testing/paper-writing purposes but these have no functionality utility anymore unless they are utilized in the tests.

Otherwise I think this looks good.

jchodera commented 5 years ago

Instead of a latest file, we can have the force field repo install an importable variable name or function that contains the path to the latest force field version. That would explicitly avoid having a file named latest.

j-wags commented 5 years ago

SMIRNOFF data sources will no longer have a "global" SMIRNOFF spec version at the top of the file.

We might need to keep this if we want to keep the door open to changes outside the parameter tags (e.g. add new options besides aromaticity_model or new metadata tags like ).

Good point, @andrrizzi. After a bit of thought, I think you're totally right. I'll keep a top-level SMIRNOFF version in these files as well.

j-wags commented 5 years ago

@davidlmobley

It also seems like you have too many force fields in your migration table, or perhaps the latest changes have essentially created too many files. Currently my view of what we OUGHT to have is:

  • All releases in smirnof99Frosst repo, which SHOULD (but perhaps aren't??) be using latest spec, or should be updated to use it

The files in the openforcefield/smirnoff99Frosst repo aren't using the current spec. The newest one there uses the 0.1 spec, and some of the older ones use something else. We can update them, but then they will encode different science (since they'll have specified an electrostatics method). This is why I think we need to have the separate smirnoff99Frosst-1.0.X and smirnoff99Frosst-1.1.X series of OFFXMLs.

  • Source files for these in the off repo under utilities/convert_frosst. These should be the same!! If we catch a human error in a SMIRKS pattern or similar, this is still where we would make an update in the source files there and then re-generate the ffxml files from it, so we want to ensure these are current until we actually have done some refitting.

Ah, I didn't previously understand this process. convert_frosst/convert_frcmod.py has indeed fallen behind, and does not generate 0.2 spec files. I've made issue #293 for this and assigned it to the 0.3.1 milestone so it will stop falling through the cracks.

Since we'll have all FF versions present in the openforcefield/openforcefield-forcefields, would we have a frcmod source file for each smirnoff99Frosst-X.Y.Z file? This would make sense, up until we have a spec version change, at which point the conversion script needed to create the OFFXML would change, and that should somehow be recorded. Maybe that kind of detail could be in the OFFXML comments, or mentioned in the release notes when the new FF comes out.

  • Various AlkEthOH variants

We should actually be able to remove a number of the various AlkEthOH variants except those utilized in tests/examples. I had some additional ones (e.g. Frosst_AlkEthOH_ParmAtFrosst I believe) there for testing/paper-writing purposes but these have no functionality utility anymore unless they are utilized in the tests.

Agreed

davidlmobley commented 5 years ago

@j-wags :

The files in the openforcefield/smirnoff99Frosst repo aren't using the current spec. The newest one there uses the 0.1 spec, and some of the older ones use something else. We can update them, but then they will encode different science (since they'll have specified an electrostatics method). This is why I think we need to have the separate smirnoff99Frosst-1.0.X and smirnoff99Frosst-1.1.X series of OFFXMLs.

I don't understand, what's the difference between 1.0.x an 1.1.x at this point? And why would those force fields in the smirnoff99Frosst repo have a specified electrostatics method and others won't?

Since we'll have all FF versions present in the openforcefield/openforcefield-forcefields, would we have a frcmod source file for each smirnoff99Frosst-X.Y.Z file? This would make sense, up until we have a spec version change, at which point the conversion script needed to create the OFFXML would change, and that should somehow be recorded. Maybe that kind of detail could be in the OFFXML comments, or mentioned in the release notes when the new FF comes out.

I think as soon as we actually refit any parameters, we should stop using the frcmod source file as a source file.

At this point, the ONLY changes between versions have been to the spec (at least, that's how I summarize them) or manual fixes to specific parameters in the frcmod file, so it's made sense to me to continue doing things this way.

I'm open to suggestions though.

j-wags commented 5 years ago

what's the difference between 1.0.x and 1.1.x at this point?

For example, here's the file I will rename to smirnoff99Frosst-1.1.8.offxml from the openforcefield/openforcefield toolkit repo's 0.2.2 release. In that file, the long range electrostatics method is defined here.

Conversely, here's the file I will rename to smirnoff99Frosst-1.0.8.offxml, from the openforcefield/smirnoff99Frosst repo's 1.0.8 release. At the time this was created, the long range electrostatics method wasn't part of the spec.

davidlmobley commented 5 years ago

I agree that's functionally somewhat important, especially once people start using that part of the spec. But... it doesn't actually involve differences in parameters or any functional differences at this point.

j-wags commented 5 years ago

My general rule here for when force field files should be considered "different" is "could it produce a different energy?" In this case, someone could have taken the file I'm proposing to version as 1.0.8, and done a periodic simulation with an electrostatics cutoff, in which case they would get different energies than if they took 1.1.8 and did a periodic simulation, which would have automatically set them to use PME.

mattwthompson commented 1 year ago

Is this resolved, or still under consideration? If it is, I will move this into a Discussion. If not, I will close it.

I have to ask since it'd take me about 20-30 minutes to read and understand everything and I'm not sure what has already been implemented. If it's unclear I would simply advocate for closing without investing that effort.

davidlmobley commented 1 year ago

Let's close and then reopen if we need it later.