mosdef-hub / forcefield_template

A template repo for disseminating force fields with foyer
https://github.com/mosdef-hub/foyer

documentation requirements #8

Open chrisiacovella opened 6 years ago

chrisiacovella commented 6 years ago

To help ensure that people provide all the information required with a forcefield (e.g., original source, who created it, etc.), it might be worth creating what is basically an XML version of the Readme, so we can more easily validate its content. It would also be useful for creating a database of forcefields.

I imagine we could write a quick parsing tool that automatically generates the readme file (AND provides warnings if required information is not defined) so you only have to write this stuff once.
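A minimal sketch of such a parsing tool, assuming Python's standard-library `xml.etree.ElementTree` and the tag names from the scheme proposed further down this thread. The `REQUIRED_ATTRS` table is a placeholder (the list of required fields is still open), and `make_readme`/`find_problems` are hypothetical names, not existing foyer functions.

```python
import xml.etree.ElementTree as ET

# Placeholder list of required fields; the final list is still open in this issue.
REQUIRED_ATTRS = {
    "Foyer": ("title", "family"),
    "Creator": ("last", "first"),
    "Source": ("doi",),
}
REQUIRED_CHILDREN = ("Creator", "Source")


def find_problems(root):
    """Return warnings for missing required elements or empty required attributes."""
    problems = [f"missing <{tag}>" for tag in REQUIRED_CHILDREN
                if root.find(tag) is None]
    for elem in root.iter():
        for attr in REQUIRED_ATTRS.get(elem.tag, ()):
            if not elem.get(attr, "").strip():
                problems.append(f"<{elem.tag}> has empty '{attr}'")
    return problems


def make_readme(xml_text):
    """Render a Markdown README from the documentation XML, plus any warnings."""
    root = ET.fromstring(xml_text)
    lines = [f"# {root.get('title', '')}", ""]
    for label, attr in (("Force field family", "family"), ("Website", "website")):
        if root.get(attr):
            lines.append(f"{label}: {root.get(attr)}")
    for creator in root.findall("Creator"):
        lines.append(f"Created by: {creator.get('first', '')} {creator.get('last', '')}")
    lines += ["", "## Sources", ""]
    for src in root.findall("Source"):
        primary = " (primary)" if src.get("primary") == "True" else ""
        lines.append(f"- {src.get('desc', '')} (doi:{src.get('doi', '')}){primary}")
        for note in src.findall("Note"):
            text = (note.text or "").strip()
            if text:  # skip empty placeholder notes
                lines.append(f"  - {text}")
    notes = [(n.text or "").strip() for n in root.findall("AdditionalNote")]
    notes = [n for n in notes if n]
    if notes:
        lines += ["", "## Notes", ""] + [f"- {n}" for n in notes]
    return "\n".join(lines) + "\n", find_problems(root)
```

Returning the warnings as a list rather than raising keeps the README build going, so a contributor sees every missing field in one pass.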

Things that seem like they should be required:

chrisiacovella commented 6 years ago

It might be good to have the test-suite section generated automatically when running the atomtyping.py test (the script that creates the Readme could run the test suite too). E.g., it would update which molecules are in the tests and whether they were atom-typed correctly.
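One way the test-suite section could be regenerated, sketched under the assumption that the atom-typing test can report a mapping of molecule file name to pass/fail. `update_test_suite` and that `results` format are hypothetical, since the interface of `atomtyping.py` isn't described here.

```python
import xml.etree.ElementTree as ET


def update_test_suite(root, results):
    """Rewrite <TestSuite> so it lists exactly the molecules in `results`.

    `results` is a hypothetical {molecule file name: passed?} mapping that the
    atom-typing test run would have to produce.
    """
    suite = root.find("TestSuite")
    if suite is None:
        suite = ET.SubElement(root, "TestSuite")
    for old in list(suite):  # iterate over a copy because we mutate the element
        suite.remove(old)
    for name in sorted(results):
        status = "PASS" if results[name] else "FAIL"
        ET.SubElement(suite, "Molecule", name=name, status=status)
    return root
```

Rebuilding the whole section, rather than patching individual entries, also drops molecules that were removed from the test directory.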

chrisiacovella commented 6 years ago

A proposed scheme:

<?xml version= "1.0"?>
<Foyer title="" website="" family="">
  <Creator last="" first=""/>
  <Source doi="" desc="" primary="" year="">
    <Author last="" first=""/>
    <Journal name="" volume="" number="" pages="" year="" title="" doi="" />
    <Note> </Note>
  </Source>
  <AdditionalNote>
  </AdditionalNote>
  <TestSuite>
    <Molecule name="" status=""/>
  </TestSuite>
</Foyer>

Tag by tag why I did what I did:

<Foyer title="" website="" family="">
  <Creator last="" first=""/>
  <Source doi="" desc="" primary="" year="">
    <Author last="" first=""/>
    <Journal name="" volume="" number="" pages="" year="" title="" doi="" />
    <Note> </Note>
  </Source>
  <AdditionalNote>
  </AdditionalNote>
  <TestSuite>
    <Molecule name="" status=""/>
  </TestSuite>

Presumably this could be generated automatically, but it would at least list what molecules were defined as tests.

I put together an example for the PFA (perfluoroalkane) forcefield. I think this will help us keep better track of not just the sources but also their relevance, and make this information easier to parse. Again, we can use this XML file to generate the README.

<?xml version= "1.0"?>
<Foyer title="OPLS-AA parameters for perfluoroalkanes in Foyer format" website="https://github.com/chrisiacovella/oplsaa_perfluoroalkanes" family="OPLS-AA">
  <Creator last="Iacovella" first="C.R."/>

  <Source doi="10.1021/jp004071w" desc="All-atom OPLS parameters for perfluoroalkanes" primary="True" year="2001">
    <Author last="Watkins" first="Edward K"/>
    <Author last="Jorgensen" first="William L"/>
    <Journal name="Journal of Physical Chemistry A" volume="105" number="16" pages="4118--4125" year="2001" title="Perfluoroalkanes: Conformational analysis and liquid-state properties from ab initio and Monte Carlo calculations" doi="10.1021/jp004071w" />
    <Note>The forcefield here describes the general parameters for perfluoroalkanes; the original manuscript also gives specific dihedrals for the 4- and 5-mers.</Note>
  </Source>

  <Source doi="10.1002/jcc.540130806" desc="CT-F Bond Source" primary="" year="1992">
    <Author last="Gough" first="Craig A"/>
    <Author last="DeBolt" first="Stephen E"/>
    <Author last="Kollman" first="Peter A"/>
    <Journal name="Journal of Computational Chemistry" volume="13" number="8" pages="963--970" year="1992" title="Derivation of fluorine and hydrogen atom parameters using liquid simulations" doi="10.1002/jcc.540130806" />
    <Note> CT-F bonds are taken from parameters in this manuscript, as described in Watkins and Jorgensen. </Note>
  </Source>

  <Source doi="10.1021/ja00124a002" desc="F-CT-F and CT-CT-F angles" primary="" year="1995">
    <Author last="Cornell" first="Wendy D"/>
    <Author last="Cieplak" first="Piotr"/>
    <Author last="Bayly" first="Christopher I"/>
    <Author last="Gould" first="Ian R"/>
    <Author last="Merz" first="Kenneth M"/>
    <Author last="Ferguson" first="David M"/>
    <Author last="Spellmeyer" first="David C"/>
    <Author last="Fox" first="Thomas"/>
    <Author last="Caldwell" first="James W"/>
    <Author last="Kollman" first="Peter A"/>
    <Journal name="Journal of the American Chemical Society" volume="117" number="19" pages="5179--5197" year="1995" title="A second generation force field for the simulation of proteins, nucleic acids, and organic molecules" doi="10.1021/ja00124a002" />
    <Note> F-CT-F angles come from this manuscript, as described in Watkins and Jorgensen. </Note>
    <Note> CT-CT-F angles are the same as the CT-CT-OH and CT-CT-OS angles listed in this manuscript, as described in Watkins and Jorgensen. </Note>
  </Source>

  <Source doi="10.1021/ja9621760" desc="All-atom OPLS parameters for alkanes" primary="" year="1996">
    <Author last="Jorgensen" first="William L"/>
    <Author last="Maxwell" first="David S"/>
    <Author last="Tirado-Rives" first="Julian"/>
    <Journal name="Journal of the American Chemical Society" volume="118" number="45" pages="11225--11236" year="1996" title="Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids" doi="10.1021/ja9621760" />
    <Note> CT-CT bond and CT-CT-CT angle parameters for alkanes are taken from this manuscript, as described in Watkins and Jorgensen. </Note>
  </Source>

  <AdditionalNote> The backbone dihedral specifically references opls_962 (i.e. C-CF2-C) rather than only using the "CT" class; if only the "CT" class were used, this would create a conflict with alkane systems if the parameters were merged. </AdditionalNote>
  <AdditionalNote> The original parameters are given in kcal/mol; this file uses kJ/mol, converted with a factor of 4.184, consistent with OpenMM. </AdditionalNote>
  <AdditionalNote> PI is defined as 3.141592653589 for conversion to radians, consistent with OpenMM.</AdditionalNote>
  <AdditionalNote> Atom type names, e.g., opls_961, correspond to those defined in the OPLS forcefield itp file distributed with GROMACS. </AdditionalNote>
  <AdditionalNote> Conversion from OPLS-style (Fourier) dihedrals to Ryckaert-Bellemans (RB) form follows the formulas detailed in the GROMACS manual. </AdditionalNote>
  <TestSuite>
    <Molecule name="CF4.mol2" status="PASS"/>
    <Molecule name="perfluoro-2-methylbutane.mol2" status="PASS"/>
    <Molecule name="perfluorohexane.mol2" status="PASS"/>
  </TestSuite>
</Foyer>
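
The unit and dihedral notes in this example amount to a small calculation. The sketch below applies the Fourier-to-Ryckaert-Bellemans conversion as given in the GROMACS manual (RB angle ψ = φ − 180°) and then scales by 4.184 kJ/kcal; the function names are illustrative, and this is not the script that produced the PFA file.

```python
import math

KCAL_TO_KJ = 4.184  # kJ per kcal, as stated in the AdditionalNote above


def opls_to_rb(f1, f2, f3, f4):
    """Fourier (OPLS) coefficients F1..F4 -> Ryckaert-Bellemans C0..C5.

    Energy units are unchanged; formulas as given in the GROMACS manual.
    """
    return (
        f2 + 0.5 * (f1 + f3),    # C0
        0.5 * (-f1 + 3.0 * f3),  # C1
        -f2 + 4.0 * f4,          # C2
        -2.0 * f3,               # C3
        -4.0 * f4,               # C4
        0.0,                     # C5
    )


def opls_energy(phi, f1, f2, f3, f4):
    """OPLS Fourier dihedral energy at dihedral angle phi (radians)."""
    return 0.5 * (f1 * (1 + math.cos(phi)) + f2 * (1 - math.cos(2 * phi))
                  + f3 * (1 + math.cos(3 * phi)) + f4 * (1 - math.cos(4 * phi)))


def rb_energy(phi, coeffs):
    """RB energy sum(C_n cos^n(psi)), using the polymer convention psi = phi - 180 deg."""
    psi = phi - math.pi
    return sum(c * math.cos(psi) ** n for n, c in enumerate(coeffs))


def opls_kcal_to_rb_kj(f1, f2, f3, f4):
    """Convert kcal/mol OPLS coefficients to kJ/mol RB coefficients."""
    return tuple(KCAL_TO_KJ * c for c in opls_to_rb(f1, f2, f3, f4))
```

A quick self-check is that `opls_energy(phi, *F)` and `rb_energy(phi, opls_to_rb(*F))` agree for any φ; because the conversion is linear, scaling to kJ/mol before or after converting gives the same result.
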
chrisiacovella commented 6 years ago

Based on offline discussions, I'm going to try creating a minimal documentation file (e.g., one that only requires DOIs and notes rather than full references). The parsing code will automatically gather the remaining info and write out both the README.md file and a "full" XML file.
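A rough sketch of the "minimal in, full out" step, assuming a minimal `<Source>` carries only a `doi` attribute and `<Note>` children. `expand_sources` and the `lookup` callable are hypothetical: `lookup` stands in for whatever DOI resolver ends up being used and must return a dict with `authors` (list of `(last, first)` pairs), `journal`, `volume`, `number`, `pages`, `year`, and `title`.

```python
import xml.etree.ElementTree as ET

# <Journal> attribute -> key in the hypothetical lookup result
JOURNAL_MAP = {"name": "journal", "volume": "volume", "number": "number",
               "pages": "pages", "year": "year", "title": "title"}


def expand_sources(root, lookup):
    """Turn minimal <Source doi="..."> entries into full ones, in place.

    Hand-written <Note> elements are preserved and kept after the generated
    <Author>/<Journal> children, matching the layout of the full example.
    """
    for src in root.findall("Source"):
        if src.find("Journal") is not None:
            continue  # already a full entry; leave it untouched
        doi = src.get("doi", "")
        meta = lookup(doi) or {}
        notes = src.findall("Note")
        for note in notes:
            src.remove(note)  # re-appended after the generated children
        for last, first in meta.get("authors", []):
            ET.SubElement(src, "Author", last=last, first=first)
        journal = {attr: str(meta.get(key, "")) for attr, key in JOURNAL_MAP.items()}
        journal["doi"] = doi
        ET.SubElement(src, "Journal", attrib=journal)
        src.set("year", journal["year"])
        src.extend(notes)
    return root
```

Passing the resolver in as a parameter keeps network access out of the parser, so it can be tested with a plain dictionary.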

We should also write out a BibTeX file with the references included in the XML file.
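A minimal sketch of the BibTeX writer, assuming "full" `<Source>` elements shaped like the PFA example above. Citation keys are built as first-author surname plus year (e.g. `Watkins2001`); that key scheme is an assumption and can collide, for example for two papers by the same first author in the same year, so a real tool would need to disambiguate.

```python
import re
import xml.etree.ElementTree as ET


def source_to_bibtex(src):
    """Build one @article entry from a full <Source> element."""
    authors = src.findall("Author")
    journal = src.find("Journal")
    j = journal.attrib if journal is not None else {}
    year = j.get("year") or src.get("year", "")
    surname = authors[0].get("last", "") if authors else "Anon"
    key = re.sub(r"[^A-Za-z0-9]", "", surname) + year
    fields = [
        ("author", " and ".join(f"{a.get('last', '')}, {a.get('first', '')}"
                                for a in authors)),
        ("title", j.get("title", "")),
        ("journal", j.get("name", "")),
        ("volume", j.get("volume", "")),
        ("number", j.get("number", "")),
        ("pages", j.get("pages", "")),
        ("year", year),
        ("doi", src.get("doi", "")),
    ]
    # Empty fields are omitted rather than written as {}.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields if value)
    return f"@article{{{key},\n{body}\n}}"


def write_bibtex(root):
    """Concatenate entries for every <Source> under the <Foyer> root."""
    return "\n\n".join(source_to_bibtex(s) for s in root.findall("Source")) + "\n"
```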

ctk3b commented 6 years ago

Yeah, I think DOIs are totally sufficient here. It's also worth checking out what the OpenForceField group is doing: https://github.com/open-forcefield-group/openforcefield/blob/master/The-SMIRNOFF-force-field-format.md

They have some of the above features, and I think it's worth diverging from them as little as possible while we're still making design decisions.

chrisiacovella commented 6 years ago

Well, I don't want the final Readme or XML file to only have the DOIs: I can see an author and a year and immediately know what paper it is, but I'd have to do more work to actually look up a DOI. Glancing at the spec in the Readme for that other format, we might also have some additional information automatically written to a Readme:

While we could certainly get DOIs directly from the forcefield XML file, I think a separate XML document would be good, since it is essential that we add notes to each paper: most forcefield parameter sets have been derived or aggregated in nonstandard ways. It also allows a clear explanation of which parameters were chosen when there are duplicates (e.g., when merging two forcefield files).

In any case, I'm working on updating the parsing code to automatically grab info from a DOI and then populate the relevant fields.
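One possible resolver, assuming the public Crossref REST API (`https://api.crossref.org/works/<doi>`), which returns JSON whose `message` object includes `title`, `author`, `container-title`, `volume`, `issue`, `page`, and `issued` fields. The function names are hypothetical, and the output keys are chosen to mirror the `<Author>`/`<Journal>` attributes in the proposed scheme. Crossref asks heavy users to identify themselves (e.g. a `mailto` in the User-Agent); that is omitted here.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref(doi, timeout=10):
    """Return the Crossref 'message' record for `doi` (requires network access)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["message"]


def crossref_to_fields(msg):
    """Map a Crossref work record onto <Author>/<Journal>-style fields."""
    def first(key):
        values = msg.get(key) or [""]  # Crossref stores titles as lists
        return values[0]

    date_parts = msg.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    return {
        "authors": [(a.get("family", ""), a.get("given", ""))
                    for a in msg.get("author", [])],
        "title": first("title"),
        "journal": first("container-title"),
        "volume": msg.get("volume", ""),
        "number": msg.get("issue", ""),
        "pages": msg.get("page", "").replace("-", "--"),  # match "4118--4125" style
        "year": "" if year is None else str(year),
        "doi": msg.get("DOI", ""),
    }
```

Splitting the download from the mapping keeps the field logic testable offline; `crossref_to_fields(fetch_crossref("10.1021/jp004071w"))` would then populate the Watkins and Jorgensen entry, with exact strings depending on what Crossref has on file.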