Open peastman opened 4 years ago
I can work with @joaander to include specifications and design docs for GSD (most commonly used with the HOOMD-blue simulation engine). The GSD docs already have much of the requested info, so we may cross-reference that and fill in whatever information is missing: https://gsd.readthedocs.io
I think that @peastman's suggestion is a great start, since it will give us authoritative references to resolve conflicts/uncertainty.
I think we'll ultimately want to distill the information from the spec docs down into a table, or set of keywords, describing the "information content" of each file. A quick example might be:
Format | Elements | Atom types | Coordinates | Bond orders | Harmonic bond parameters | Nonharmonic bond parameters | Atom formal charges | Atom partial charges |
---|---|---|---|---|---|---|---|---|
SMILES | Y | N | N | Y | N | N | Y | N |
AMBER Prmtop | N | Y | N | N | Y | N | | |
OpenMM XML | N | Y | Y | N | Y | Y | N | Y |
SYBYL/Corina mol2 | N | Y | Y | Y | N | N | N | Y |
TRIPOS mol2 | Y | N | Y | Y | N | N | ? | Y |
... |
(the above is probably incorrect, I just quickly jotted some names and categories down)
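To make the idea concrete, here is a minimal Python sketch of how such a table could be held as data and queried. The names `Support`, `FIELDS`, `parse_row`, and `formats_with` are made up for illustration, and the Y/N/? values are copied straight from the quick table above, so they carry the same "probably incorrect" caveat.

```python
from enum import Enum


class Support(Enum):
    """Whether a format stores a given kind of information."""
    YES = "Y"
    NO = "N"
    UNKNOWN = "?"


# Column names taken from the example table (hypothetical identifiers).
FIELDS = (
    "elements",
    "atom_types",
    "coordinates",
    "bond_orders",
    "harmonic_bond_parameters",
    "nonharmonic_bond_parameters",
    "atom_formal_charges",
    "atom_partial_charges",
)


def parse_row(cells):
    """Map one row of Y/N/? cells onto FIELDS; blank cells become UNKNOWN."""
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    return {
        name: Support(cell) if cell else Support.UNKNOWN
        for name, cell in zip(FIELDS, cells)
    }


# Placeholder values copied from the table above; not authoritative.
TABLE = {
    "SMILES": parse_row("Y N N Y N N Y N".split()),
    "OpenMM XML": parse_row("N Y Y N Y Y N Y".split()),
    "TRIPOS mol2": parse_row("Y N Y Y N N ? Y".split()),
}


def formats_with(field_name, table=TABLE):
    """Return the formats marked as definitely storing field_name."""
    return sorted(
        name for name, row in table.items() if row[field_name] is Support.YES
    )
```

A call such as `formats_with("coordinates")` would then list every format whose row is marked `Y` in that column. Keeping `?` as its own state (rather than folding it into `N`) leaves room for entries nobody has checked against a spec yet.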
So, I'd propose that each format could be defined by a set of keywords that it must or may contain. As we look to include more formats/details, it's likely that we'll find that our current keywords aren't fully descriptive, and we'll want to add, split, or merge some. So, we could consider each set of keywords to be one version of a specification, and have rules for automatically updating from an old to a new specification (which may include completely automated transformation, or identifying cases requiring human review).
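As a rough sketch of what those update rules might look like (everything here is hypothetical: the `V1_TO_V2` rule table, the old keyword names, and the `migrate` helper are invented for illustration), each old keyword maps to the new keyword(s) it could correspond to. A single target is applied automatically, which also covers merges when several old keywords share one target. Multiple targets mean the keyword was split, so the case is set aside for human review.

```python
from dataclasses import dataclass, field

# Hypothetical rules for moving from keyword spec v1 to v2.
# One target  -> rename or merge, applied automatically.
# Many targets -> the old keyword was split; a person has to decide.
V1_TO_V2 = {
    "positions": ("coordinates",),
    "xyz": ("coordinates",),  # merged into the same new keyword
    "bond_parameters": (
        "harmonic_bond_parameters",
        "nonharmonic_bond_parameters",
    ),
}


@dataclass
class Migration:
    """Result of migrating one format's keyword set."""
    keywords: set = field(default_factory=set)
    needs_review: dict = field(default_factory=dict)


def migrate(old_keywords, rules=V1_TO_V2):
    """Translate old keywords to the new spec, flagging ambiguous splits."""
    result = Migration()
    for keyword in sorted(old_keywords):
        # Keywords without a rule are assumed unchanged between versions.
        targets = rules.get(keyword, (keyword,))
        if len(targets) == 1:
            result.keywords.add(targets[0])
        else:
            result.needs_review[keyword] = list(targets)
    return result
```

For example, `migrate({"positions", "bond_parameters", "elements"})` would carry `elements` over untouched, rename `positions`, and leave `bond_parameters` in `needs_review` with its two candidate replacements.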
Good idea. We also might group them into a few broad categories:
Of course some formats can store more than one of those. A PDB includes both chemical information and conformations, and it can be used to store trajectories (although it's not a very good format for that).
Let's come up with a list of formats we want to document. Here's a start.
Standard (not application specific) structure formats:
- PDB
- PDBx/mmCIF
- MOL2
- SDF

Trajectory formats:

- DCD
- XTC
- NetCDF
- TRR
- BINPOS
- DTR
- XYZ

Application specific MD input and output formats:

- CHARMM/NAMD
- Amber
- Gromacs
- OpenMM
- LAMMPS
- HOOMD-blue
- Desmond

Formats used by QC codes:
Not my field! Can someone else fill this in?
Here's an initial proposal on how we can organize this repository.
As an example, OpenMM's forcefield format is described in the manual. So I'll create an "OpenMM" directory containing a PDF of the most recent manual. The accompanying README will give the URL it was downloaded from and reference the sections that describe the format.
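If it helps, a small check along these lines could keep that layout honest as directories are added. This is only a sketch under assumptions that are not settled anywhere in this thread: that each application gets its own top-level directory, that the accompanying file is named `README.md`, and that a README containing any `http` link counts as recording its source. The function name `missing_provenance` is made up.

```python
from pathlib import Path


def missing_provenance(repo_root):
    """Return format directories whose README is absent or lists no URL.

    Assumes one top-level directory per application and a README.md in each
    (both are assumptions, not an agreed convention).
    """
    problems = []
    for entry in sorted(Path(repo_root).iterdir()):
        # Skip plain files and hidden directories such as .git.
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        readme = entry / "README.md"
        if not readme.is_file() or "http" not in readme.read_text(encoding="utf-8"):
            problems.append(entry.name)
    return problems
```

Running `missing_provenance(".")` from the repository root would list any directory that still needs its download URL recorded.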
Thoughts?