Simplify reading and writing of canonical dihedrals

ctk3b commented 9 years ago

@jpthompson17 mentioned a neat idea that should simplify the dihedral logic.

Every dihedral (and perhaps other forces later down the road) should know how to convert itself to the canonical form. This could easily be achieved by adding a method to those dihedrals which calls the appropriate conversion. That method would then be invoked under the hood whenever a dihedral of any kind is added to the system without the need to write conversion code within a particular engine's parser.

One caveat is that there are still engine specific transformations that need to take place such as how LAMMPS coefficients differ by a factor of two and dihedral angles are shifted by 180 relative to InterMol's internals. These could also be methods of the actual force or just be something handled by each parser.

jpthompson17 commented 9 years ago

Also, looking at the form of the TrigDihedral, would it make sense to get rid of the phase shift attribute phi, and just have TrigDihedral be a simple cosine series, V = sum(f_n * cos(nφ)) for n=0...N? As you mention in the comments of convert_dihedrals, phi = 180 just corresponds to flipping signs on the coefficients f_n for n > 0, so the problem of how to represent a given dihedral potential (e.g. RB or multiple periodic-type terms) as a DihedralTrig is underdetermined. Removing phi would resolve this, lead to a more compact data type, and perhaps slightly simplify conversions between functional forms.

ctk3b commented 9 years ago

I believe the reason for this being a parameter was for the case where phi is neither 0 nor 180. I don't believe we have any examples of this yet but @mrshirts may know more about this. Was there a Desmond example with phi=60?

jpthompson17 commented 9 years ago

I took a stab at what this might look like in rb_dihedral_type.py. (Also added some comments.) It's just a sketch and completely untested--hence no pull request. I just wanted to see if it's along the lines of what you're thinking.

ctk3b commented 9 years ago

Yup! Exactly like that. The immediate things I can think of are:

Whenever a dihedral is added to the system, call this method on itself. Logic should probably be in moleculetype.
A way to add other engine specific modifications. So far, we've only identified the LAMMPS scaling factor of 2 which is fairly trivial to add, probably as an argument to this method? Maybe it could recognize the SCALE_FROM and SCALE_INTO class specific constants from which ever parser calls it?

jpthompson17 commented 9 years ago

Some thoughts: Since the internal system only needs to store instances of "canonical" bonds, angles, dihedrals, and impropers, classes like RbDihedral[Type] end up being just "helper" classes for the parsers to get parameters into/out of the canonical (or internal) representations.

One way to approach designing these helper classes is to add extra attributes to accomodate slightly different functional forms/parameter lists used by different engines, and leave some attributes as default (None or 1 or 0, depending on the situation). The parsers would specify which engine-specific styles define which attributes (and in what order), so that engine-specific settings currently in forcedata.py could be taken out of the common intermol.forces submodule and put in engine-specific submodules. The engine-specific code would also set any other attributes that are needed to account for slight differences in functional forms (see discussion of _prefactor attribute, below). Then the canonical() function (or whatever it ends up being called) would handle the logic of converting to the more compact internal representation, depending on which attributes are defined (and maybe also generate warnings or errors when defined attributes are redundant or inconsistent--e.g. see sign and phi, below). The compact internal/canonical representation would be free of excess attributes needed for automated parsing (by forcefunctions + a helper class) of various engine-specific parameter lists.

As an example, consider the GROMACS periodic dihedral (func. type 1), LAMMPS charmm and harmonic dihedral styles, and LAMMPS cvff improper style. The relevant helper class is ProperPeriodicDihedralType. All of the styles' parameter lists have a force constant and a periodicity, so we obviously need k and multiplicity. The LAMMPS charmm style has a 1-4 weight (which should be checked against special_bonds and maybe used to set the system.lj_correction and system.coulomb_correction), so we'll keep the weight attribute (although this will be ignored when we convert to an internal-format dihedral). The LAMMPS charmm and GROMACS periodic functions have a phase shift angle, so we'll keep phi (though the name phase might be clearer since φ usually denotes the dihedral angle itself). LAMMPS harmonic dihedrals and cvff impropers use a sign coefficient 'd' on the cosine to represent the phase shift, so we could add the attribute sign. Finally, LAMMPS cvff impropers are identical to LAMMPS harmonic dihedrals, but some people like to keep track of which dihedrals are proper vs. improper so we keep the improper attribute. See my untested modifications in proper_periodic_dihedral_type.py.

Back to the issue of the factor of 1/2 in harmonic bond/angle force constants. Here, the helper classes (HarmonicBondType and HarmonicAngleType, which in this case also comprise fundamental/canonical representations) could define a "hidden" attribute _prefactor (with, say, a default value of 1) that can be set by the parser to scale self.k. See my (again, untested) modifications and comments in harmonic_bond_type.py. In this case, the GROMACS parser would need to specify a prefactor of 1/2 on initialization or by calling bond.prefactor = 0.5.

mrshirts commented 9 years ago

Hi, all-

Yes, it's a parameter because desmond supports the parameter being something other than 180 and 0.

In general, the hard thing about this is we have to come up with a 'canonical' form that is the most general way to store equivalent forces. In this case, it's the desmond form.

On Wed, Jan 28, 2015 at 4:19 PM, Christoph Klein notifications@github.com wrote:

I believe the reason for this being a parameter was for the case where phi is neither 0 nor 180. I don't believe we have any examples of this yet but @mrshirts https://github.com/mrshirts may know more about this. Was there a Desmond example with phi=60?

— Reply to this email directly or view it on GitHub https://github.com/shirtsgroup/InterMol/issues/158#issuecomment-71918287 .

mrshirts commented 9 years ago

A way to add other engine specific modifications. So far, we've only identified the LAMMPS scaling factor of 2 which is fairly trivial to add, probably as an argument to this method? Maybe it could recognize the SCALE_FROM and SCALE_INTO class specific constants from which ever parser calls it?

The issue is that we can't always assume that the conversion to a canonical form of the interaction will be as simple as a scaling, so hard coding it as something like scale_from can be confusing.

The general flow we've been planning around is that a topology file of type X is read in, there is a function that converts it to a canonical form (one single form for equivalent interactions), and then a function that converts it from the canonical form to the format for topology file of type X. I don't want to hard code too much into the features of that convert_to_canonical function, because we don't necessarily know what it will be.

With this organization, there is a single flow for all types of inputs -- we abstract any structure/topology information that is specific to a particular code into a single function for each type.

This may have been addressed later in the chain, but I just though I'd clarify things -- I'll keep on moving down the chain.

On Wed, Jan 28, 2015 at 6:11 PM, Christoph Klein notifications@github.com wrote:

Yup! Exactly like that. The immediate things I can think of are:

Whenever a dihedral is added to the system, call this method on itself. Logic should probably be in moleculetype.

A way to add other engine specific modifications. So far, we've only identified the LAMMPS scaling factor of 2 which is fairly trivial to add, probably as an argument to this method? Maybe it could recognize the SCALE_FROM and SCALE_INTO class specific constants from which ever parser calls it?

— Reply to this email directly or view it on GitHub https://github.com/shirtsgroup/InterMol/issues/158#issuecomment-71936757 .

mrshirts commented 9 years ago

Some thoughts: Since the internal system only needs to store instances of "canonical" bonds, angles, dihedrals, and impropers, classes like RbDihedral[Type] end up being just "helper" classes for the parsers to get parameters into/out of the canonical (or internal) representations.

Correct.

One way to approach designing these helper classes is to add extra attributes to accomodate slightly different functional forms/parameter lists used by different engines, and leave some attributes as default ( None or 1 or 0, depending on the situation). The parsers would specify which engine-specific styles define which attributes (and in what order), so that engine-specific settings currently in forcedata.py could be taken out of the common intermol.forces submodule and put in engine-specific submodules.

That would be the best place to put the engine specific settings. However, since there are some codes that support multiple types of dihedrals, some of the mathematical conversion functions do get used by multiple codes, so they are there in the force folder to reduce duplication.

The engine-specific code would also set any other attributes that are needed to account for slight differences in functional forms (see discussion of _prefactor attribute, below).

Right, though I think my feeling is that it's better to let the attributes be more general transformation functions, than be hard-coded prefactors.

Then the canonical() function (or whatever it ends up being called) would handle the logic of converting to the more compact internal representation, depending on which attributes are defined (and maybe also generate warnings or errors when defined attributes are redundant or inconsistent--e.g. see sign and phi, below).

The compact internal/canonical representation would be free of excess attributes needed for automated parsing (by forcefunctions

a helper class) of various engine-specific parameter lists.

I think the important part of the internal representation is not that it is compact, but that it is sufficiently general to make sure all equivalent forms can be represented, and to reduce the N x N problem caused by interconverting all dihedral types (or other functional forms) to each other.

free of excess attributes needed for automated parsing (by forcefunctions + a helper class) of various engine-specific parameter lists.

To the extent we can unclutter this, yes, it would be good.

As an example, consider the GROMACS periodic dihedral (func. type 1), LAMMPS charmm http://lammps.sandia.gov/doc/dihedral_charmm.html and harmonic http://lammps.sandia.gov/doc/dihedral_harmonic.html dihedral styles, and LAMMPS cvff http://lammps.sandia.gov/doc/improper_cvff.html improper style. The relevant helper class is ProperPeriodicDihedralType. All of the styles' parameter lists have a force constant and a periodicity, so we obviously need k and multiplicity.

Not necessarily. Other functional forms represent multiplicity in different ways. Some functional forms that do not have an explicitly defined multiplicity can be equivalent.

The LAMMPS charmm style has a 1-4 weight (which should be checked against special_bonds and maybe used to set the system.lj_correction and system.coulomb_correction), so we'll keep the weight attribute (although this will be ignored when we convert to an internal-format dihedral).

The 1-4 weight stored in the pairs site, not in the dihedral form in the InterMol representation.

The LAMMPS charmm and GROMACS periodic functions have a phase shift angle, so we'll keep phi (though the name phase might be clearer since φ usually denotes the dihedral angle itself).

Sure.

LAMMPS harmonic dihedrals and cvff impropers use a sign coefficient 'd' on the cosine to represent the phase shift, so we could add the attribute sign.

That's a redundancy that I think might be overkill, since phase and d represent the same thinf.

Finally, LAMMPS cvff impropers are identical to LAMMPS harmonic dihedrals, but some people like to keep track of which dihedrals are proper vs. improper so we keep the improper attribute. See my untested modifications in proper_periodic_dihedral_type.py https://github.com/jpthompson17/InterMol/blob/develop/intermol/forces/proper_periodic_dihedral_type.py .

This could be done, though the stated goals of InterMol are to keep the energy the same, not necessarily the labelings -- because different force field split up the energy different ways, this really isn't something that we can guarantee in any case.

Back to the issue of the factor of 1/2 in harmonic bond/angle force constants. Here, the helper classes (HarmonicBondType and HarmonicAngleType, which in this case also comprise fundamental/canonical representations) could define a "hidden" attribute _prefactor (with, say, a default value of 1) that can be set by the parser to scale self.k. See my (again, untested) modifications and comments in harmonic_bond_type.py https://github.com/jpthompson17/InterMol/blob/develop/intermol/forces/harmonic_bond_type.py. In this case, the GROMACS parser would need to specify a prefactor of 1/2 on initialization or by calling bond.prefactor = 0.5.

So, as I mentioned, to minimize the proliferation of different attributes, my idea is that there are program-specific converters for each force field that are entirely encapsulated in the program_parser.py files; but that if their functional form is not specified, then we don't have to worry about carrying along lots of extra attributes everywhere -- all of the complication is restricted JUST to the convert_to and convert_from canonical functions.

mrshirts commented 9 years ago

And Jeff -

Thanks for the all the discussion and comments! This is very helpful. Right now, our focus is to get it up and working for desmond<->lammps<->gromacs; I think there will be a lot of code improvement that can go in as soon as the basic support is working.

jpthompson17 commented 9 years ago

LAMMPS harmonic dihedrals and cvff impropers use a sign coefficient 'd' on the cosine to represent the phase shift, so we could add the attribute sign. That's a redundancy that I think might be overkill, since phase and d represent the same thinf.

What motivated this was trying to find a way to parse LAMMPS harmonic dihedral and cvff improper parameter lists using the create_kwds_from_entries() function in forcefunctions.py. It seemed tempting to extend the ProperPeriodicDihedralType class rather than write a new force_type/force_class to accommodate the slight difference of switching from cos(n * phi - phase) => d * cos(n * phi), but I agree that having both d and phase attributes is a potentially confusing redundancy.

So, as I mentioned, to minimize the proliferation of different attributes, my idea is that there are program-specific converters for each force field that are entirely encapsulated in the program_parser.py files; but that if their functional form is not specified, then we don't have to worry about carrying along lots of extra attributes everywhere -- all of the complication is restricted JUST to the convert_to and convert_from canonical functions.

I totally agree. Each program-specific converter 'knows' only its functional forms and that of the InterMol universal representation. But then to what extent should 'helper' force_types/force_classes (RbDihedral[Type], ProperPeriodic[Type], FourierDihedral[Type]--i.e. those with functional forms that are not used internally) be relied upon? When I encounter an interaction type with a potential expressed by a different functional form/set of coefficients but equivalent/convertible to an existing 'canonical' functional form, should defining a new pair of helper classes and putting them in intermol.forces be preferred to just handling all the parsing/conversion within program_parser.py?

@mrshirts and @ctk3b (and other contributors) - Thanks for the great code! I'm happy if any of my comments/suggestions turn out to be helpful. I came across this project specifically needing a tool to convert from LAMMPS=>GROMACS and was able to get the conversion for my particular problem running with just a few modifications (to the old master branch, anyway :) ). I really appreciate how easy it was to understand/modify your code. It also saved me a great deal of time spent reinventing the wheel and writing/debugging my own conversion tool from scratch (and thus contributing to the proliferation of one-way conversion scripts that InterMol seeks to mitigate). While my specific conversion problem is solved (for now), I'm very interested in where this project is heading--it's definitely filling a void in the current world of open-source molecular simulation tools.

mrshirts commented 9 years ago

Jeff- Thanks for your comments and the contributions. We'll keep you posted on further releases, and feel free to drop by and contribute further in the future!

shirtsgroup / InterMol

Simplify reading and writing of canonical dihedrals #158