openforcefield / smarty

Chemical perception tree automated exploration tool.
http://openforcefield.org
MIT License
19 stars 8 forks source link

Matching molecules to topology atoms results in peculiar behavior for symmetric molecules #188

Open davidlmobley opened 7 years ago

davidlmobley commented 7 years ago

The current approach, wherein we identify molecules in our Topology by using a graph to match reference molecules onto the Topology, results in slightly odd behavior in the case of symmetric molecules. For example, for cyclohexane, the charge on one carbon ends up just slightly different than on other carbons, and our current "molecule to Topology" matching results in this carbon sometimes being the first one in a molecule, other times second, etc. (since there are multiple equivalent ways to map cyclohexane onto itself). While this is not a huge problem per se, it does result in some complexities when translating into formats for other simulation packages such as GROMACS. For example, cyclohexane would normally be represented as a single "molecule" in a GROMACS topology file, but if the carbon atoms have non-equivalent charges in different instances of the molecule, this can't be done (see https://github.com/ParmEd/ParmEd/issues/791).

@jchodera , any thoughts?

One approach would be to change how molecules are matched to the Topology to default to attempting to preserve atom order (relative to the first instance) unless this doesn't result in successful matches.

jchodera commented 7 years ago

Let me make sure I understand all the issues listed here:

  1. When MCSS returns degenerate matches, we choose the first match, which may result in a random relative ordering of atoms in the molecule mapped onto the OEMol reference molecule. This causes weirdness when ParmEd tries to "compress" these into unique molecules, because the same molecule type is now represented by multiple distinct (nearly) equivalent molecules. This doesn't actually cause problems, but is unexpected.
  2. There is an issue with the charge assignment in that equivalent molecules are not receiving equivalent charges.

Instead, we would like to:

  1. Somehow pick a unique mapping of relative atom order within each molecule to the OEMol reference molecule
  2. Have chemically equivalent atoms be assigned identical charges

Is that right?

davidlmobley commented 7 years ago

@jchodera :

When MCSS returns degenerate matches, we choose the first match, which may result in a random relative ordering of atoms in the molecule mapped onto the OEMol reference molecule. This causes weirdness when ParmEd tries to "compress" these into unique molecules, because the same molecule type is now represented by multiple distinct (nearly) equivalent molecules. This doesn't actually cause problems, but is unexpected.

Yes. It doesn't cause problems unless you expect that an OpenMM system written to/read from AMBER format will yield the same energies as an OpenMM system written to/read from GROMACS format.

There is an issue with the charge assignment in that equivalent molecules are not receiving equivalent charges.

This depends slightly on what you mean. Each molecule gets the same charges. It's just that they might be assigned to different atoms (i.e. carbon 1 of cyclohexane in the first instance gets the largest magnitude charge, whereas it is carbon 2 of cyclohexane in the second instance).

Or, maybe you mean that the problem is that symmetry-equivalent atoms are not receiving exactly equivalent charges. That's an issue, perhaps, though likely one with OpenEye's tools rather than with anything here. (I'll investigate it.)

Instead, we would like to:

  1. Somehow pick a unique mapping of relative atom order within each molecule to the OEMol reference molecule

Yes, that's the main thing I'm concerned about.

  1. Have chemically equivalent atoms be assigned identical charges

I think this is outside our scope. I'm mainly concerned here with ensuring that, if we have multiple instances of the same molecule, the charges are applied to each molecule in the same order. I don't think we can be responsible for ensuring that chemically equivalent atoms get equivalent charges unless we want to implement a chemistry toolkit of some sort to figure out which atoms are chemically equivalent.

davidlmobley commented 7 years ago

I don't think this is a problem in terms of internal functionality at all, but definitely it results in weird/unexpected issues when going into other formats.

jchodera commented 7 years ago

Yes. It doesn't cause problems unless you expect that an OpenMM system written to/read from AMBER format will yield the same energies as an OpenMM system written to/read from GROMACS format.

Unless I'm misunderstanding, that means one of those converters is broken. If the charges on atoms are inequivalent, for example, the GROMACS converter is making an error if it is assuming they are equivalent.

Or, maybe you mean that the problem is that symmetry-equivalent atoms are not receiving exactly equivalent charges. That's an issue, perhaps, though likely one with OpenEye's tools rather than with anything here. (I'll investigate it.)

That's certainly one problem!

Yes, that's the main thing I'm concerned about.

There are really no guarantees about what relative order the atoms will be in within a residue of a Topology. We can try to preserve relative atom ordering if possible if there are really degenerate matches---I think this would just involve selecting the MCSS match that matches the same relative atom mapping to the reference OEMol if one has already been observed---but this should be purely an act of selecting one of many totally equivalent behaviors.

I think this is outside our scope.

But isn't this just like the inequivalent improper issue? Should we perhaps be averaging charges over all the equivalent MCSS matches?

I'm mainly concerned here with ensuring that, if we have multiple instances of the same molecule, the charges are applied to each molecule in the same order.

This is just to allow gromacs compression, right?

The gromacs converter should not be compressing inequivalent molecules (in terms of atom ordering) into the same reference molecule---we should get that fixed too.

I don't think we can be responsible for ensuring that chemically equivalent atoms get equivalent charges unless we want to implement a chemistry toolkit of some sort to figure out which atoms are chemically equivalent.

The MCSS match already does exactly this, though! It has all of the chemically equivalent mappings identified.

davidlmobley commented 7 years ago

The gromacs converter should not be compressing inequivalent molecules (in terms of atom ordering) into the same reference molecule---we should get that fixed too.

Yes, working on that.

The MCSS match already does exactly this, though! It has all of the chemically equivalent mappings identified.

Ah, that's a good point. BUT -- is having all of the chemically equivalent mappings the same as identifying all of the atoms which should be chemically equivalent to one another? It's not immediately obvious to me that you can use the mappings (of one molecule onto another) to always identify which atoms (within the same molecule) should be symmetry-equivalent to one another, though it certainly seems like in many cases this can be done. My (very limited) brain seems to be trying to tell me these are slightly different problems.

We can try to preserve relative atom ordering if possible if there are really degenerate matches---I think this would just involve selecting the MCSS match that matches the same relative atom mapping to the reference OEMol if one has already been observed---but this should be purely an act of selecting one of many totally equivalent behaviors.

Yes, I think that's the "right" answer, unless there is a straightforward way to deal with the symmetry equivalence issue.

I wrote:

Or, maybe you mean that the problem is that symmetry-equivalent atoms are not receiving exactly equivalent charges. That's an issue, perhaps, though likely one with OpenEye's tools rather than with anything here. (I'll investigate it.)

I checked this and it seems the issue in this particular test case was that I was using a mol2 file charged prior to a fix to the precision OpenEye was using for charges in .mol2 files, so the symmetry was slightly broken. If I re-charge, the issue goes away. However, unless we enforce an "always charge the input molecules" scheme, I think we have to deal with this, as someone will presumably sooner or later feed in molecules which don't have symmetrized charges.

jchodera commented 7 years ago

Ah, that's a good point. BUT -- is having all of the chemically equivalent mappings the same as identifying all of the atoms which should be chemically equivalent to one another? It's not immediately obvious to me that you can use the mappings (of one molecule onto another) to always identify which atoms (within the same molecule) should be symmetry-equivalent to one another, though it certainly seems like in many cases this can be done. My (very limited) brain seems to be trying to tell me these are slightly different problems.

For our purposes, simply averaging charges over all symmetry-equivalent mappings should just do the right thing.

Yes, I think that's the "right" answer, unless there is a straightforward way to deal with the symmetry equivalence issue.

It's an equivalent approach that should not affect anything except for the ability of your gromacs writer to compress topology files.

I checked this and it seems the issue in this particular test case was that I was using a mol2 file charged prior to a fix to the precision OpenEye was using for charges in .mol2 files, so the symmetry was slightly broken. If I re-charge, the issue goes away. However, unless we enforce an "always charge the input molecules" scheme, I think we have to deal with this, as someone will presumably sooner or later feed in molecules which don't have symmetrized charges.

We could easily add an option to symmetrize charges, but I'm not sure if we would want it on or off by default.

davidlmobley commented 7 years ago

This was partly resolved by ParmEd, which no longer does peculiar things if charges aren't equivalent across topologically identical molecules. However, we should still try and produce symmetric matchings.

jchodera commented 7 years ago

However, we should still try and produce symmetric matchings.

Do you mean we should (1) average charges over degenerate matches to ensure chemically equivalent atoms are identical, or (2) use some sort of scheme to uniquely identify atom orders?

davidlmobley commented 7 years ago

Averaging over degenerate matches to ensure chemically equivalent atoms are identical.