pren / poltype

POLTYPE: AMOEBA parametrization tool
https://pren.github.io/poltype
Other
4 stars 11 forks source link

Valence and charge issues #13

Closed mquevill closed 4 years ago

mquevill commented 4 years ago

I know in previous versions of POLTYPE the total charge of a charged molecule needed to only be specified once (according to documentation). However, this is no longer the case. I'm not sure if this was introduced with the addition of RDKit (or just a newer version?) or what caused this. But the valences of each atom now do need to be correct.

For instance, take a nitrate molecule (NO3-). With POLTYPE 1.1.4, I could specify just the overall negative charge, but using the newer version, it is not allowed. However, putting a negative charge on the N atom causes it to throw a warning: Explicit valence for atom # 0 N, 5, is greater than permitted.

Overall charge specified:

 OpenBabel05072016033D

  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6681    0.4065    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
   -1.7359   -0.2282   -0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4154   -0.2008    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6840    1.6486   -0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  4  2  0  0  0  0
  3  1  1  0  0  0  0
M  END
$$$$

Corrected valences:

 OpenBabel05072016033D

  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6681    0.4065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7359   -0.2282   -0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    0.4154   -0.2008    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -0.6840    1.6486   -0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  4  2  0  0  0  0
  3  1  1  0  0  0  0
M  END
$$$$

Perhaps a check could be added to tell the user more specifically what to change? Or maybe an update to the documentation to mention this change between versions?

Additionally, if the valences are messed up in a 2D molecule (the example here is 2D), there will be erroneous H atoms added to correct the valences. In the above example, the 3D file generated then looks like:

 OpenBabel05082011523D

  6  5  0  0  0  0  0  0  0  0999 V2000
    1.0701    0.0414    0.0292 N   0  5  0  0  0  0  0  0  0  0  0  0
    0.4983   -0.7312    0.9965 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.4305    0.1165   -0.0647 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.3511    0.6467   -0.7284 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4288   -0.5477    0.7667 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6620   -0.4705    0.6700 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  4  2  0  0  0  0
  2  5  1  0  0  0  0
  3  1  1  0  0  0  0
  3  6  1  0  0  0  0
M  CHG  1   1  -1
M  END
$$$$
misterbrandonwalker commented 4 years ago

Hey Michael,

Okay so the issue is that rdkit will attempt to "sanitize" your input molecule by default. This means a number of things, but one of them is it has some assumptions about the nature of what valences there should be for atoms, I will turn this off and update poltype. Babel of course will trust user input by default...

misterbrandonwalker commented 4 years ago

Another annoying thing is rdkit likes to keep molecules in 2D coordinates even if it reads 3D coordintes from file. I think this is because it is able to generate many conformers for you and then you can choose from them. I will add default behaviour to keep 3D coordinates from user input.

misterbrandonwalker commented 4 years ago

Actually, what I said about babel is not 100% correct. If you use --gen3D on nitrate molecule it will add hydrogens for you even though we dont tell babel to add hydrogens. So it seems babel makes some assumptions too but at least in rdkit it will raise a flag and you can choose to ignore it or not. Maybe this is special case since input molecule was flat to begin with. In that case poltype uses babel to generate 3D coordinates for you. Seeing how it also adds hydrogens at same time, I will use rdkit instead.

misterbrandonwalker commented 4 years ago

Okay, so I think we can agree the input file certainly is strange. One of the oxygens has double bond and rdkit definitly will complain unless you tell it should have that many. Another thing you can do is just remove that double bond in SDF file, then instead of worrying about changing valences, just tell SDF file the total charge (-3) and now poltype will make 3D coordinates without adding any hydrogen.