LJ Correction Gives -inf

fjclark commented 2 years ago

Hello,

I'm running ABFE calculations using a version of Sire modified to use Boresch protein-ligand restraints (https://github.com/fjclark/Sire/tree/feature_boresch_restraints). Previously, I have had no issues with the LJ correction, but now I am consistently getting -inf.

To reproduce:

Failure (lj_failure.zip): Run somd-gpu.sh, then ljcor.sh from the output directory. This was run on our cluster. The simulations were run using the above version of Sire, before the 2022.1.0 changes were merged, while the LJ correction was run using Sire 2022.1.0 within BSS.

Success (lj_success.zip): As above, but with the LJ correction run using Sire 2020.1.0 within BSS on my workstation.

I have tried:

Reprocessing the failing trajectories using the LJ correction script within my modified version of Sire (so 2020.1.0) - fails
Copying the failing trajectories over to my workstation and using the LJ script within Sire 2022.1.0 (py37h3fd9d12_0) within BSS - fails

I am currently reprocessing the previously successful trajectories using my updated version of Sire within BSS.

Thanks very much.

lohedges commented 2 years ago

Hi there,

Other than the LJ correction issue, does the simulation output look consistent, e.g. trajectories, data in the output files, etc.? I'm wondering if you are being hit by this issue. We now create a test molecule on startup to ensure that the templated atomic properties are picked up correctly. This means that molecule numbering starts at 2, whereas it previously started at 1. Some of the SOMD setup code assumes that the perturbable molecule has MolNum(1), which will no longer be the case (and shouldn't be assumed in general, anyway).
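To illustrate the numbering assumption described above, here is a minimal sketch in plain Python. It does not use the Sire API: the `system` dictionary, the `perturbable` flag, and both lookup functions are hypothetical stand-ins, chosen only to show why a hard-coded molecule number breaks once numbering starts at 2.

```python
# Hypothetical stand-in for a loaded system: molecule number -> properties.
# Numbering starts at 2, as it would after a test molecule is created on startup.
system = {
    2: {"name": "ligand", "perturbable": True},
    3: {"name": "protein", "perturbable": False},
    4: {"name": "water", "perturbable": False},
}


def find_perturbable_by_number(system, number=1):
    """Fragile lookup: assumes the perturbable molecule has a fixed number."""
    return system.get(number)


def find_perturbable_by_property(system):
    """More robust lookup: search for the single molecule flagged as perturbable."""
    matches = [num for num, props in system.items() if props["perturbable"]]
    if len(matches) != 1:
        raise ValueError(f"expected one perturbable molecule, found {len(matches)}")
    return matches[0]


fragile = find_perturbable_by_number(system)   # nothing is stored under number 1
robust = find_perturbable_by_property(system)  # finds the flagged molecule

print("lookup by fixed number:", fragile)
print("lookup by property:", robust)
```

Searching on a property rather than a number keeps the lookup independent of how many molecules were created before the system was loaded.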

Cheers.

fjclark commented 2 years ago

Hi,

I've not yet merged in the changes from 2022.1.0 to the version of Sire which was used to run these simulations, and everything else looks fine. I've done a reasonably thorough analysis of a few sets of calculations hit by this issue (I've run 5 sets of simulations with varying Boresch restraints, each with 5 replicates).

Thanks.

lohedges commented 2 years ago

Ah, sorry, I missed that part. So it's just failing using lj-tailcorrection from the latest version of the code? Looking at the associated Python script, LJcutoff.py, it uses (amongst other things) createSystemFreeEnergy from OpenMMMD.py, which might be affected by the MolNum offset.

I'll take a closer look to see if I can figure out where things are going wrong.

fjclark commented 2 years ago

No worries. It also seems to be failing using LJcutoff.py from within my version of Sire which has not yet been updated.

Thanks very much.

lohedges commented 2 years ago

It's strange that it works with a significantly older version of Sire (2020.1.0). I can't see any specific changes to the LJCutoff code during this time, although it's hard to know since the Python script calls lots of C++ functionality that might have changed. Perhaps it's worth trying the script on some existing ABFE output that doesn't use your restraint modifications to see if it still reports NaN.

fjclark commented 2 years ago

Ah, sorry, I meant 2021.1.0 instead of 2020.1.0 in all cases. I'll try that and let you know (the correction normally takes > 8 hours to run).

lohedges commented 2 years ago

Gosh, I didn't realise that it was so painful. It could be something to do with the change in underlying Qt containers, which might mean that some of the calculations are performed in a different order, e.g. summations. This shouldn't affect the overall result, but small changes in accumulated numerical errors could potentially cause issues if values are very large or small.
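As a general illustration of the point about summation order (not a diagnosis of this issue, and not taken from the Sire code), the following sketch shows how adding the same floating-point values in a different order can change the result when the magnitudes differ widely. The values and the `naive_sum` helper are made up for the example.

```python
import math


def naive_sum(values):
    """Accumulate left to right in double precision, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers in two different orders.
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

order_a = naive_sum(values)     # the 1.0 is absorbed into 1e16 and lost
order_b = naive_sum(reordered)  # the large terms cancel first, so 1.0 survives

# math.fsum tracks the lost low-order bits, so it is order-independent here.
exact_a = math.fsum(values)
exact_b = math.fsum(reordered)

print("naive, original order: ", order_a)
print("naive, reordered:      ", order_b)
print("fsum, either order:    ", exact_a, exact_b)
```

If a downstream step takes a logarithm or an exponential of such a total, a difference of this kind could in principle be amplified, which is the sort of sensitivity described above.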

fjclark commented 2 years ago

I've tested the LJ correction from Sire 2022.1.0 with both existing ABFE output which doesn't use my modifications, and existing output which does use my modified restraints but for which the older LJ correction worked (see lj_success.zip). It works in both cases.

michellab / Sire

LJ Correction Gives -inf #372