michellab / Sire

Sire Molecular Simulations Framework
http://siremol.org
GNU General Public License v3.0

Analyse_freenrg mbar Gives Inconsistent Overlap #373

Closed fjclark closed 2 years ago

fjclark commented 2 years ago

Hello,

I'm running ABFE calculations using a version of Sire modified to use Boresch protein-ligand restraints (https://github.com/fjclark/Sire/tree/feature_boresch_restraints). During the stage in which the Boresch restraints are turned on (but no alchemical changes occur), several of the overlap matrices are very asymmetric, although the minimum overlap is still reasonably high: [images: two overlap matrices]

However, when I rerun MBAR for the same trajectories using the same input script (with analyse_freenrg mbar -i lambda*/simfile.dat -p 83 --overlap --temperature 298.0), the resulting matrices are symmetric, as expected, and the overall free energy change differs by less than 0.01 kcal mol-1: [images: two overlap matrices]

Upon re-running MBAR five times, the results are identical and give the symmetric output each time.

Files to reproduce (run ../somd-gpu.sh then ../mbar.sh from output directory).

Should I ignore this, as the actual change in values is minimal and overlap is good in all cases (MBAR not quite converging with different random seeds?), or does this likely indicate an underlying issue with my simulations or with analyse_freenrg mbar?

Thanks.

lohedges commented 2 years ago

Just to check... You have two scripts, somd-gpu.sh and mbar.sh. In the information above you are suggesting running them separately, one after the other. However, the somd-gpu.sh script actually submits the mbar.sh script at the end, i.e. it should perform a single analysis stage for task 5 (the last lambda window) when all of the tasks have finished. The mbar.sh script itself does not list any job dependencies, although it does sleep for 30 seconds before starting.

Is it possible that you've submitted both scripts, so the initial analysis that you are seeing is simply what you would observe after a very short stretch of simulation? If not, are the first two overlap matrices generated from the output of the analysis of the somd-gpu.sh script? If so, could it be possible that the following logic isn't working for some reason?

if [ "$SLURM_ARRAY_TASK_ID" -eq "5" ]
then
  wait
  sleep 30
  sbatch ../mbar.sh
fi

(Since the MBAR analysis is so simple, couldn't it just be added in the conditional block, rather than submitting a separate script? Perhaps it is more flexible to do it this way for some reason?)

If repeatedly running the analysis on the simulation output when it's finished is giving consistent results, then I can only assume that the initial analysis is being performed before the simulation is finished. If you see weird output for the first analysis post completion, then consistent analysis beyond that, then it sounds like something funky is going on.

Cheers.
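
One way to guard against analysing unfinished output is to compare the simfile.dat files before running MBAR. The sketch below is only illustrative: the function name check_simfiles_consistent is hypothetical, and it rests on the assumption (not confirmed anywhere in this thread) that windows which ran to completion write the same number of lines to simfile.dat.

```shell
# Hypothetical helper: succeed only if every file given as an argument
# has the same number of lines.
# Assumption (not from the thread): completed lambda windows write the
# same number of lines, so a mismatch hints at an unfinished window.
check_simfiles_consistent() {
    expected=""
    for f in "$@"; do
        # A missing file means a window has not produced output at all.
        [ -f "$f" ] || { echo "missing: $f" >&2; return 2; }
        n=$(wc -l < "$f")
        n=$((n + 0))   # normalise padding that some wc implementations add
        if [ -z "$expected" ]; then
            expected=$n
        elif [ "$n" -ne "$expected" ]; then
            echo "line count mismatch: $f has $n, expected $expected" >&2
            return 1
        fi
    done
    return 0
}
```

It could be used as a gate in front of the analysis command quoted earlier, for example: check_simfiles_consistent lambda*/simfile.dat && analyse_freenrg mbar -i lambda*/simfile.dat -p 83 --overlap --temperature 298.0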

fjclark commented 2 years ago

Sorry, I had forgotten that somd-gpu.sh submits mbar.sh. The script used to submit these just submits somd-gpu.sh.

Yes, they are. That seems most likely. However, checking another repeat from the same calculation as above (with no asymmetry issues) by rerunning MBAR gives identical results, so in that case it seems that the script did run after the simulations completed. Also, an almost identical script (with the above block) has been used for a reasonable amount of work by the group (https://github.com/michellab/MDM2-DG_paper). @jmichel80, I assume you didn't notice any similar issues?

Anyway, I'll change the script. Thanks very much.

jmichel80 commented 2 years ago

somd-gpu.sh should block on the srun XXX line until somd-freenrg has completed. Then the bash logic submits mbar.sh if the above call for somd-freenrg was at lambda=1.0. The reasoning is that all other jobs at intermediate lambda values must have completed if submitted sequentially. This would break if different jobs in the job array were run on faster/slower GPUs, but that has worked ok on our cluster for several years.

If I were to write this again I would use job dependencies to launch mbar when the whole somd-gpu array has completed.
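
A minimal sketch of the job-dependency approach, assuming both scripts are submitted with sbatch from a small wrapper. The function name submit_with_dependency is hypothetical; --parsable and --dependency=afterok are standard sbatch options. With afterok on the array's job ID, Slurm should hold the analysis job until every task in the array has finished successfully, regardless of which GPU each task ran on.

```shell
# Hypothetical wrapper: submit the simulation array, then submit the
# analysis so that it only starts once the whole array has succeeded.
#   $1 = array job script (e.g. the somd-gpu.sh from this thread)
#   $2 = analysis script  (e.g. the mbar.sh from this thread)
submit_with_dependency() {
    array_script="$1"
    analysis_script="$2"
    # --parsable makes sbatch print "jobid" or "jobid;cluster" only.
    job_id=$(sbatch --parsable "$array_script") || return 1
    job_id=${job_id%%;*}   # drop the optional ";cluster" suffix
    # afterok:<jobid> on an array job waits for all of its tasks.
    sbatch --dependency=afterok:"$job_id" "$analysis_script"
}
```

For example, submit_with_dependency ../somd-gpu.sh ../mbar.sh, with the conditional sbatch ../mbar.sh block removed from somd-gpu.sh so the analysis is not submitted twice.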

fjclark commented 2 years ago

Thanks Julien.

As the issue seems more likely to be related to Slurm than Sire I'll close it (and open another if I notice inconsistent results from analyses of simulations which have definitely finished).

Thanks very much.