openmm / openmm-plumed

OpenMM plugin to interface with PLUMED

openmm-plumed writes output to bck.0.COLVARS instead of COLVARS #65

Open smliu1997 opened 1 year ago

smliu1997 commented 1 year ago

Hi,

I have found that when I simulate a specific system with openmm-plumed on a GPU via the CUDA platform, the PLUMED output, which should be written to COLVARS, is actually written to bck.0.COLVARS, while an empty COLVARS file is also produced (before the simulation starts, there is no COLVARS file in the working directory).

To reproduce the issue, download and unpack the attached .tar.gz file, then, in the directory containing all the files, run python run_system.py --platform CUDA > simulation_output.txt. You will see that an empty COLVARS is produced and the PLUMED output is written to bck.0.COLVARS.

My OpenMM version is 7.5.1, openmm-plumed version is 1.0, and cudatoolkit is 10.0.130. The GPU is an NVIDIA V100 (Volta).

This problem only happens when using the GPU. If run on CPU with python run_system.py --platform CPU > simulation_output.txt, the openmm-plumed output is correctly written to COLVARS.

Additionally, if you remove the line simulation.minimizeEnergy() from run_system.py and then run with CUDA, the PLUMED output is correctly written to COLVARS.
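
For reference, the relevant pattern in run_system.py looks roughly like this (a minimal sketch with placeholder input files and a placeholder PLUMED script, not the actual attachment):

```python
# Minimal sketch of the reproduction pattern: attach a PlumedForce,
# minimize, then step. Input files and the PLUMED script are placeholders.
from openmm import app, unit
import openmm as mm
from openmmplumed import PlumedForce

pdb = app.PDBFile('system.pdb')                      # placeholder input file
forcefield = app.ForceField('amber14-all.xml')       # placeholder force field
system = forcefield.createSystem(pdb.topology)

# PLUMED input that PRINTs to COLVARS; on CUDA the data ends up in
# bck.0.COLVARS while an empty COLVARS is created.
plumed_script = """
d: DISTANCE ATOMS=1,2
PRINT ARG=d FILE=COLVARS STRIDE=100
"""
system.addForce(PlumedForce(plumed_script))

integrator = mm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond,
                                         0.002*unit.picoseconds)
platform = mm.Platform.getPlatformByName('CUDA')
simulation = app.Simulation(pdb.topology, system, integrator, platform)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()   # removing this call makes the issue disappear
simulation.step(1000)
```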

Based on these tests so far, I cannot locate the source of this issue. Any help would be appreciated, thanks!

simulation.tar.gz

smliu1997 commented 1 year ago

Additionally, my labmate tells me he can reproduce this issue with his GPU. Here are the software and hardware he uses:

- cudatoolkit 11.5.1
- openmm 7.7.0
- openmm-plumed 1.0
- NVIDIA-SMI 495.53, Driver Version: 497.29, CUDA Version: 11.5
- GPU: NVIDIA GeForce GTX 1080

peastman commented 1 year ago

Does the same thing happen if you use the CPU platform, or is it only on CUDA?

smliu1997 commented 1 year ago

This only happens on CUDA. When using the CPU, the output is correctly written to COLVARS.

peastman commented 1 year ago

Strange. There's not a lot of difference in how the two platforms interact with PLUMED. I wonder if it's because the CUDA platform does its work on a separate thread rather than the main thread?
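
For background on the file name: PLUMED never overwrites an existing output file; when it opens a file that already exists, it renames the old one to bck.0.<name> and starts a fresh file. Data in bck.0.COLVARS next to an empty COLVARS would therefore be consistent with the output file being opened twice. A small illustration of that backup behavior, assuming the plumed Python module is installed (this is not the plugin's code path, just the naming convention):

```python
# Illustration of PLUMED's backup-on-open behavior: initializing PLUMED
# twice with the same PRINT file renames the first COLVARS to bck.0.COLVARS.
import plumed

def init_plumed():
    p = plumed.Plumed()
    p.cmd("setNatoms", 2)
    p.cmd("setMDEngine", "python")
    p.cmd("setTimestep", 1.0)
    p.cmd("init")
    p.cmd("readInputLine", "d: DISTANCE ATOMS=1,2")
    p.cmd("readInputLine", "PRINT ARG=d FILE=COLVARS STRIDE=1")
    return p

p1 = init_plumed()   # creates COLVARS
p2 = init_plumed()   # backs up the existing file to bck.0.COLVARS
```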

smliu1997 commented 1 year ago

This happens only with this specific system; I do not see it with other systems. Also, if we skip the energy minimization, the issue disappears.

peastman commented 1 year ago

That sounds a lot like a threading issue. What version of PLUMED are you using? If you can reproduce it with the latest version, we should open an issue with the developers to look into it.

smliu1997 commented 1 year ago

I can reproduce this issue with plumed 2.8.1, which I think is the latest version.

peastman commented 1 year ago

Thanks! I just opened an issue about it: https://github.com/plumed/plumed2/issues/882.

smliu1997 commented 1 year ago

Thank you so much for your help!

Bernadette-Mohr commented 1 year ago

I can confirm this behavior, even when adding PLUMED only after minimization and equilibration, right before the production run. It does not happen every time, and I have not yet been able to pinpoint which conditions cause it.
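
For clarity, the pattern I mean is roughly this (a sketch with placeholder names; reinitialize(preserveState=True) is the standard OpenMM call to pick up a force added after the Simulation was created):

```python
# Sketch of attaching PLUMED only after minimization/equilibration
# (placeholder names; assumes the PlumedForce API of openmm-plumed).
simulation.minimizeEnergy()
simulation.step(n_equilibration_steps)                # equilibration
system.addForce(PlumedForce(plumed_script))           # add PLUMED late
simulation.context.reinitialize(preserveState=True)   # pick up the new force
simulation.step(n_production_steps)                   # production run
```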