leeping / forcebalance

Systematic force field optimization.

runcuda.sh overwrites environment variables #284

Open lilyminium opened 1 year ago

lilyminium commented 1 year ago

Targets involving workqueue seem to wrap commands in data/runcuda.sh. This unfortunately overwrites variables in the local environment and tries to load quite an old version of CUDA (4 or 5) if the hostname matches certain patterns. I think it would be easier for users to configure their own environments, which would also make issues easier to debug.

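The variables configured include:

  • CUDA_HOME
  • PATH
  • LD_LIBRARY_PATH
  • INCLUDE
  • BAK
  • CUDA_CACHE_PATH
  • OPENMM_CUDA_COMPILER
  • OPENMM_PLUGIN_DIR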

I'm using forcebalance 1.9.5.

leeping commented 1 year ago

Thanks for bringing this up. I think runcuda.sh only wraps targets that run OpenMM MD simulations using the npt.py / nvt.py scripts (such as Liquid_OpenMM). The Work Queue target that OpenFF usually uses is not wrapped by runcuda.sh because it uses OpenMM to do energy minimizations and single-point calculations.

You are right that the environment variables in runcuda.sh are largely out of date and most of them should be deleted. I agree it would be better if the user could specify their own environment variables. A quick hack for a power user would be to edit the runcuda.sh file in their local install. A longer-range solution would be to add an option in the FB input file to specify a shell script that loads custom CUDA environment variables (which could include logic that loads different variables depending on the host name, if desired). The FB code would then include the environment file in the WQ input file list, and runcuda.sh would source the file if it is present.
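The "source the file if it is present" idea could look roughly like the sketch below. This is only an illustration of the proposal, not existing ForceBalance behavior: the variable name FB_CUDA_ENV_FILE, the default file name cuda_env.sh, and the function name run_with_user_env are all hypothetical.

```shell
#!/bin/bash
# Sketch only: FB_CUDA_ENV_FILE and cuda_env.sh are hypothetical names,
# not existing ForceBalance options or files.

# Source the user's environment file if it exists, then run the command
# given as arguments. If the file is absent, the environment is untouched.
run_with_user_env() {
    local env_file="${FB_CUDA_ENV_FILE:-./cuda_env.sh}"
    if [ -f "$env_file" ]; then
        # The file is plain shell, so it may contain host-dependent logic.
        . "$env_file"
    fi
    "$@"
}

# When used as a wrapper script, pass the command line through.
if [ "$#" -gt 0 ]; then
    run_with_user_env "$@"
fi
```

With this shape, a worker that has no environment file simply runs the wrapped command in whatever environment the user already set up.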


lilyminium commented 1 year ago

Thanks for the suggestion! I did just comment out that block for my own use -- I've also written my own submission script that loads up the necessary environment for each worker, so I'm not sure FB needs to handle it at all (unless I missed an existing facility to handle workqueue workers!)

And yes, I bumped into this with the Liquid_SMIRNOFF target that subclasses Liquid :)

leeping commented 1 year ago

I think if someone uses a single WQ job submission script (on a cluster) but uses WQ for different types of jobs (such as distributing QM calculations, or running FB Liquid simulations), it could be helpful for the applications to change the environment variables. We can probably comment out all of the code blocks, leaving them as examples for any user who wants to customize their workers' environment, and then FB can default to not loading anything.
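As a rough illustration of "commented-out examples, load nothing by default", an environment script might be laid out like this. The host pattern, the paths, and the function name setup_cuda_env are placeholders I made up, not the contents of the real runcuda.sh.

```shell
#!/bin/bash
# Sketch only: host patterns and paths below are placeholders.

# Pick environment settings based on the host name (first argument, or the
# current node name if none is given). By default nothing is changed.
setup_cuda_env() {
    local host="${1:-$(uname -n)}"
    case "$host" in
        # Example block, disabled by default. Uncomment and adapt as needed.
        # gpu-node*)
        #     export CUDA_HOME=/path/to/cuda
        #     export PATH="$CUDA_HOME/bin:$PATH"
        #     export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
        #     ;;
        *)
            # Default: leave the worker's environment untouched.
            ;;
    esac
}

setup_cuda_env
```

Keeping the examples as comments documents the mechanism for anyone who needs it, while a fresh install never overrides variables the user has already configured.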