runcuda.sh overwrites environment variables

lilyminium commented 1 year ago

Targets that use Work Queue seem to wrap their commands in data/runcuda.sh. Unfortunately, this script overwrites variables in the local environment and, if the hostname matches certain patterns, tries to load quite an old version of CUDA (4 or 5). I think it would be easier for users to configure their own environments, which would also make issues easier to debug.

The variables configured include:

CUDA_HOME
PATH
LD_LIBRARY_PATH
INCLUDE
BAK
CUDA_CACHE_PATH
OPENMM_CUDA_COMPILER
OPENMM_PLUGIN_DIR

I'm using forcebalance 1.9.5.
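
For anyone debugging the same thing, printing these variables before and after the wrapper runs shows what was changed. A minimal sketch, assuming `printenv` is available on the worker; the function name `print_cuda_env` is made up for illustration and is not part of ForceBalance:

```shell
# Print the variables listed above so the output can be compared
# before and after the wrapper script runs.
# print_cuda_env is a hypothetical helper, not a ForceBalance function.
print_cuda_env() {
    for name in CUDA_HOME PATH LD_LIBRARY_PATH INCLUDE BAK \
                CUDA_CACHE_PATH OPENMM_CUDA_COMPILER OPENMM_PLUGIN_DIR; do
        # printenv exits non-zero when the variable is not in the environment.
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is unset\n' "$name"
        fi
    done
}
```

Calling it once in the submission script and once inside the wrapped command, then diffing the two outputs, should make any overwritten variable obvious.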

leeping commented 1 year ago

Thanks for bringing this up. I think runcuda.sh only wraps targets that run OpenMM MD simulations using the npt.py / nvt.py scripts (such as Liquid_OpenMM). The Work Queue target that OpenFF usually uses is not wrapped by runcuda.sh because they use OpenMM to do energy minimizations and single-point calculations.

You are right that the environment variables in runcuda.sh are largely out of date and most of them should be deleted. I agree it would be better if users could specify their own environment variables. A quick hack for a power user would be to edit the runcuda.sh file in their local install. A longer-term solution would be to add an option in the FB input file to specify a shell script that loads custom CUDA environment variables (which could include logic that loads different variables depending on the host name, if desired). The FB code would then include the environment file in the WQ input file list, and runcuda.sh would source the file if it is present.
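
The "source the file if it is present" step could look roughly like the sketch below. This is only an illustration of the idea, not existing ForceBalance code: the variable `FB_ENV_FILE`, the default file name `fb_env.sh`, and the function name `run_with_env` are all assumptions.

```shell
# Hypothetical sketch of the wrapper logic; none of these names exist in FB today.
# FB_ENV_FILE: path to the user's environment script (assumed name).
# fb_env.sh:   assumed default file name, shipped alongside the job's input files.
run_with_env() {
    env_file="${FB_ENV_FILE:-fb_env.sh}"
    # Source the user's script only if it was actually sent with the job;
    # otherwise leave the inherited environment untouched.
    if [ -f "$env_file" ]; then
        . "$env_file"
    fi
    # Run the wrapped command (for example the npt.py call) in that environment.
    "$@"
}
```

The wrapper script would then end with `run_with_env "$@"`, and any host-name-dependent logic would live in the user's own file rather than in runcuda.sh.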

lilyminium commented 1 year ago

Thanks for the suggestion! I did just comment out that block for my own use -- I've also written my own submission script that loads up the necessary environment for each worker, so I'm not sure FB needs to handle it at all (unless I missed an existing facility to handle workqueue workers!)

And yes, I bumped into this with the Liquid_SMIRNOFF target that subclasses Liquid :)

leeping commented 1 year ago

I think if someone uses a single WQ job submission script (on a cluster) but uses WQ for different types of jobs (such as distributing QM calculations, or running FB Liquid simulations), it could be helpful for the applications to change the environment variables. We can probably comment out all of the code blocks, leaving them as examples for any user who wants to customize their workers' environment, and then FB can default to not loading anything.
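
As a rough sketch of that "examples only, default to nothing" layout (the host pattern and the paths are placeholders, and `setup_worker_env` is a made-up name, not the current contents of runcuda.sh):

```shell
# Hypothetical layout: every host-specific block is commented out, so by
# default the worker's inherited environment is left exactly as it is.
setup_worker_env() {
    case "$(hostname)" in
        # Example only; uncomment and adapt for your own cluster:
        # gpu-node*)
        #     export CUDA_HOME=/usr/local/cuda
        #     export PATH="$CUDA_HOME/bin:$PATH"
        #     export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
        #     ;;
        *)
            # Default: load nothing.
            ;;
    esac
}
```

With everything commented out, calling `setup_worker_env` is a no-op, so a worker started from a user's own submission script keeps whatever environment that script set up.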

leeping / forcebalance

runcuda.sh overwrites environment variables #284