ucsf-wynton / wynton-website-hpc

The Official Wynton HPC User Website
https://wynton.ucsf.edu/hpc/

SBGrid/SGE: How to exclude nodes by hostname? #107

Open HenrikBengtsson opened 1 year ago

HenrikBengtsson commented 1 year ago

On https://wynton.ucsf.edu/hpc/software/sbgrid.html#sbgrid-programs-with-gpu-support we suggest:

"You may need to specify a beta version of the SBGrid programs, or avoid the qb3-atgpu* nodes."

but we don't give instructions anywhere on how to avoid those nodes. Is that done by:

-l h="!qb3-atgpu*"

?

ellestad commented 1 year ago

Actually, most GPU-enabled SBGrid software is now compiled for versions of CUDA new enough to run on the AMD/Nvidia A40 nodes. At least GROMACS and RELION are. I'm not sure what other software people use; those are the ones we get the most comments about.

ellestad commented 1 year ago

But yes, the above resource request would exclude the qb3-atgpu* nodes.
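
For reference, here is a minimal sketch of how that request could be passed to qsub. It assumes that the scheduler accepts a negated wildcard for the hostname resource (h), as suggested above. The job script name job.sh is hypothetical, and the -cwd option is added only for illustration. The sketch prints the command instead of submitting it, so the quoting can be checked first.

```shell
#!/usr/bin/env bash
# Sketch only: build a qsub command line that excludes hosts by hostname pattern.
# Single quotes keep the shell from treating '!' as history expansion
# and '*' as a filename glob.
exclude='!qb3-atgpu*'

# job.sh is a hypothetical job script; replace it with your own.
qsub_cmd=(qsub -cwd -l "h=${exclude}" job.sh)

# Print the command for inspection instead of submitting it.
printf '%s\n' "${qsub_cmd[*]}"
```

To actually submit the job, one would run "${qsub_cmd[@]}" in place of the printf line. Keeping the pattern in a quoted variable means it reaches qsub unchanged.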

ellestad commented 1 year ago

Also, this comment can be removed: "Because of this, you have to make sure you load a corresponding CUDA environment module, e.g. module load cuda/10.1." SBGrid includes NVIDIA libraries where necessary; it doesn't depend on the system CUDA.

HenrikBengtsson commented 1 year ago

I see. To be honest, I had to read that whole paragraph so many times to understand it. I blame lack of experience with GPU/CUDA.

> Also, this "Because of this, you have to make sure you load a corresponding CUDA environment module, e.g. module load cuda/10.1." comment can be removed. SBGrid includes NVIDIA libraries where necessary, it doesn't depend on the system cuda.

Oh, I added that yesterday because I thought it had been forgotten. Should it be rephrased to: "WARNING: There is no need to load CUDA modules when using SBGrid software, because the required NVIDIA libraries are included."? Also, if one loads a CUDA module, is there a risk that it will conflict with SBGrid? That is, do we need to warn against loading them?

Since you're much more experienced with this, would you mind updating that section? I'm mostly guessing and winging it here.

HenrikBengtsson commented 1 year ago

Also, when using SBGrid, does the user have to declare -l compute_cap=<version> as mentioned on https://wynton.ucsf.edu/hpc/scheduler/gpu.html#gpu-relevant-resource-requests?