chiroptical closed this issue 4 years ago.
Hi,
One thing about using the flag:
you are reserving a node with multiple GPUs, but you actually have access to only one of them, so inside the job that GPU is device 0 and you should use mumax3 -gpu=0.
Mateusz
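Why device 0: CUDA only enumerates the devices listed in CUDA_VISIBLE_DEVICES and numbers them from 0 in the order they are listed, so physical GPU 3 in a one-GPU job becomes device 0. Below is a minimal sketch of that mapping as a hypothetical shell helper (it is not part of mumax3 or Slurm), assuming CUDA_VISIBLE_DEVICES holds comma-separated integer ids rather than GPU UUIDs:

```shell
# Hypothetical helper (not part of mumax3 or Slurm): print the logical CUDA
# index that physical GPU $2 gets under a CUDA_VISIBLE_DEVICES value $1.
# Assumes a comma-separated list of integer ids, e.g. "3" or "0,1,2,3".
logical_gpu_index() {
  visible=$1
  physical=$2
  index=0
  old_ifs=$IFS
  IFS=','
  for id in $visible; do
    if [ "$id" = "$physical" ]; then
      IFS=$old_ifs
      echo "$index"
      return 0
    fi
    index=$((index + 1))  # CUDA numbers visible devices 0, 1, 2, ... in list order
  done
  IFS=$old_ifs
  return 1  # the physical GPU is not visible to this job
}
```

For example, logical_gpu_index 3 3 prints 0 (the one-GPU job), while logical_gpu_index 0,1,2,3 3 prints 3, which is why -gpu 3 happens to work when the whole node is allocated.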
On Wed, 16 Oct 2019 at 16:39, Barry Moore <notifications@github.com> wrote:
I am trying to help students run mumax3 on an HPC cluster with Slurm. I installed it via CGO_CFLAGS="-I$CUDA_ROOT/include -L$CUDA_ROOT/lib64" go get github.com/mumax/3/cmd/mumax3 using CUDA 7.5. At our center, we have relatively dense GPU nodes and set CUDA_VISIBLE_DEVICES for users. It seems you are aware of queueing software, because the default -cache directory is already set to a Slurm scratch directory. With Slurm, users will not know which GPU they are allocated at job submission time. As a test, I tried running the first example from http://mumax.github.io/examples.html:
SetGridsize(128, 32, 1)
SetCellsize(500e-9/128, 125e-9/32, 3e-9)

Msat = 800e3
Aex = 13e-12
alpha = 0.02

m = uniform(1, .1, 0)
relax()
save(m) // relaxed state

autosave(m, 200e-12)
tableautosave(10e-12)

B_ext = vector(-24.6E-3, 4.3E-3, 0)
run(1e-9)
Here is my Slurm script to submit the above mumax3 input on a GPU node with 3 cores and 1 GPU. You will see that I load the modules go (which puts $GOPATH/bin, i.e. mumax3, in my $PATH) and cuda (software components for CUDA).
#!/usr/bin/env bash
#SBATCH --job-name=test-mumax
#SBATCH --output=test-mumax.out
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=3
#SBATCH --cpus-per-task=1
#SBATCH --cluster=gpu
#SBATCH --partition=gtx1080
#SBATCH --gres=gpu:1

module purge
module load go/1.11.5 cuda/7.5.18

echo "CUDA DEVICE: $CUDA_VISIBLE_DEVICES"

mumax3 -h
mumax3 -v
mumax3 -gpu $CUDA_VISIBLE_DEVICES -o $SLURM_SUBMIT_DIR/output test-mumax.mx3
The output of the above job. I was allocated device 3 by Slurm, but you will see that it fails with panic: CUDA_ERROR_INVALID_DEVICE.
CUDA DEVICE: 3
Usage of mumax3:
  -cache string
        Kernel cache directory (empty disables caching) (default "/scratch/slurm-178702")
  -f    Force start, clean existing output directory
  -failfast
        If one simulation fails, stop entire batch immediately
  -gpu int
        Specify GPU
  -http string
        Port to serve web gui (default ":35367")
  -i    Open interactive browser session
  -o string
        Override output directory
  -paranoid
        Enable convolution self-test for cuFFT sanity.
  -s    Silent
  -sync
        Synchronize all CUDA calls (debug)
  -test
        Cuda test (internal)
  -v    Print version (default true)
  -vet
        Check input files for errors, but don't run them
//mumax 3.10 linux_amd64 go1.11.5 (gc)
//CUDA 7050 GeForce GTX 1080 Ti(11178MB) cc6.1 , using CC 61 PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//****//
// If you use mumax in any work or publication, //
// we kindly ask you to cite the references in references.bib //
//****//
//****//
//Please cite the following references, relevant for your simulation: //
//See bibtex file in output folder for justification. //
//****//
panic: CUDA_ERROR_INVALID_DEVICE
goroutine 1 [running, locked to thread]:
github.com/mumax/3/cuda/cu.CtxCreate(0x2, 0x3, 0x0)
        /ihome/crc/install/go/1.11.5/gopath/src/github.com/mumax/3/cuda/cu/context.go:17 +0xa3
github.com/mumax/3/cuda.Init(0x3)
        /ihome/crc/install/go/1.11.5/gopath/src/github.com/mumax/3/cuda/init.go:32 +0x64
main.main()
        /ihome/crc/install/go/1.11.5/gopath/src/github.com/mumax/3/cmd/mumax3/main.go:32 +0x52
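The trace shows device 3 being passed to cuda.Init (Init(0x3)). With CUDA_VISIBLE_DEVICES=3 the job sees exactly one CUDA device, numbered 0, so index 3 does not exist, which is consistent with the CUDA_ERROR_INVALID_DEVICE panic. A pre-flight check along these lines could catch this before mumax3 starts. It is only a sketch: the function name is hypothetical (not part of mumax3 or Slurm), and it assumes CUDA_VISIBLE_DEVICES holds comma-separated integer ids.

```shell
# Hypothetical pre-flight check (not part of mumax3 or Slurm).
# Succeeds if logical CUDA device index $1 exists under CUDA_VISIBLE_DEVICES.
# Assumes a comma-separated list of integer ids, e.g. "3" or "0,1,2,3".
gpu_index_is_visible() {
  requested=$1
  # Unset: CUDA exposes every device, and the count is not known from here.
  if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
    return 0
  fi
  # Count the listed ids; an empty value means no device is visible.
  count=0
  old_ifs=$IFS
  IFS=','
  for _id in $CUDA_VISIBLE_DEVICES; do
    count=$((count + 1))
  done
  IFS=$old_ifs
  # Visible devices are numbered 0 .. count-1.
  [ "$requested" -ge 0 ] && [ "$requested" -lt "$count" ]
}
```

In the job script above, a line such as gpu_index_is_visible 0 || exit 1 placed before the mumax3 call would turn a CUDA panic into an explicit early exit.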
What is really curious about this error is: if I request the entire node and select specifically GPU 3 (without the environment variable), it works fine. See below:
Below is a similar Slurm script requesting a full 12-core, 4-GPU node, with mumax3 running on GPU 3. CUDA_VISIBLE_DEVICES isn't useful in this case, because mumax3 runs on a single GPU, so I don't pass it to mumax3 here.
#!/usr/bin/env bash
#SBATCH --job-name=test-mumax
#SBATCH --output=test-mumax.out
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --cluster=gpu
#SBATCH --partition=gtx1080
#SBATCH --gres=gpu:4

module purge
module load go/1.11.5 cuda/7.5.18

echo "CUDA DEVICE: $CUDA_VISIBLE_DEVICES"

mumax3 -h
mumax3 -v
mumax3 -gpu 3 -o $SLURM_SUBMIT_DIR/output test-mumax.mx3
The corresponding output:
CUDA DEVICE: 0,1,2,3
Usage of mumax3:
  -cache string
        Kernel cache directory (empty disables caching) (default "/scratch/slurm-178703")
  -f    Force start, clean existing output directory
  -failfast
        If one simulation fails, stop entire batch immediately
  -gpu int
        Specify GPU
  -http string
        Port to serve web gui (default ":35367")
  -i    Open interactive browser session
  -o string
        Override output directory
  -paranoid
        Enable convolution self-test for cuFFT sanity.
  -s    Silent
  -sync
        Synchronize all CUDA calls (debug)
  -test
        Cuda test (internal)
  -v    Print version (default true)
  -vet
        Check input files for errors, but don't run them
//mumax 3.10 linux_amd64 go1.11.5 (gc)
//CUDA 7050 GeForce GTX 1080(8119MB) cc6.1 , using CC 61 PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//****//
// If you use mumax in any work or publication, //
// we kindly ask you to cite the references in references.bib //
//****//
//****//
//Please cite the following references, relevant for your simulation: //
//See bibtex file in output folder for justification. //
//****//
//mumax 3.10 linux_amd64 go1.11.5 (gc)
//CUDA 7050 GeForce GTX 1080(8119MB) cc6.1 , using CC 61 PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//****//
// If you use mumax in any work or publication, //
// we kindly ask you to cite the references in references.bib //
//****//
//output directory: /ihome/sam/bmooreii/workspace/mumax/output/
//starting GUI at http://127.0.0.1:35367
SetGridsize(128, 32, 1)
SetCellsize(500e-9/128, 125e-9/32, 3e-9)
Msat = 800e3
Aex = 13e-12
alpha = 0.02
m = uniform(1, .1, 0)
relax()
//Did not use cached kernel: open /scratch/slurm-178703/mumax3kernel[128 32 1][0 0 0][3.90625e-09 3.90625e-09 3e-09]_60 0.ovf: no such file or directory
// Calculating demag kernel 1 %
// Calculating demag kernel 100 %
//Cached kernel: /scratch/slurm-178703/mumax3kernel[128 32 1][0 0 0][3.90625e-09 3.90625e-09 3e-09]6
save(m)
autosave(m, 200e-12)
tableautosave(10e-12)
B_ext = vector(-24.6E-3, 4.3E-3, 0)
run(1e-9)
//****//
//Please cite the following references, relevant for your simulation: //
//See bibtex file in output folder for justification. //
//****//
// * Vansteenkiste et al., AIP Adv. 4, 107133 (2014).
Any help is appreciated.
View it on GitHub: https://github.com/mumax/3/issues/243
-- Mateusz Zelent
I see! I think the -h output could be improved to make the -gpu argument more obvious. Thanks for your quick response!
-gpu int
Specify GPU
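Since the -h text only says "Specify GPU", one option is to make the numbering explicit in the job script itself. This is a minimal sketch, assuming Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs (as it did in the outputs above). The wrapper name and the DRY_RUN switch are hypothetical; the -gpu and -o flags are the ones listed in the -h output.

```shell
# Hypothetical wrapper (not part of mumax3 or Slurm): run mumax3 on the GPU
# Slurm allocated to this job. With one GPU allocated, CUDA_VISIBLE_DEVICES
# lists exactly one id, and CUDA numbers that device 0.
# Set DRY_RUN=1 to print the command instead of running it.
run_mumax3_on_allocated_gpu() {
  input=$1
  outdir=$2
  if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "no GPU visible (CUDA_VISIBLE_DEVICES unset or empty)" >&2
    return 1
  fi
  # Always the first visible device, whatever its physical id.
  set -- mumax3 -gpu 0 -o "$outdir" "$input"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

In the one-GPU job script from the question, the last line would become run_mumax3_on_allocated_gpu test-mumax.mx3 "$SLURM_SUBMIT_DIR/output". If several GPUs are allocated, this still picks only the first visible one, which matches mumax3 running on a single GPU.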