mumax / 3

GPU-accelerated micromagnetic simulator

`mumax3` not working on high numbered GPUs #243

Closed chiroptical closed 4 years ago

chiroptical commented 4 years ago

I am trying to help students run mumax3 on an HPC cluster with Slurm. I installed it via CGO_CFLAGS="-I$CUDA_ROOT/include -L$CUDA_ROOT/lib64" go get github.com/mumax/3/cmd/mumax3 using CUDA 7.5. At our center we have relatively dense GPU nodes and set CUDA_VISIBLE_DEVICES for users. It seems you are aware of queueing software, since the -cache flag default is picked up from the job's scratch directory. With Slurm, users will not know which GPU they are allocated at job submission time. As a test, I tried running the first example from http://mumax.github.io/examples.html, i.e.

SetGridsize(128, 32, 1)
SetCellsize(500e-9/128, 125e-9/32, 3e-9)

Msat  = 800e3
Aex   = 13e-12
alpha = 0.02

m = uniform(1, .1, 0)
relax()
save(m)    // relaxed state

autosave(m, 200e-12)
tableautosave(10e-12)

B_ext = vector(-24.6E-3, 4.3E-3, 0)
run(1e-9)

Below is my Slurm script to submit the above mumax3 input on a GPU node with 3 cores and 1 GPU. You will see I load the go module (which puts $GOPATH/bin, i.e. mumax3, in my $PATH) and the cuda module (the CUDA software components).

#!/usr/bin/env bash
#SBATCH --job-name=test-mumax
#SBATCH --output=test-mumax.out
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=3
#SBATCH --cpus-per-task=1
#SBATCH --cluster=gpu
#SBATCH --partition=gtx1080
#SBATCH --gres=gpu:1

module purge
module load go/1.11.5 cuda/7.5.18

echo "CUDA DEVICE: $CUDA_VISIBLE_DEVICES"

mumax3 -h
mumax3 -v
mumax3 -gpu $CUDA_VISIBLE_DEVICES -o $SLURM_SUBMIT_DIR/output test-mumax.mx3

Here is the output of that job. Slurm allocated me device 3, but you will see that it fails with panic: CUDA_ERROR_INVALID_DEVICE.

CUDA DEVICE: 3
Usage of mumax3:
  -cache string
        Kernel cache directory (empty disables caching) (default "/scratch/slurm-178702")
  -f    Force start, clean existing output directory
  -failfast
        If one simulation fails, stop entire batch immediately
  -gpu int
        Specify GPU
  -http string
        Port to serve web gui (default ":35367")
  -i    Open interactive browser session
  -o string
        Override output directory
  -paranoid
        Enable convolution self-test for cuFFT sanity.
  -s    Silent
  -sync
        Synchronize all CUDA calls (debug)
  -test
        Cuda test (internal)
  -v    Print version (default true)
  -vet
        Check input files for errors, but don't run them
//mumax 3.10 linux_amd64 go1.11.5 (gc)
//CUDA 7050 GeForce GTX 1080 Ti(11178MB) cc6.1 , using CC 61  PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//********************************************************************//
//  If you use mumax in any work or publication,                      //
//  we kindly ask you to cite the references in references.bib        //
//********************************************************************//
//********************************************************************//
//Please cite the following references, relevant for your simulation: //
//See bibtex file in output folder for justification.                 //
//********************************************************************//
panic: CUDA_ERROR_INVALID_DEVICE

goroutine 1 [running, locked to thread]:
github.com/mumax/3/cuda/cu.CtxCreate(0x2, 0x3, 0x0)
    /ihome/crc/install/go/1.11.5/gopath/src/github.com/mumax/3/cuda/cu/context.go:17 +0xa3
github.com/mumax/3/cuda.Init(0x3)
    /ihome/crc/install/go/1.11.5/gopath/src/github.com/mumax/3/cuda/init.go:32 +0x64
main.main()
    /ihome/crc/install/go/1.11.5/gopath/src/github.com/mumax/3/cmd/mumax3/main.go:32 +0x52

What is really curious about this error is that if I request the entire node and explicitly select GPU 3 (without using the environment variable), it works fine. See below:

Here is a similar Slurm script requesting a full 12-core, 4-GPU node, with mumax3 running on GPU 3. Obviously CUDA_VISIBLE_DEVICES isn't useful in this case because mumax3 runs on a single GPU, and I don't use it when executing.

#!/usr/bin/env bash
#SBATCH --job-name=test-mumax
#SBATCH --output=test-mumax.out
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --cluster=gpu
#SBATCH --partition=gtx1080
#SBATCH --gres=gpu:4

module purge
module load go/1.11.5 cuda/7.5.18

echo "CUDA DEVICE: $CUDA_VISIBLE_DEVICES"

mumax3 -h
mumax3 -v
mumax3 -gpu 3 -o $SLURM_SUBMIT_DIR/output test-mumax.mx3

The corresponding output

CUDA DEVICE: 0,1,2,3
Usage of mumax3:
  -cache string
        Kernel cache directory (empty disables caching) (default "/scratch/slurm-178703")
  -f    Force start, clean existing output directory
  -failfast
        If one simulation fails, stop entire batch immediately
  -gpu int
        Specify GPU
  -http string
        Port to serve web gui (default ":35367")
  -i    Open interactive browser session
  -o string
        Override output directory
  -paranoid
        Enable convolution self-test for cuFFT sanity.
  -s    Silent
  -sync
        Synchronize all CUDA calls (debug)
  -test
        Cuda test (internal)
  -v    Print version (default true)
  -vet
        Check input files for errors, but don't run them
//mumax 3.10 linux_amd64 go1.11.5 (gc)
//CUDA 7050 GeForce GTX 1080(8119MB) cc6.1 , using CC 61  PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//********************************************************************//
//  If you use mumax in any work or publication,                      //
//  we kindly ask you to cite the references in references.bib        //
//********************************************************************//
//********************************************************************//
//Please cite the following references, relevant for your simulation: //
//See bibtex file in output folder for justification.                 //
//********************************************************************//
//mumax 3.10 linux_amd64 go1.11.5 (gc)
//CUDA 7050 GeForce GTX 1080(8119MB) cc6.1 , using CC 61  PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//********************************************************************//
//  If you use mumax in any work or publication,                      //
//  we kindly ask you to cite the references in references.bib        //
//********************************************************************//
//output directory: /ihome/sam/bmooreii/workspace/mumax/output/
//starting GUI at http://127.0.0.1:35367
SetGridsize(128, 32, 1)
SetCellsize(500e-9/128, 125e-9/32, 3e-9)
Msat = 800e3
Aex = 13e-12
alpha = 0.02
m = uniform(1, .1, 0)
relax()
//Did not use cached kernel: open /scratch/slurm-178703/mumax3kernel_[128 32 1]_[0 0 0]_[3.90625e-09 3.90625e-09 3e-09]_6_0 0.ovf: no such file or directory
// Calculating demag kernel 1 %
// Calculating demag kernel 100 %
//Cached kernel: /scratch/slurm-178703/mumax3kernel_[128 32 1]_[0 0 0]_[3.90625e-09 3.90625e-09 3e-09]_6_
save(m)
autosave(m, 200e-12)
tableautosave(10e-12)
B_ext = vector(-24.6E-3, 4.3E-3, 0)
run(1e-9)
//********************************************************************//
//Please cite the following references, relevant for your simulation: //
//See bibtex file in output folder for justification.                 //
//********************************************************************//
//   * Vansteenkiste et al., AIP Adv. 4, 107133 (2014).

Any help is appreciated.

kkingstoun commented 4 years ago

Hi,

One thing: with the flag

#SBATCH --gres=gpu:1

you are reserving a node that has multiple GPUs, but in fact you have access to only one of them, so you should use mumax3 -gpu=0.
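In other words, inside such a job the CUDA runtime re-enumerates the visible devices starting from 0, so the single allocated GPU is always device 0, regardless of the physical index Slurm exports in CUDA_VISIBLE_DEVICES. A minimal sketch of the adjusted submission (keeping the other directives, module names and paths from your first script, so those are assumptions taken from it):

#!/usr/bin/env bash
#SBATCH --gres=gpu:1
# ... remaining #SBATCH directives and module purge/load as in the original script ...

# CUDA_VISIBLE_DEVICES may read e.g. "3", but the only visible device is
# enumerated as device 0 inside this job.
echo "CUDA DEVICE: $CUDA_VISIBLE_DEVICES"

mumax3 -gpu 0 -o "$SLURM_SUBMIT_DIR/output" test-mumax.mx3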

Mateusz


chiroptical commented 4 years ago

I see! I think the -h output could be improved to make the -gpu argument more obvious. Thanks for your quick response!

  -gpu int
        Specify GPU
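
For example, a wording along these lines might have saved me some confusion (just a sketch of possible help text, not actual mumax3 output):

  -gpu int
        Specify the CUDA device index; note that when CUDA_VISIBLE_DEVICES
        restricts the job to one GPU, that GPU is enumerated as 0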