Closed Dmytro-Apalkov closed 6 years ago
Please attach the log file or the output you see in the console.
//output directory: stt.out/
sizeX := 40e-9
sizeY := sizeX
sizeZ := 1.4e-9
N := 32
setgridsize(N, N, 1)
setcellsize(sizeX/N, sizeY/N, sizeZ)
setGeom(circle(sizeX))
Msat = 800e3
Aex = 10e-12
Ku1 = 12e6
alpha = 0.01
m = uniform(0, 0, 1)
lambda = 1
Pol = 0.5669
epsilonprime = 0
Temp = 300
fixdt = 2e-14
setsolver(2)
ThermSeed(1)
fixedlayer = vector(0.01, 0, 1)
Jtot := -0.0014
area := sizeX * sizeY * pi / 4
jc := Jtot / area
J = vector(0, 0, jc)
autosave(m, 100e-12)
tableautosave(10e-12)
run(1e-9)
If it helps, the above script works fine on K40m, K80, M40 GPUs but does not work on P100 and P40. Thanks!
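For reference, the quantities the script implies can be reproduced with a few lines of Python. This is only a sketch that mirrors the script's own values; it assumes the area line reads sizeX * sizeY * pi / 4 (the multiplication signs appear to have been lost in the post), and that one table row is written per tableautosave period. The Python variable names are illustrative.

```python
import math

# Values copied from the posted script.
size_x = 40e-9          # disc diameter along x, in m
size_y = size_x         # disc diameter along y, in m
j_tot = -0.0014         # total current, in A
fixdt = 2e-14           # fixed time step, in s
run_time = 1e-9         # argument of run(), in s
table_period = 10e-12   # argument of tableautosave(), in s

# Same expressions as in the script: disc cross-section and current density.
area = size_x * size_y * math.pi / 4
jc = j_tot / area

# Rough counts (assumption: one table row per tableautosave period).
n_steps = round(run_time / fixdt)
n_table_rows = round(run_time / table_period)

print(f"area = {area:.4e} m^2")
print(f"jc = {jc:.4e} A/m^2")
print(f"steps = {n_steps}, table rows ~ {n_table_rows}")
```

The current density comes out at roughly -1.1e12 A/m^2, and a complete run should take about 50000 fixed steps and leave on the order of 100 rows in table.txt, so a table with a single row means the run stopped almost immediately.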
Sorry for going off-topic, but could you run the benchmark on your GPUs and paste the results here?
@Dmytro-Apalkov Thanks. Could you please also post the mumax3 output that appears in the console, along with the output of the nvidia-smi command? That should tell us which GPU driver and kernel versions are used when mumax3 is invoked.
Sure, kkingstoun, will paste the results to the comparison shortly...
@godsic The output of mumax is below. The execution time seems OK, the code seems to be running (or doing something) but there is no output.
OUTPUT of MUMAX:
mumax3 -gpu 6 test.mx
//mumax 3.9.3 linux_amd64 go1.7.1 (gc)
//CUDA 9000 Tesla P40(24445MB) cc6.1, using CC53 PTX
//(c) Arne Vansteenkiste, Dynamat LAB, Ghent University, Belgium
//This is free software without any warranty. See license.txt
//output directory: test.out/
//starting GUI at http://127.0.0.1:35367
sizeX := 40e-9
sizeY := sizeX
sizeZ := 1.4e-9
N := 32
setgridsize(N, N, 1)
setcellsize(sizeX/N, sizeY/N, sizeZ)
setGeom(circle(sizeX))
// Initializing geometry 3 %
// Initializing geometry 100 %
Msat = 800e3
Aex = 10e-12
Ku1 = 12e6
alpha = 0.01
m = uniform(0, 0, 1)
lambda = 1
Pol = 0.5669
epsilonprime = 0
Temp = 300
fixdt = 2e-14
setsolver(2)
ThermSeed(15)
fixedlayer = vector(0.01, 0, 1)
Jtot := -0.0014
area := sizeX * sizeY * pi / 4
jc := Jtot / area
J = vector(0, 0, jc)
autosave(m, 100e-12)
tableautosave(10e-12)
run(1e-9)
//Not using kernel cache (-cache="")
OUTPUT of NVIDIA-SMI:
nvidia-smi
Fri Feb 9 07:35:26 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:08:00.0 Off |                  Off |
| N/A   28C    P0    50W / 250W |   1255MiB / 24445MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
@Dmytro-Apalkov Thanks! Will you be able to try mumax3 binary linked against CUDA 9.1? If so, I will provide you with the link to download it.
@godsic 9.1 is not really an option. I have environments set up for 8.0, 8.5, and 9.0.
@kkingstoun, I have added benchmarks for 3 Tesla cards.
@Dmytro-Apalkov Thank you! You have nice toys ;)
@godsic One of the failing runs on P100 gave this error:
panic: CURAND_STATUS_LAUNCH_FAILURE

goroutine 1 [running, locked to thread]:
panic(0x8606e0, 0xc420144288)
    /home/arne/bin/go/src/runtime/panic.go:500 +0x1a1
github.com/mumax/3/cuda/curand.Generator.GenerateNormal(0x2b6405fafc30, 0x1090e40a000, 0x400, 0x3f80000000000000)
    /home/arne/src/github.com/mumax/3/cuda/curand/generator.go:41 +0xf8
github.com/mumax/3/engine.(*thermField).update(0xe66980)
    /home/arne/src/github.com/mumax/3/engine/temperature.go:98 +0x23f
github.com/mumax/3/engine.(*thermField).AddTo(0xe66980, 0xc42007c8c0)
    /home/arne/src/github.com/mumax/3/engine/temperature.go:50 +0x50
github.com/mumax/3/engine.SetEffectiveField(0xc42007c8c0)
    /home/arne/src/github.com/mumax/3/engine/effectivefield.go:17 +0x94
github.com/mumax/3/engine.SetLLTorque(0xc42007c8c0)
    /home/arne/src/github.com/mumax/3/engine/torque.go:48 +0x2f
github.com/mumax/3/engine.SetTorque(0xc42007c8c0)
    /home/arne/src/github.com/mumax/3/engine/torque.go:41 +0x2b
github.com/mumax/3/engine.torqueFn(0xc42007c8c0)
    /home/arne/src/github.com/mumax/3/engine/run.go:93 +0x2b
github.com/mumax/3/engine.(*Heun).Step(0xf49110)
    /home/arne/src/github.com/mumax/3/engine/heun.go:26 +0x119
github.com/mumax/3/engine.step(0x843001)
    /home/arne/src/github.com/mumax/3/engine/run.go:196 +0x39
github.com/mumax/3/engine.runWhile(0xc420071990, 0xc4211cb801)
    /home/arne/src/github.com/mumax/3/engine/run.go:181 +0x94
github.com/mumax/3/engine.RunWhile(0xc420071990)
    /home/arne/src/github.com/mumax/3/engine/run.go:172 +0x3c
github.com/mumax/3/engine.Run(0x3e112e0be826d695)
    /home/arne/src/github.com/mumax/3/engine/run.go:158 +0x57
reflect.Value.call(0x845e80, 0xa69ed0, 0x13, 0x8e89d5, 0x4, 0xc4211cb900, 0x1, 0x1, 0x13, 0x845e80, ...)
    /home/arne/bin/go/src/reflect/value.go:434 +0x5c8
reflect.Value.Call(0x845e80, 0xa69ed0, 0x13, 0xc4211cb900, 0x1, 0x1, 0xa, 0x0, 0x0)
    /home/arne/bin/go/src/reflect/value.go:302 +0xa4
github.com/mumax/3/script.(*call).Eval(0xc420142f90, 0x1, 0x1)
    /home/arne/src/github.com/mumax/3/script/call.go:61 +0x1c7
github.com/mumax/3/engine.EvalFile(0xc4201428d0)
    /home/arne/src/github.com/mumax/3/engine/script.go:102 +0x13e
main.runFileAndServe(0x7fffee9f8266, 0x7)
    /home/arne/src/github.com/mumax/3/cmd/mumax3/main.go:144 +0x151
main.main()
    /home/arne/src/github.com/mumax/3/cmd/mumax3/main.go:89 +0x1ce
@Dmytro-Apalkov Indeed, the error you see is common when the GPU driver, the CUDA library versions, or the mumax3 CUDA kernels are not appropriate for the particular GPU. Here you can download a mumax3 binary compiled from the master branch and linked against CUDA 9.0.
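A quick way to spot this kind of mismatch is the banner mumax3 prints at startup. In the P40 log earlier in the thread it reads "cc6.1, using CC53 PTX", which suggests the binary carried no kernels built for compute capability 6.1 and fell back to PTX targeting 5.3. The sketch below parses that banner line; the regular expression assumes the exact format seen in this thread, and the function name is made up for illustration. The CUDA number is decoded with the usual major * 1000 + minor * 10 convention.

```python
import re

# Pattern for the startup banner as it appears in this thread, e.g.
# "//CUDA 9000 Tesla P40(24445MB) cc6.1, using CC53 PTX".
# Assumption: other mumax3 versions may print a different banner.
BANNER = re.compile(
    r"//CUDA (?P<cuda>\d+) (?P<gpu>.+?)\((?P<mem>\d+)MB\) "
    r"cc(?P<maj>\d+)\.(?P<min>\d+), using CC(?P<ptx>\d+) PTX"
)


def parse_banner(line):
    """Return a dict describing the banner, or None if it does not match."""
    m = BANNER.search(line)
    if m is None:
        return None
    cuda = int(m.group("cuda"))
    device_cc = int(m.group("maj")) * 10 + int(m.group("min"))
    ptx_cc = int(m.group("ptx"))
    return {
        "cuda": (cuda // 1000, (cuda % 1000) // 10),  # 9000 -> (9, 0)
        "gpu": m.group("gpu"),
        "memory_mb": int(m.group("mem")),
        "device_cc": device_cc,                       # cc6.1 -> 61
        "ptx_cc": ptx_cc,                             # CC53 -> 53
        "ptx_older_than_device": ptx_cc < device_cc,
    }


info = parse_banner("//CUDA 9000 Tesla P40(24445MB) cc6.1, using CC53 PTX")
print(info)
```

An older PTX target is not an error in itself, since the driver can JIT-compile PTX for newer GPUs, so treat the flag as a hint that the binary predates the card rather than as proof of the cause.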
@godsic , Thank you! I will test it out.
@godsic Sorry for being away for quite some time. I was busy with something else. Anyway, I have just tested the version that you compiled for CUDA9.0. It works just fine on all the cards. Thank you!
I am testing the mumax code and running some simple STT switching cases. I have noticed that the simulation on certain GPU cards (e.g. P40, P100) does not give any output or error (table.txt has only one line for t=0), whereas the same simulation runs fine on other GPU cards (e.g. M40). In all cases, CUDA is 7.5 and I am using the precompiled library.
Any suggestions?
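The symptom described here (a table.txt with a single row for t=0) can also be checked from a script rather than by eye. A minimal sketch, assuming the table is plain text with header lines starting with "#" and one data row per line; the function names and the sample text are made up for illustration.

```python
def count_table_rows(text):
    """Count data rows: non-empty lines that do not start with '#'."""
    rows = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            rows += 1
    return rows


def looks_stalled(text):
    """True when the table holds at most the initial t=0 row."""
    return count_table_rows(text) <= 1


# Hypothetical table content with only the t=0 row, as in the failing runs.
sample = "# t (s)\tmx ()\tmy ()\tmz ()\n0\t0\t0\t1\n"
print(count_table_rows(sample), looks_stalled(sample))
```

On a real run the text would come from the table file in the output directory, for example by reading stt.out/table.txt and passing its contents to looks_stalled.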