Open adsk1 opened 5 years ago
Which driver version are you on? You should be running the last version that contained CUDA 8.0 (after 375 but before 384).
Otherwise the CUDA 8.0 Toolkit is running against backward-compatibility code in the driver, which is not ideal at all and can lead to errors such as this.
Hi @Spudz76, thx for your advice. My driver version is 384.111... is that OK?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 600          Off  | 00000000:02:00.0 Off |                  N/A |
| 37%   54C    P0    N/A /  N/A |      0MiB /   964MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
I believe you've got the first release of the CUDA 8.0 Toolkit there. There is a second, updated one, V8.0.61 or so (GA2), which is what I use. I'm unsure what the older one does.
384 contains CUDA 9.0, so no, that's not ideal. You're actually making it worse by running the even older 8.0 against a two-steps-newer driver: double backward compatibility. More gaps for bugs.
You need to revert to 375.
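The mismatch discussed above can be checked mechanically. This is a minimal sketch, not part of the miner: it pulls the driver version out of the `nvidia-smi` header and the toolkit release out of the `nvcc` banner, then tests the driver against the 375-to-384 window mentioned in this thread. The function names are hypothetical, and the window bounds are an assumption taken from the advice here rather than from NVIDIA documentation.

```python
import re


def driver_version(smi_text):
    """Return the driver version from nvidia-smi header text as a tuple of ints."""
    m = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_text)
    if m is None:
        raise ValueError("no 'Driver Version:' field found")
    return tuple(int(part) for part in m.group(1).split("."))


def toolkit_release(nvcc_text):
    """Return ((major, minor), build string) from the nvcc version banner."""
    m = re.search(r"release\s+(\d+)\.(\d+),\s*V(\d+(?:\.\d+)*)", nvcc_text)
    if m is None:
        raise ValueError("no 'release X.Y, V...' field found")
    return (int(m.group(1)), int(m.group(2))), m.group(3)


def driver_matches_cuda8(driver, low=375, high=384):
    """Assumption from this thread: CUDA 8.0 pairs with driver branches
    from 375 up to, but not including, 384."""
    return low <= driver[0] < high


# Strings copied from the output posted in this thread.
smi = "NVIDIA-SMI 384.111 Driver Version: 384.111"
nvcc = "Cuda compilation tools, release 8.0, V8.0.44"

drv = driver_version(smi)
rel, build = toolkit_release(nvcc)
print(drv, rel, build, driver_matches_cuda8(drv))
# (384, 111) (8, 0) 8.0.44 False
```

A `False` here only says the driver falls outside the window suggested in the thread; it does not by itself prove the crash comes from the mismatch.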
The Linux driver versions and the CUDA runtime each one contains are listed here
You want the GA2 driver, so uninstall 8.0 and then install the 8.0-GA2 toolkit. You probably won't get the 367 driver to work and match your toolkit, plus that's a known-bad CUDA version, otherwise they wouldn't have released a GA2 (the only time they've done so for any version).
Also, autotune is broken for Fermi, as I've found now that I've been running it on mine: it picks too many threads x blocks and crashes trying to allocate memory (that exact error message). This is especially true for CN-R, which has a random recompile in it that uses some more GPU memory (and isn't accounted for in the sizer for auto tuning).
I hunted for settings for a while and got this as the best for CN-R:
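The allocation failure described above comes down to simple arithmetic. Here is a rough sketch, assuming the dominant cost is one scratchpad per concurrent hash (2 MiB for standard CryptoNight, 4 MiB for the heavy variants) and that threads x blocks hashes are in flight. The `reserve_mib` parameter is a hypothetical stand-in for whatever else needs GPU memory, such as the CN-R recompile, and the 200 MiB value below is purely illustrative.

```python
# Scratchpad size per hash in MiB. 2 MiB is the standard CryptoNight value;
# the heavy variants use 4 MiB.
SCRATCHPAD_MIB = {"cn": 2, "cn-heavy": 4}


def scratchpad_mib(threads, blocks, algo="cn"):
    """Total scratchpad memory in MiB for threads x blocks concurrent hashes."""
    return threads * blocks * SCRATCHPAD_MIB[algo]


def fits(threads, blocks, total_mib, algo="cn", reserve_mib=0):
    """True if the scratchpads plus a reserved margin fit in total_mib.

    reserve_mib is a hypothetical allowance for other allocations; this
    sketch does not know how much memory the miner really needs beyond
    the scratchpads.
    """
    return scratchpad_mib(threads, blocks, algo) + reserve_mib <= total_mib


# 964 MiB is the total reported by nvidia-smi for the Quadro 600 above.
print(scratchpad_mib(10, 40))               # 800
print(fits(10, 40, 964))                    # True
print(fits(10, 40, 964, reserve_mib=200))   # False
```

A 10 x 40 layout already claims 800 of the card's 964 MiB for scratchpads alone, so a sizer that ignores any extra allocation has very little room for error.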
"threads": 10,
"blocks": 40,
"bfactor": 6,
"bsleep": 25,
"sync_mode": 3,
Other algos required a different layout, but these work well for regular CN variants. Some wanted 8 threads. I think you should try to stick with a multiple of SMX (which is 2); otherwise tune blocks down until it quits failing. For example, I don't think I got anything to work with heavy other than 4x4, which is slow and dumb.
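The manual procedure above (keep threads on a multiple of SMX, then lower blocks until it stops failing) can be written as a small search loop. This is only a sketch: `try_launch` is a hypothetical callback standing in for "start the miner with these values and see whether it survives", and applying the multiple-of-SMX advice to threads is one reading of the comment, not a confirmed rule.

```python
def tune_down(try_launch, threads, blocks, smx=2):
    """Lower blocks one step at a time until try_launch succeeds.

    threads is first rounded down to a multiple of smx. Returns the first
    working (threads, blocks) pair, or None if nothing works.
    """
    threads -= threads % smx
    if threads <= 0:
        return None
    for b in range(blocks, 0, -1):
        if try_launch(threads, b):
            return threads, b
    return None


def fake_launch(threads, blocks):
    """Hypothetical stand-in for a real launch: succeeds while
    2 MiB per hash stays within an arbitrary 700 MiB budget."""
    return threads * blocks * 2 <= 700


print(tune_down(fake_launch, 10, 40))  # (10, 35)
```

With a real miner the callback would be slow, since each attempt means a restart, so starting from a value close to the expected limit saves time.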
Okay, I've repaired some of this with #255, which eliminates the startup crashing ("unknown error"). If you can, please grab that PR, build it, and see if it reduces the problems.
I found it impossible to guess whether a crash came from bugs or from clocking; they all look the same from the console... Now that I'm running this patch it no longer crashes at all, and I can clock somewhat closer to what I used to run before CN-R was involved before anything starts failing (normal).
Also, CN-GPU hates the above 10x40 6x25 combo; it only really works on RWZ and maybe the old 0/1/2 variants.
12x26 10x25 is the best I've found, at 31.5 H/s; any more and it crashes. I think it saturates the floating-point units (which these cards have fewer of) before it hits the usual limits, hence the smaller block count.
And everything hates a bfactor below 8; some coins want 10.
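The shorthand used in these comments appears to be threads x blocks followed by bfactor x bsleep, judging by how "10x40 6x25" lines up with the settings listed earlier. Assuming that reading is right, a small hypothetical helper can expand a combo into the corresponding config keys:

```python
import json
import re


def parse_combo(text):
    """Expand 'TxB FxS' shorthand into threads/blocks/bfactor/bsleep keys.

    The field order is an assumption inferred from this thread.
    """
    m = re.fullmatch(r"\s*(\d+)x(\d+)\s+(\d+)x(\d+)\s*", text)
    if m is None:
        raise ValueError("expected something like '12x26 10x25'")
    threads, blocks, bfactor, bsleep = (int(g) for g in m.groups())
    return {
        "threads": threads,
        "blocks": blocks,
        "bfactor": bfactor,
        "bsleep": bsleep,
    }


print(json.dumps(parse_combo("12x26 10x25")))
# {"threads": 12, "blocks": 26, "bfactor": 10, "bsleep": 25}
```

The helper leaves out `sync_mode` because the shorthand does not carry it.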
Starting from 2.12, I encounter the illegal memory error as follows. Can anyone help? Thx!!