xmrig / xmrig-nvidia

Monero (XMR) NVIDIA miner
GNU General Public License v3.0

An illegal memory access was encountered, CUDA8, Ubuntu 16.04, Quadro 600, XMRig-NVIDIA/2.14.1 #251

Open adsk1 opened 5 years ago

adsk1 commented 5 years ago

Starting from 2.12, I encounter the illegal memory access error as follows. Can anyone help? Thanks!

Spudz76 commented 5 years ago

What driver version are you running? You should be on the last version that contained CUDA 8.0 (after 375 but before 384).

Otherwise the CUDA 8.0 Toolkit is running against backward-compatibility code in the driver, which is not ideal and can lead to errors such as this.

adsk1 commented 5 years ago

Hi @Spudz76, thanks for your advice. My driver version is 384.111. Is that OK?

    Thu Mar 14 10:51:28 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro 600          Off  | 00000000:02:00.0 Off |                  N/A |
    | 37%   54C    P0    N/A /  N/A |      0MiB /   964MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Sun_Sep__4_22:14:01_CDT_2016
    Cuda compilation tools, release 8.0, V8.0.44

Spudz76 commented 5 years ago

I believe you've got the first release of the CUDA 8.0 Toolkit there. There is a second, updated one, V8.0.61 or such (GA2), which is what I use. I'm unsure what the older one does.

384 contains CUDA 9.0, so no, that's not ideal. You're actually making it worse by running the even older 8.0 against a two-steps-newer driver: double backward compatibility, and more gaps for bugs.

You need to revert to 375.

Linux driver versions, and the CUDA runtime each one contains, are listed here

You want the GA2 driver: uninstall 8.0, then install the 8.0-GA2 toolkit. You probably won't get the 367 driver to work and match your toolkit, plus that's a known-bad CUDA version, otherwise they wouldn't have released a GA2 (the only time they've done so for any version).
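To make the pairing rule above concrete, here is a minimal Python sketch. It is only an illustration: the lookup table holds just the three driver branches mentioned in this thread (367 with the first 8.0 release, 375 with 8.0 GA2, 384 with 9.0), and the name `check_pairing` is hypothetical, not part of XMRig or any NVIDIA tool.

```python
# Hypothetical helper, not part of XMRig or the NVIDIA tooling.
# The table only covers the driver branches discussed in this thread.
DRIVER_BRANCH_CUDA = {
    367: "8.0",  # bundled with the first 8.0 toolkit release
    375: "8.0",  # bundled with 8.0 GA2
    384: "9.0",
}


def check_pairing(driver_version: str, toolkit_release: str) -> str:
    """Return 'match', 'mismatch', or 'unknown' for a driver/toolkit pair."""
    branch = int(driver_version.split(".")[0])  # "384.111" -> 384
    bundled = DRIVER_BRANCH_CUDA.get(branch)
    if bundled is None:
        return "unknown"
    return "match" if bundled == toolkit_release else "mismatch"


# Driver from the nvidia-smi output above, toolkit release from nvcc.
print(check_pairing("384.111", "8.0"))  # prints "mismatch"
```

A "match" here only means the major CUDA release lines up; it does not tell the first 8.0 release apart from GA2.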

Also, autotune is broken for Fermi, as I've found now that I've been running it on mine: it picks too many threads x blocks and crashes trying to allocate memory (that exact error message). This is especially true for CN-R, which has a random recompile in it that uses some more GPU memory (and isn't accounted for in the auto-tuning sizer).

I hunted settings for a while and got this as best for CN-R:

            "threads": 10,
            "blocks": 40,
            "bfactor": 6,
            "bsleep": 25,
            "sync_mode": 3,

Other algos required a different layout, but these work well for regular CN variants. Some wanted 8 threads. I think you should try to stick with a multiple of the SMX count (which is 2); otherwise tune blocks down until it quits failing. For example, I don't think I got anything to work with heavy other than 4x4, which is slow and dumb.
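For a rough feel of why an oversized threads x blocks product runs out of memory on this card, the sketch below estimates the scratchpad footprint. It assumes memory use scales as threads x blocks x scratchpad size (2 MiB per hash for the regular CryptoNight family, 4 MiB for CN-Heavy) and ignores per-hash context and the extra memory CN-R needs for its recompile, so treat it as a lower bound. The function names are hypothetical.

```python
# Rough lower bound only: one scratchpad per concurrent hash.
# Assumption: footprint ~ threads * blocks * scratchpad size; real usage is higher.
SCRATCHPAD_MIB = {"cn": 2, "cn-heavy": 4}


def scratchpad_total_mib(threads: int, blocks: int, algo: str = "cn") -> int:
    """Estimated scratchpad memory in MiB for a threads x blocks launch."""
    return threads * blocks * SCRATCHPAD_MIB[algo]


def fits(threads: int, blocks: int, total_mib: int, algo: str = "cn") -> bool:
    """True if the scratchpads alone fit in total_mib of GPU memory."""
    return scratchpad_total_mib(threads, blocks, algo) <= total_mib


# 964 MiB is the Quadro 600 total reported by nvidia-smi earlier in the thread.
print(scratchpad_total_mib(10, 40))  # 800
print(fits(10, 40, 964))             # True
print(fits(16, 32, 964))             # False (1024 MiB of scratchpads)
```

The 10x40 layout already takes about 800 of the 964 MiB, which leaves little headroom. The estimate does not explain every failure, though: 4x4 on heavy is far below the memory limit by this measure.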

Spudz76 commented 5 years ago

Okay, I've repaired some of this with #255, which eliminates the startup crashing ("unknown error"). If you can, please grab that PR, build it, and see if it reduces the problems.

I found it impossible to guess whether a crash came from bugs or from clocking; they all look the same from the console. Now that I'm running this patch it no longer crashes at all, and I can clock somewhat closer to what I used to run before CN-R was involved before anything starts failing (normal).

Spudz76 commented 5 years ago

Also, CN-GPU hates the above 10x40 6x25 combo. That combo only really works on RWZ and maybe the old 0/1/2 variants.

12x26 10x25 is the best I've found, at 31.5 H/s; any more and it crashes. I think it saturates the floating point units (which these cards have fewer of) before it hits the usual limits, thus the smaller blocks count.

And everything hates a bfactor below 8; some coins want 10.
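The "10x40 6x25" shorthand lines up with the earlier config fragment (threads 10, blocks 40, bfactor 6, bsleep 25), so it appears to read as threads x blocks followed by bfactor x bsleep. Under that reading, a small Python sketch can expand the shorthand into config keys. The function `parse_shorthand` is hypothetical and only illustrates the mapping.

```python
import json


def parse_shorthand(text: str) -> dict:
    """Expand 'TxB FxS' into threads, blocks, bfactor, bsleep.

    Assumes the reading inferred from this thread:
    first pair = threads x blocks, second pair = bfactor x bsleep.
    """
    launch, pacing = text.split()
    threads, blocks = (int(v) for v in launch.split("x"))
    bfactor, bsleep = (int(v) for v in pacing.split("x"))
    return {
        "threads": threads,
        "blocks": blocks,
        "bfactor": bfactor,
        "bsleep": bsleep,
    }


# The CN-GPU combo reported above, printed as a JSON fragment.
print(json.dumps(parse_shorthand("12x26 10x25"), indent=4))
```

Other per-thread keys, such as `sync_mode` from the earlier fragment, are not part of the shorthand and would have to be set separately.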