xmrig / xmrig-nvidia

Monero (XMR) NVIDIA miner
GNU General Public License v3.0

Nvidia Quadro K6000 #274

Open alvarezrjx opened 5 years ago

alvarezrjx commented 5 years ago

Good morning

I would like to know a recommended configuration for the Nvidia Quadro K6000 video card, for use with XMRig 2.14.3.

Spudz76 commented 5 years ago

I have better luck on Kepler and Maxwell cores by using a different ratio than the autoconf provides.

Something like 30x45 instead of the roughly 22x30 that autoconf probably lands near. I use 2 times the SMX count for threads (yours appears to be 15 SMX), then as many blocks as the card will take without out-of-memory or other init failures, which tends to be around 3 times the SMX count.

My GTX970 (13 SMX Maxwell) runs best at 26x39, although I forget what autoconf gives for it.

You will run out of GPU compute long before you fill the memory; these cards hit a point where using more memory doesn't increase the rates.
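
To put that in config terms, a manual thread entry in xmrig-nvidia's config.json along those lines might look like the sketch below. This is illustrative only: device index 0 is assumed, and bfactor 0 / bsleep 0 is the usual choice for a headless card (Windows display cards typically need a higher bfactor).

```json
{
    "threads": [
        {
            "index": 0,
            "threads": 30,
            "blocks": 45,
            "bfactor": 0,
            "bsleep": 0,
            "affine_to_cpu": false
        }
    ]
}
```

The 30x45 is just the 2x and 3x of the K6000's 15 SMX from the rule of thumb above; step blocks down if init fails.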

mechanator commented 4 years ago

The real issue is that there are so many variants of the cards with differing numbers of CUDA cores, especially in the GTX range, where cards can share the same chip but have fewer cores enabled. The on-the-fly tuning/detection algorithm gets you within about 80% of the maximum hashrate. However, the right threads/blocks values change with each algorithm mined. On top of that, if you are on a pool that rotates algorithms, the static settings you entered might not work when the miner is configured with "algo": "auto" and "variant": "auto" in the config.json file.

A good fix for this would be a lookup table holding optimized settings for each card variant, with some headroom on threads and the correct block settings. The thread count has to be kept low enough not to bang into each card's VRAM limit, particularly on Windows, which reserves some VRAM for the base drivers. This memory-allocation problem is moot on cards with 4GB or more, but you can run into it on 1-3GB VRAM cards. The math to determine threads and blocks doesn't make much sense as documented in xmr-stak. You can't just sort the thread/block settings by architecture; that would be a simple 5-6 case statement. It's determined by the SMX count, the amount of free usable RAM on the GPU, and some kind of MOD divisor based on the number of CUDA cores per SMX.
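
As a rough sketch of the kind of heuristic being described (my own illustration, not the actual xmr-stak or xmrig formula): derive threads from the SMX count, and cap blocks by how many scratchpads fit in free VRAM after a driver reserve. The 2 MiB scratchpad size (CryptoNight v2/R) and the 128 MiB reserve are assumptions:

```cpp
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    size_t freeMem = 0, totalMem = 0;
    cudaMemGetInfo(&freeMem, &totalMem);

    const size_t scratchpad = 2u << 20;   // 2 MiB per hash (assumed algo variant)
    const size_t reserve    = 128u << 20; // assumed headroom for driver/display

    // threads = 2 * SMX; blocks = min(3 * SMX, whatever fits in memory)
    int threads = 2 * prop.multiProcessorCount;  // 30 on a 15-SMX K6000
    size_t usable = freeMem > reserve ? freeMem - reserve : 0;
    int maxHashes = static_cast<int>(usable / scratchpad);
    int blocks = std::min(3 * prop.multiProcessorCount, maxHashes / threads);

    printf("%s: %d SMX -> try %dx%d\n", prop.name,
           prop.multiProcessorCount, threads, blocks);
    return 0;
}
```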

The advantage is that you can get about 25% more hashrate when you manually tune the cards by approximation and incrementally stepping the threads/blocks up or down. However, that doesn't work if you are mining on a pool that may rotate algorithms. And if a coin changes its algorithm settings, possibly from a fork, you might no longer be set for the highest possible hashrate (or the miner might crash).

mechanator commented 4 years ago

The miner could interrogate the hardware with a CUDA call, get the specific OEM PCI vendor and device ID, and then match up the number of CUDA cores to determine exactly the right number of blocks and threads.
Several tables exist illustrating this and could be cross-referenced against the driver INF and the PCI device list of the currently installed cards, referencing these models:

https://en.wikipedia.org/wiki/GeForce_400_series
https://en.wikipedia.org/wiki/GeForce_500_series
https://en.wikipedia.org/wiki/GeForce_600_series
https://en.wikipedia.org/wiki/GeForce_700_series
https://en.wikipedia.org/wiki/GeForce_900_series
https://en.wikipedia.org/wiki/GeForce_10_series
https://en.wikipedia.org/wiki/GeForce_16_series
https://en.wikipedia.org/wiki/GeForce_20_series

and the Quadro 600 or later, excluding the FX series and anything below CUDA compute capability 2.0: https://en.wikipedia.org/wiki/Nvidia_Quadro
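
For what it's worth, the SMX count, compute capability, memory size, and PCI location are already exposed by the CUDA runtime, and cores-per-SM can be looked up from the compute capability the way the CUDA samples' helper_cuda.h does (the chip's actual PCI vendor/device IDs would need NVML's nvmlDeviceGetPciInfo). A minimal sketch of the runtime side:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Cores per SM by compute capability (subset of the table in the
// CUDA samples' helper_cuda.h).
static int coresPerSM(int major, int minor) {
    switch (major * 10 + minor) {
        case 30: case 32: case 35: case 37: return 192; // Kepler
        case 50: case 52: case 53:          return 128; // Maxwell
        case 60:                            return 64;  // Pascal GP100
        case 61: case 62:                   return 128; // Pascal
        case 70: case 72: case 75:          return 64;  // Volta/Turing
        default:                            return 0;   // unknown
    }
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("GPU %d: %s, SM %d.%d, %d SMX, %d cores, %zu MiB, PCI %04x:%02x:%02x\n",
               i, p.name, p.major, p.minor, p.multiProcessorCount,
               p.multiProcessorCount * coresPerSM(p.major, p.minor),
               p.totalGlobalMem >> 20, p.pciDomainID, p.pciBusID, p.pciDeviceID);
    }
    return 0;
}
```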

PCI vendor/device ID quick references: https://www.nv-drivers.eu/nvidia-all-devices.html, plus an older list that might fill in the gaps for non-reference versions: https://envytools.readthedocs.io/en/latest/hw/pciid.html

Just trying to help.