The latest "Cuda 8.0"-plugin not working with "Dero HE (astrobwt/v2) "-algo.

avselang commented 2 years ago

The latest "Cuda 8.0" plugin does not work with the "Dero HE (astrobwt/v2)" algo. The miner reports: nv disabled (no suitable configuration found). Tested with xmrig-6.17.0-msvc-win64. My GPU is an NVIDIA GeForce 710M with OpenCL 1.1 and CUDA 2.1.

Spudz76 commented 2 years ago

That should be a Kepler, capability 3.5 (CUDA_ARCH=35), not a Fermi, capability 2.1.

You can use a newer CUDA, because 8.0 might not work anyway (unproven, but several other newer algos definitely don't work because the CUDA code was written without backward compatibility).

;  Valid CUDA Toolkit Map:
;   8.x for Fermi/Kepler       /Maxwell/Pascal,
;   9.0 for       Kepler       /Maxwell/Pascal/Volta(70),
;   9.1 for       Kepler       /Maxwell/Pascal/Volta(72),
;  10.x for       Kepler       /Maxwell/Pascal/Volta    /Turing,
;  11.x for       Kepler(35/37)/Maxwell/Pascal/Volta    /Turing/Ampere(80)
;  11.1 for       Kepler(35/37)/Maxwell/Pascal/Volta    /Turing/Ampere(86)
;  11.4 for       Kepler(35/37)/Maxwell/Pascal/Volta    /Turing/Ampere(87)

Versions newer than 11.4 support the same architectures, so the newest (11.6) should also work if you are building your own (releases only build up to 11.4, but that will work okay).

But you should match whatever CUDA version your driver offers. Use the nvidia-smi command to show what's bundled (upper right corner) in case the driver is not recent (laptop drivers are sometimes weird, or require an older manufacturer-modified driver that never gets updated).
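As a rough illustration, the toolkit map above can be turned into a small lookup. This is only a sketch: the dictionary is transcribed from the comment block, the name `toolkits_for` is made up for this example, and the per-version notes (Volta 70/72, Ampere 80/86/87, Kepler limited to 35/37 on 11.x) are left out for brevity.

```python
# Toolkit families transcribed from the "Valid CUDA Toolkit Map" comment.
# Simplification: 11.1 and 11.4 are folded into "11.x", and sub-arch notes
# such as Kepler(35/37) or Volta(70) are not modeled.
TOOLKIT_MAP = {
    "8.x": ["Fermi", "Kepler", "Maxwell", "Pascal"],
    "9.0": ["Kepler", "Maxwell", "Pascal", "Volta"],
    "9.1": ["Kepler", "Maxwell", "Pascal", "Volta"],
    "10.x": ["Kepler", "Maxwell", "Pascal", "Volta", "Turing"],
    "11.x": ["Kepler", "Maxwell", "Pascal", "Volta", "Turing", "Ampere"],
}


def toolkits_for(family):
    """Return the toolkit versions whose map entry lists the given GPU family."""
    return [version for version, families in TOOLKIT_MAP.items() if family in families]


if __name__ == "__main__":
    for name in ("Fermi", "Kepler", "Turing"):
        print(name, "->", toolkits_for(name))
```

Looked up this way, a Fermi card only appears under the 8.x toolkit, which is why the choice of plugin matters so much for older GPUs.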

avselang commented 2 years ago

I updated the laptop driver to version 391.35 (the latest driver available on the official website) and used the cuda9_1 plugin. An error occurred: thread #0 failed with error :112 "no kernel image is available for execution on the device".

Spudz76 commented 2 years ago

The prebuilt release may be filtering 35 out, since it is fairly uncommon, to save build time and size.

Yep, verified the default build for 9.1 has CUDA_ARCH=30;50;60;70 and unlike other GPUs the base family 30 will not run on a 35 or 37 (Kepler 2.0), so you will have to build your own with -DCUDA_ARCH=35.

avselang commented 2 years ago

How is this done? Do you have a detailed tutorial?

Spudz76 commented 2 years ago

I built it for you, grab special release from here

avselang commented 2 years ago

Thank you very much, but the error still appears: thread #0 failed with error :112 "no kernel image is available for execution on the device". Here are some graphics card details from the startup output, for your reference:

CUDA 9.1/9.1/6.17.0
NVML 8.376.54/391.35 press e for health report
CUDA GPU #0 01:00.0 GeForce 710M 1550/900 MHz smx:2 arch:21 mem:1679/2048 MB
[2022-04-17 07:56:01.613] nvidia use profile astrobwt/v2 (1 thread) scratchpad 128 KB
| # | GPU | BUS ID  | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0   | 01:00.0 | 512       | 32      | 16     | 6  | 25 | 64     | GeForce 710M

Thank you again for your help

Spudz76 commented 2 years ago

That is very strange; everything online says the 710M is Kepler 2.0, but yours is clearly a Fermi (arch:21).

Perhaps the online info is confused because there are both a GeForce 710M and a GeForce GT 710M, and people assume they are the same card, when only the GT is the Kepler.
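To make the arch:21 reading concrete, here is a small sketch that pulls the arch value out of the GPU summary line and maps it to a family name. The function names are hypothetical and the mapping covers only the families named in the toolkit map earlier in the thread; it is not code from xmrig.

```python
import re


def arch_from_summary(line):
    """Extract the integer after 'arch:' from a GPU summary line, or None."""
    match = re.search(r"\barch:(\d+)\b", line)
    return int(match.group(1)) if match else None


def family_from_arch(arch):
    """Map a compute capability such as 21 or 35 to its GPU family name."""
    by_major = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}
    if arch // 10 in by_major:
        return by_major[arch // 10]
    if arch in (70, 72):
        return "Volta"
    if arch == 75:
        return "Turing"
    if arch in (80, 86, 87):
        return "Ampere"
    return "unknown"


if __name__ == "__main__":
    summary = ("CUDA GPU #0 01:00.0 GeForce 710M 1550/900 MHz "
               "smx:2 arch:21 mem:1679/2048 MB")
    arch = arch_from_summary(summary)
    print(arch, family_from_arch(arch))
```

Checking the arch value reported by the miner itself is more reliable than a model name, since the same marketing name can apparently cover different chips.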

So when you run with the 8_0 plugin, it never shows that summary? The mainstream release should have arch 20 in it, which works on 21s.

avselang commented 2 years ago

When I run with the 8_0 plugin, it shows this:

nvidia use profile astrobwt/v2 (1 thread) scratchpad 128 KB
| # | GPU | BUS ID  | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0   | 01:00.0 | 512       | 32      | 16     | 6  | 25 | 64     | GeForce 710M
[2022-04-17 19:06:18.324] nvidia thread #0 failed with error Unsupported algorithm
[2022-04-17 19:06:18.375] nvidia thread #0 self-test failed
[2022-04-17 19:06:18.376] nvidia disabled (failed to start threads)

Spudz76 commented 2 years ago

So that's with 'xmrig-cuda-6.17.0-cuda8_0-win64.zip'?

avselang commented 2 years ago

yes

Spudz76 commented 2 years ago

Okay, I will see why the algorithm is apparently not being built, and/or test it out on some of my Fermi cards (on Linux, though).

Spudz76 commented 2 years ago

Okay, same as the previous AstroBWT (non-v2) algorithm: it uses the __shfl() call, which is only supported on capability 3.0 or higher.

Investigating some sort of workaround, maybe it can work with a polyfill.
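For readers unfamiliar with the call, the following is a host-side Python model of what a __shfl polyfill has to reproduce, not the actual workaround being investigated. It follows the usual shared-memory approach: every lane writes its value into a shared buffer, and then each lane reads the slot of its source lane. The name `shfl_emulated` and the list-based "warp" are assumptions made for this sketch; on a real device the write and read would be separated by a synchronization barrier.

```python
WARP_SIZE = 32


def shfl_emulated(values, src_lanes, width=WARP_SIZE):
    """Model __shfl(var, srcLane, width) for one warp via a shared buffer.

    values[i] is the value held by lane i, src_lanes[i] is the lane that
    lane i wants to read from. With width < 32 the warp is split into
    independent segments and srcLane is taken modulo width in each segment.
    """
    if len(values) != WARP_SIZE or len(src_lanes) != WARP_SIZE:
        raise ValueError("expected one entry per lane")
    if width not in (1, 2, 4, 8, 16, 32):
        raise ValueError("width must be a power of two up to 32")

    shared = list(values)  # step 1: every lane stores its value (then a barrier)
    result = []
    for lane in range(WARP_SIZE):  # step 2: every lane loads from its source slot
        segment_base = (lane // width) * width
        result.append(shared[segment_base + src_lanes[lane] % width])
    return result


if __name__ == "__main__":
    vals = list(range(100, 132))
    print(shfl_emulated(vals, [3] * WARP_SIZE))  # broadcast lane 3's value
```

The trade-off of such a fallback is extra shared-memory traffic and a barrier per exchange, which is why the native instruction is preferred on capability 3.0 and newer.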

avselang commented 2 years ago

If I change the algo to cn-heavy/xhv, it seems to work:

use profile cn-heavy (1 thread) scratchpad 4096 KB
| # | GPU | BUS ID  | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0   | 01:00.0 | 160       | 40      | 4      | 8  | 25 | 640    | GeForce 710M
[2022-04-18 13:13:03.432] nvidia READY threads 1/1 (322 ms)
[2022-04-18 13:13:07.650] nvidia accepted (1/0) diff 1000 (109 ms)

If I run with the 9_1 plugin, it shows this:

use profile cn-heavy (1 thread) scratchpad 4096 KB
| # | GPU | BUS ID  | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0   | 01:00.0 | 160       | 40      | 4      | 8  | 25 | 640    | GeForce 710M
[2022-04-18 13:14:18.846] nvidia READY threads 1/1 (302 ms)
[2022-04-18 13:14:18.848] nvidia thread #0 failed with error :407 "no kernel image is available for execution on the device"

The same is true when using xmrig-cuda-v6.17.0-arch35-cuda9_1-win64. Hahaha, let's all find out.

Spudz76 commented 2 years ago

Yes, the arch35 9_1 build contains nothing but the erroneously assumed arch 35, so it will never work now that we've confirmed yours is the Fermi arch 21 type of 710M (and not the Kepler2 arch 35 "GT 710M"). You can just toss that one and not test it further.

Algorithms that do not use the __shfl call should work fine with the 8_0 mainstream plugin; that is everything except RandomX, AstroBWT, and KawPow. However, with 2 GB of VRAM there isn't room for RandomX or KawPow anyway, so the only one that could work if patched is AstroBWT. I'm still investigating that.
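The "no kernel image" error can be reasoned about with a simplified check, sketched below. It assumes the general binary-compatibility rule from NVIDIA's documentation (device code built for X.y runs on devices X.z with z >= y), ignores PTX just-in-time compilation, and does not model the Kepler 30-versus-35 behaviour described earlier in the thread. The function name is hypothetical, and the single-entry list `[20]` for the 8_0 plugin is only the arch mentioned in this thread, not its full build list.

```python
def has_kernel_image(built_archs, device_arch):
    """Return True if any built arch shares the device's major version
    and has a minor version no higher than the device's."""
    dev_major, dev_minor = divmod(device_arch, 10)
    for arch in built_archs:
        major, minor = divmod(arch, 10)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


if __name__ == "__main__":
    device = 21  # arch reported for this GeForce 710M
    print("arch35 9_1 build :", has_kernel_image([35], device))
    print("default 9_1 build:", has_kernel_image([30, 50, 60, 70], device))
    print("8_0 build (20)   :", has_kernel_image([20], device))
```

Under these assumptions, neither 9_1 build contains anything a capability 2.1 device can load, while a build that includes arch 20 does.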

avselang commented 2 years ago

thank you

xmrig / xmrig-cuda

The latest "Cuda 8.0"-plugin not working with "Dero HE (astrobwt/v2) "-algo. #161