madMAx43v3r / chia-gigahorse

Cudaplot error with i9-13900k + RTX 4070Ti #57

Open chilio opened 1 year ago

chilio commented 1 year ago

System RAM is 64 GB.

When I try to run on the RTX (GPU 1) with this command: cuda_plot_k32.exe -n -1 -x 11337 -C 7 -g 1 -t I:\ -3 I:\ -d H:\... I get this error: Invalid -r | --ndevices, not enough devices: 1

When I run without specifying the -g parameter, plotting probably starts on the GPU in the Intel chip, but the system becomes totally unstable, so I am unable to tell what is actually going on (keyboard and mouse do not react, I can only see some storage activity).

I guess this is a bug in how GPU devices are counted, and it probably only shows up on systems with an additional GPU on the CPU. In this case I am unable to plot with the GPU at all.

Tested on: Win 11 full, latest updates. Chia k32 next-gen CUDA plotter mmx-v2.4 - 3e00fa3

thecybo commented 1 year ago

watch -n 2 nvidia-smi

Run this to see the load on the Nvidia GPU before running cuda_plot_k32. If it's high, then it's using the right GPU and the system is slow because it's likely running low on RAM. It shouldn't even try the Intel GPU, as it's looking for CUDA devices only. Edit: just seen you're on Windows and the command I posted is for Linux; just get some GPU monitor up and running in this case.
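For Windows, where `watch` is not available, a rough equivalent can be sketched in Python. This is only an illustration, not part of the plotter: it assumes `nvidia-smi` is on the PATH, and the helper names (`parse_gpu_rows`, `poll_once`, `watch`) are made up for this example. The `--query-gpu` and `--format` options are standard `nvidia-smi` options.

```python
import subprocess
import time

# Ask nvidia-smi for one CSV row per Nvidia GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(value):
    """Return an int, or None when nvidia-smi reports something like [N/A]."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_rows(text):
    """Parse the CSV rows into dicts; rows with an unexpected shape are skipped."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 4:
            continue
        index, name, util, mem = fields
        rows.append({
            "index": int(index),
            "name": name,
            "util_pct": _to_int(util),
            "mem_used_mib": _to_int(mem),
        })
    return rows


def poll_once():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_rows(out)


def watch(interval=2.0):
    """Print GPU load every `interval` seconds until interrupted with Ctrl+C."""
    while True:
        for row in poll_once():
            print(row)
        time.sleep(interval)
```

Calling `watch()` in a second terminal while the plotter runs should show whether the Nvidia card is actually loaded. Running `nvidia-smi -l 2` directly gives a similar refresh loop without any script.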

chilio commented 1 year ago

@thecybo thanks for the tip, but that is not the case. Result from nvidia-smi (Windows also has one): [image] And from the system default monitor: [image]

bladeuserpi commented 1 year ago

Try with -g 0 (or without -g) and -M 0.

madMAx43v3r commented 1 year ago

> I guess this is a bug connected with counting gpu devices, and probably does come up only on systems with additional gpu on cpu.

The first GPU is -g 0, and only Nvidia GPUs shown in nvidia-smi count. So in your case you need to use -g 0 or leave the option out.
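To see which indices are available for -g, one option is to list the devices the Nvidia driver reports. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; `parse_gpu_list` and `list_nvidia_gpus` are hypothetical helper names, and the UUID in the usage note is a placeholder. `nvidia-smi -L` is a standard option that prints one line per Nvidia GPU.

```python
import re
import subprocess

# Line format printed by `nvidia-smi -L`, for example:
#   GPU 0: NVIDIA GeForce RTX 4070 Ti (UUID: GPU-...)
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: .+\)\s*$")


def parse_gpu_list(text):
    """Map each Nvidia GPU index to its name; non-matching lines are ignored."""
    gpus = {}
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus[int(match.group(1))] = match.group(2)
    return gpus


def list_nvidia_gpus():
    """Run `nvidia-smi -L` and return {index: name} for Nvidia GPUs only."""
    out = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_list(out)
```

On a machine with an Intel iGPU and a single RTX card, `list_nvidia_gpus()` would be expected to return just one entry with index 0, since the iGPU is not an Nvidia device. With several Nvidia cards, the order shown by `nvidia-smi` can differ from CUDA's default enumeration order, so treat the mapping as a hint rather than a guarantee.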

It's possible your SSD is super slow and is causing the system to freeze.

chilio commented 1 year ago

Actually, that is exactly the problem here. Please take a look: [image] So GPU 0 is the Intel UHD Graphics 770 and GPU 1 is the Nvidia RTX, while nvidia-smi reports the Nvidia card as GPU 0, because it is the first Nvidia device. That would make sense with the error I got initially: Invalid -r | --ndevices, not enough devices: 1. Regarding the NVMe, it is definitely not slow: brand new WD SN850X 2TB running at full x4/4.
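The numbering mismatch can be illustrated with a small range check. This is a guess at the kind of validation that produces the message, not the plotter's actual code: it assumes that -g is the index of the first device, that -r defaults to 1, and that the number at the end of the error is the count of CUDA-visible devices. The function name `check_device_range` is made up for this sketch.

```python
def check_device_range(first_device, num_devices, visible_devices):
    """Return the device indices that would be used, or raise if they do not exist.

    first_device    -- value passed as -g (0-based, Nvidia devices only)
    num_devices     -- value passed as -r (assumed default: 1)
    visible_devices -- how many Nvidia GPUs CUDA can see
    """
    if first_device < 0 or num_devices < 1:
        raise ValueError("device index must be >= 0 and device count >= 1")
    # The requested range [first_device, first_device + num_devices) must fit
    # inside the CUDA-visible devices 0 .. visible_devices - 1.
    if first_device + num_devices > visible_devices:
        raise ValueError(
            f"Invalid -r | --ndevices, not enough devices: {visible_devices}"
        )
    return list(range(first_device, first_device + num_devices))
```

Under these assumptions, `check_device_range(0, 1, 1)` returns `[0]`, while `check_device_range(1, 1, 1)` raises with "not enough devices: 1": Task Manager's GPU 1 is the only Nvidia device, so from CUDA's point of view it is device 0 and there is no device 1.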

madMAx43v3r commented 1 year ago

Any chance you could try on Linux? Using ext4 that is.

chilio commented 1 year ago

> Any chance you could try on Linux? Using ext4 that is.

Not this time, I need to have it running on Windows.

> Try with -g 0 (or without -g) and -M 0.

Now I've had time to do some more testing, and thanks to @bladeuserpi's suggestion... Without specifying the -g parameter, and restricting shared memory usage to 30G with -M, I am able to plot. Plot times are not excellent and the system experiences a few minor lags, but at least plotting completes. So I guess something goes wrong when calculating the total available VRAM when running without the -M parameter on a system with both a GPU and an iGPU.

madMAx43v3r commented 1 year ago

The iGPU doesn't matter, CUDA totally ignores that.

Bigyurik commented 11 months ago

Ubuntu 22.04, RTX 3060: Invalid -r | --ndevices, not enough devices: 0. Help