madMAx43v3r / chia-gigahorse


Windows, 2 GPUs: 2nd GPU Crashes if Memory Pinned Above 32GB on 128GB System #290

Open x86txt opened 3 months ago

x86txt commented 3 months ago

Hi Max - I have a system running Windows 11 with 2 GPUs (a 4070 Ti Super 16GB and a 3080 10GB), 128GB of DDR4-3200 RAM, a Ryzen 5950X, and multiple NVMe drives. I'm using cuda_plot_k32.exe version 2b9c13f and have tried the previous version as well.

I can plot perfectly fine on my primary GPU in partial RAM mode at C16, at about 4 minutes per plot, with the following command line:

Primary GPU, successful:

cuda_plot_k32.exe -C 16 -n -1 -g 0 -r 1 -M 64 -S 2 -t C:\chia\tmp1\ -2 D:\tmp3\ -d Z:\plots\mmax\ -c key1 -f key2

However, when I try to plot on my 2nd GPU, cuda_plot_k32.exe (both the latest version, 2b9c13f, and the previous one) crashes with a CUDA memory allocation error right after "[P1] Setup took 0.175 sec". I've included the output below.

If I limit the plotter to pinning only 32GB of memory, it runs successfully.

Here is how I'm calling the plotter:

Unsuccessful:

cuda_plot_k32.exe -C 16 -n -1 -g 1 -r 1 -M 64 -S 2 -t C:\chia\tmp1\ -2 D:\tmp3\ -d Z:\plots\mmax\ -c key1 -f key2

Successful:

cuda_plot_k32.exe -C 16 -n -1 -g 1 -r 1 -M 32 -S 2 -t C:\chia\tmp1\ -2 D:\tmp3\ -d Z:\plots\mmax\ -c key1 -f key2

Any value above 32GB results in the plotter crashing, even 33GB.
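The threshold search done by hand here (32 works, 33 fails) could be scripted. The sketch below is a hypothetical Python helper, not part of gigahorse. `plotter_args` rebuilds the command line from this report with only -g and -M varied. `survives_setup` assumes that reaching a "[P1] Table 1" line means the pinned allocation succeeded, and that the crash prints "out of memory". `largest_working_m` binary-searches -M, assuming failures are monotonic (if a value fails, every larger value fails too). None of this has been run against the real plotter.

```python
import subprocess
from typing import Callable, List


def plotter_args(gpu: int, pinned_gb: int) -> List[str]:
    """Hypothetical helper: the command line from this report, varying only -g and -M.

    key1/key2 are the same placeholders used in the report.
    """
    return [
        "cuda_plot_k32.exe",
        "-C", "16", "-n", "-1",
        "-g", str(gpu), "-r", "1",
        "-M", str(pinned_gb), "-S", "2",
        "-t", "C:\\chia\\tmp1\\", "-2", "D:\\tmp3\\",
        "-d", "Z:\\plots\\mmax\\",
        "-c", "key1", "-f", "key2",
    ]


def survives_setup(gpu: int, pinned_gb: int) -> bool:
    """Hypothetical probe: start the plotter and watch its output.

    Assumption: an "out of memory" line means failure, and a "[P1] Table 1"
    line means the pinned allocation worked. The process is killed either way,
    which leaves temp files in the -t and -2 directories.
    """
    proc = subprocess.Popen(
        plotter_args(gpu, pinned_gb),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the CUDA error is seen too
        text=True,
    )
    try:
        for line in proc.stdout:
            if "out of memory" in line:
                return False
            if "[P1] Table 1" in line:
                return True
        return False  # exited without reaching Table 1
    finally:
        proc.kill()
        proc.wait()


def largest_working_m(works: Callable[[int], bool], lo: int, hi: int) -> int:
    """Largest value in [lo, hi] for which works() is True.

    Assumes works(lo) is True and that failures are monotonic in the value.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always shrinks
        if works(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `largest_working_m(lambda m: survives_setup(1, m), 1, 64)` would search GPU 1 between 1 and 64 GB. Each probe starts the plotter from scratch, and if the plotter buffers its output when piped, lines may arrive late, so this can be slow.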

Failure:

Chia k32 next-gen CUDA plotter - 2b9c13f
Plot Format: mmx-v2.5
Network Port: 8444 [chia]
No. GPUs: 1
No. Streams: 2
Direct IO: No
Final Destination: Z:\plots\mmax\
Bucket Chunk Size: 8 MiB
Max Pinned Memory: 58.8047 GiB
Number of Plots: infinite
Initialization took 0.267 sec
Crafting plot 1 out of -1 (2024/03/23 08:41:27)
Process ID: 22040
Pool Puzzle Hash: (removed)
Farmer Public Key: (removed)
Working Directory: C:\chia\tmp1\
Working Directory 2: D:\tmp3\
Compression Level: C16
Plot Name: plot-k32-c16-2024-03-23-08-41-79fcc55bd0d8ea939f76fda197276a7edb812c92cd2ac398b9dd6e6f9a6e6f8f
Created disk buffer D:\tmp3\cuda_plot_tmp2_1711197687622228.tmp
[P1] Setup took 0.181 sec
T1 download thread failed with: failed to allocate 8388608 bytes of MEM_TYPE_PINNED: CUDA error 2: out of memory

Success with 32GB of RAM pinned:

Chia k32 next-gen CUDA plotter - 2b9c13f
Plot Format: mmx-v2.5
Network Port: 8444 [chia]
No. GPUs: 1
No. Streams: 2
Direct IO: No
Final Destination: Z:\plots\mmax\
Bucket Chunk Size: 8 MiB
Max Pinned Memory: 28.5 GiB
Number of Plots: infinite
Initialization took 0.188 sec
Crafting plot 1 out of -1 (2024/03/23 08:43:28)
Process ID: 19776
Pool Puzzle Hash: (removed)
Farmer Public Key: (removed)
Working Directory: C:\chia\tmp1\
Working Directory 2: D:\tmp3\
Compression Level: C16
Plot Name: plot-k32-c16-2024-03-23-08-43-d8c096f97491703b993da7f9f25d7d28182b09758ca182d8c761f23dbf411263
Created disk buffer D:\tmp3\cuda_plot_tmp2_1711197808276745.tmp
[P1] Setup took 0.167 sec
[P1] Table 1 took 10.903 sec, 4294967296 entries, 16788437 max, 66654 tmp, 0 GB/s up, 3.02671 GB/s down
[P1] Table 2 took 14.82 sec, 4294871603 entries, 16789752 max, 66620 tmp, 2.15924 GB/s up, 3.61844 GB/s down
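One detail in the failure log: the request that fails is 8388608 bytes, which is exactly the 8 MiB "Bucket Chunk Size" the plotter reports (8 × 1024 × 1024). The Python sketch below checks that arithmetic on lines copied from the failure log. The regexes are assumptions written against these quoted lines only, not a documented log format. The result only shows that the failing request is one chunk in size; it says nothing certain about how the plotter manages its pinned memory internally.

```python
import re

# Lines copied verbatim from the failure log (GPU 1, -M 64).
FAILURE_LOG = """\
Bucket Chunk Size: 8 MiB
Max Pinned Memory: 58.8047 GiB
T1 download thread failed with: failed to allocate 8388608 bytes of MEM_TYPE_PINNED: CUDA error 2: out of memory
"""

MIB = 1024 ** 2
GIB = 1024 ** 3


def parse_log(text: str) -> dict:
    """Pull the three sizes out of the log excerpt, converted to bytes."""
    chunk = re.search(r"Bucket Chunk Size: (\d+) MiB", text)
    pinned = re.search(r"Max Pinned Memory: ([\d.]+) GiB", text)
    failed = re.search(r"failed to allocate (\d+) bytes of MEM_TYPE_PINNED", text)
    return {
        "chunk_bytes": int(chunk.group(1)) * MIB,
        "pinned_bytes": float(pinned.group(1)) * GIB,
        "failed_bytes": int(failed.group(1)),
    }


info = parse_log(FAILURE_LOG)
# The failed request is the same size as one bucket chunk.
print(info["failed_bytes"] == info["chunk_bytes"])  # True
# The reported pinned limit, expressed in 8 MiB chunks.
print(round(info["pinned_bytes"] / info["chunk_bytes"]))  # 7527
```

By the same arithmetic, the 28.5 GiB reported in the successful run is 3648 chunks of 8 MiB.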

This is not a big deal: even with 32GB pinned, plots complete in 6 minutes on the 3080, versus 4 minutes on my 4070 Ti Super with 64GB pinned. I thought I should report it regardless.

madMAx43v3r commented 3 months ago

It's probably some limitation in Windows.

spleen911 commented 3 months ago

Do you have 2 CPUs too?

I get the same CUDA error on my Dell T7910 (512GB) when plotting with GPU1. One of these days I'll have time to play with NUMA settings to see if I can get 256GB pinned for GPU1. Until then, as a workaround, I plot with GPU0 and farm with GPU1.

madMAx43v3r commented 3 months ago

I'm just testing with 2 GPUs on Windows: only -M 0 works.