madMAx43v3r / chia-gigahorse


Error encountered when using Tesla P100 #122

Closed Meeea-914 closed 1 year ago

Meeea-914 commented 1 year ago

When I use a Tesla P100 (Driver Version: 525.105.17, CUDA Version: 12.0), every time plotting reaches P4 the program crashes and prints 'Floating point exception (core dumped)'. This happens regardless of whether I use k32 or k33, or c1 or c8.

madMAx43v3r commented 1 year ago

This usually happens because I didn't include a binary for that GPU. And indeed, the P100 is a special case with compute capability 6.0, which I didn't include.
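To illustrate the diagnosis above, here is a minimal Python sketch that lists each GPU's compute capability and flags any that a given build has no kernels for. It assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field. The `EXAMPLE_BUILT_ARCHS` set is purely hypothetical and is not the actual list of architectures compiled into the plotter. The exact-match comparison is also a simplification, since real CUDA binaries can sometimes run on other architectures through embedded PTX.

```python
import subprocess

# Hypothetical example only: NOT the real list of architectures in any release.
EXAMPLE_BUILT_ARCHS = {(6, 1), (7, 5), (8, 6)}


def parse_compute_caps(csv_text):
    """Parse 'name, compute_cap' lines into (name, (major, minor)) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so GPU names containing commas still parse.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = cap.split(".")
        gpus.append((name, (int(major), int(minor))))
    return gpus


def unsupported_gpus(gpus, built_archs):
    """Return names of GPUs whose compute capability is not in built_archs."""
    return [name for name, cap in gpus if cap not in built_archs]


def query_gpus():
    """Ask nvidia-smi for the name and compute capability of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


if __name__ == "__main__":
    for name in unsupported_gpus(query_gpus(), EXAMPLE_BUILT_ARCHS):
        print(f"No kernels built for: {name}")
```

With the hypothetical set above, a P100 (compute capability 6.0) would be reported as unsupported while a card with capability 6.1 would not.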

madMAx43v3r commented 1 year ago

I've now included support for the P100 in the latest Linux binaries.

Block-Captain commented 1 year ago

Thanks for the reply. I got the same issue: all tables have 0 entries, and the plotting time for each table is very short. The Nvidia P100 has HBM2, which is different from most GPUs; I hope this can be fixed soon. Maybe other GPUs with HBM2, HBM2e, and HBM3 have the same issue?

My specs: Ubuntu Desktop 22.04.2; CPU: EPYC 7302; Motherboard: H11SSL; Memory: 128G DDR4 2400; Nvidia driver: cuda-drivers (530, the newest)

My log: P100_gigahorse_log_0_entries.txt

madMAx43v3r commented 1 year ago

Yes, this happens when there are no kernels for your GPU in the binary. Are you sure you are using the latest version?

Check git log; it should show linux P100 plotting support at the top.

madMAx43v3r commented 1 year ago

Based on your log, you're not running the latest version. It should show this: Chia k32 next-gen CUDA plotter mmx-v2.4 - 62a4a8d

Run git pull inside chia-gigahorse to update.
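The version check described above can be sketched in Python by looking for the plotter banner in a log file. The banner pattern below is inferred from the single line quoted in this thread, so treat the regular expression as an assumption rather than a documented log format. The helper names are hypothetical.

```python
import re

# Assumed banner shape, inferred from the one example quoted in this thread:
#   Chia k32 next-gen CUDA plotter mmx-v2.4 - 62a4a8d
BANNER_RE = re.compile(r"Chia k(\d+) next-gen CUDA plotter (\S+) - ([0-9a-f]+)")


def find_banner(log_text):
    """Return the k size, version and commit from the first banner, or None."""
    match = BANNER_RE.search(log_text)
    if match is None:
        return None
    return {
        "k": int(match.group(1)),
        "version": match.group(2),
        "commit": match.group(3),
    }


def is_expected_build(log_text, expected_commit):
    """True if the log contains a banner whose commit matches expected_commit."""
    banner = find_banner(log_text)
    return banner is not None and banner["commit"] == expected_commit


if __name__ == "__main__":
    sample = "Chia k32 next-gen CUDA plotter mmx-v2.4 - 62a4a8d"
    print(find_banner(sample))
    print(is_expected_build(sample, "62a4a8d"))
```

If the commit in the log does not match the one named by the maintainer, the binaries are out of date and need to be updated with git pull as described above.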

Block-Captain commented 1 year ago

Yeah, thank you. It works now; the 2nd plot and the after ones are faster. The log is attached. You may close this issue. P100_plotting_done_log.txt

madMAx43v3r commented 1 year ago

the 2nd plot and the after ones are faster

Yes, that's normal because of memory allocation on the first plot.

kofttlcc commented 1 year ago

When do you plan to release a Windows version of the k32 plotter that supports the Tesla P100 GPU?

madMAx43v3r commented 1 year ago

It should work with the latest version?

madMAx43v3r commented 1 year ago

Oh I see, maybe not. Yes, I need to update the Windows build.

kofttlcc commented 1 year ago

The latest cuda_plot_k32 on Linux plots normally, but I have two P100 cards and urgently need to be able to plot on Windows as well. Using the latest files on Windows also results in the crash described above. Thanks!

madMAx43v3r commented 1 year ago

OK, try the latest version now.

kofttlcc commented 1 year ago

Good Good