madMAx43v3r / chia-gigahorse

Nvidia H100 floating-point exception #159

Open Kali123411 opened 1 year ago

Kali123411 commented 1 year ago

In Digital Spaceport's YouTube video, he tried to plot on an Nvidia H100 but got a "Floating point exception (core dumped)" error every time. Will you add support for the A100 / H100?

digitalspaceport commented 1 year ago

The instance was a Lambda API gpu_1x_h100_pcie, run against the latest binary.

./cuda_plot_k32 -n 2 -C 8 -t /mnt/plot/ -d /mnt/plot/ -f ac70e43c24f2526c04dcbbdb8f77c50db8ee6f602b694fdd82e52189c33bcdc7fbefc1bc9c7fe9887f6253cc328d63e9 -c xch1qc63hkuyw8kgwhxh82jh3kxpgjag0nh2m0fwjzy4cpu62dde0acs9kqa6e

Chia k32 next-gen CUDA plotter - 54321cd
Plot Format: mmx-v2.4
Network Port: 8444 [chia]
No. GPUs: 1
No. Streams: 4
Final Destination: /mnt/plot/
Shared Memory limit: unlimited
Number of Plots: 2
Initialization took 0.205 sec
Crafting plot 1 out of 2 (2023/06/19 19:55:36)
Process ID: 54256
Pool Puzzle Hash: 06351bdb8471ec875cd73aa578d8c144ba87ceeadbd2e90895c079a535b97f71
Farmer Public Key: ac70e43c24f2526c04dcbbdb8f77c50db8ee6f602b694fdd82e52189c33bcdc7fbefc1bc9c7fe9887f6253cc328d63e9
Working Directory: /mnt/plot/
Working Directory 2: @RAM
Compression Level: C8 (xbits = 8, final table = 4)
Plot Name: plot-k32-c8-2023-06-19-19-55-93cab48b7c8699a94c2ca6729e27ba58ea6dc659dacc8e7d5858996260a5850f
[P1] Setup took 0.799 sec
[P1] Table 1 took 0.823 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.3126 GB/s down
[P1] Table 2 took 1.235 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.2957 GB/s down
[P1] Table 3 took 2.052 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.4231 GB/s down
[P1] Table 4 took 2.87 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.4635 GB/s down
[P1] Table 5 took 2.462 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.4298 GB/s down
[P1] Table 6 took 2.053 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.4029 GB/s down
[P1] Table 7 took 1.132 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.2988 GB/s down
Phase 1 took 13.8 sec
[P2] Setup took 0.236 sec
[P2] Table 7 took 0.005 sec, 0 GB/s up, 106.25 GB/s down
[P2] Table 6 took 0.344 sec, 0 GB/s up, 1.54433 GB/s down
[P2] Table 5 took 0.348 sec, 0 GB/s up, 1.52658 GB/s down
Phase 2 took 1.285 sec
[P3] Setup took 0.547 sec
[P3] Table 4 LPSK took 1.279 sec, 0 entries, 0 max, 0 tmp, 0.415364 GB/s up, 39.8751 GB/s down
[P3] Table 4 NSK took 1.463 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 40.6939 GB/s down
[P3] Table 5 PDSK took 1.175 sec, 0 entries, 0 max, 0 tmp, 0.452128 GB/s up, 39.7874 GB/s down
[P3] Table 5 LPSK took 1.234 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.3292 GB/s down
[P3] Table 5 NSK took 1.462 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 40.7217 GB/s down
[P3] Table 6 PDSK took 1.175 sec, 0 entries, 0 max, 0 tmp, 0.452128 GB/s up, 39.7874 GB/s down
[P3] Table 6 LPSK took 1.234 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.3292 GB/s down
[P3] Table 6 NSK took 1.444 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.2293 GB/s down
[P3] Table 7 PDSK took 1.132 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.2988 GB/s down
[P3] Table 7 LPSK took 1.233 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.3627 GB/s down
[P3] Table 7 NSK took 1.444 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.2293 GB/s down
Phase 3 took 15.063 sec
[P4] Setup took 0.128 sec
[P4] total_p7_parks = 0
Floating point exception (core dumped)
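Every table line in this log reports `0 entries`, and the run ends right after `total_p7_parks = 0`. One possible reading, which is an assumption and not something the plotter documents, is that the GPU stages produced no output on this card. A minimal Python sketch for flagging such lines in a saved log follows; the function name `zero_entry_tables` and the regular expression are illustrative and not part of the plotter.

```python
import re

# Matches per-table lines such as
#   "[P1] Table 1 took 0.823 sec, 0 entries, ..."
#   "[P3] Table 4 LPSK took 1.279 sec, 0 entries, ..."
# [P2] lines carry no "entries" field and are deliberately not matched.
TABLE_LINE = re.compile(
    r"\[(P\d)\]\s+Table\s+(\d+)(?:\s+([A-Z]+))?\s+took\s+([\d.]+)\s+sec,\s+(\d+)\s+entries"
)

def zero_entry_tables(log_text):
    """Return (phase, table, step) for every table line reporting 0 entries."""
    hits = []
    for match in TABLE_LINE.finditer(log_text):
        phase, table, step, _seconds, entries = match.groups()
        if int(entries) == 0:
            hits.append((phase, int(table), step))
    return hits

# Two lines copied from the log above, joined the way the paste arrived.
sample = (
    "[P1] Table 1 took 0.823 sec, 0 entries, 0 max, 0 tmp, 0 GB/s up, 41.3126 GB/s down "
    "[P3] Table 4 LPSK took 1.279 sec, 0 entries, 0 max, 0 tmp, 0.415364 GB/s up, 39.8751 GB/s down"
)
print(zero_entry_tables(sample))
```

Because the pattern tolerates arbitrary whitespace between fields, it works on the log whether it is split per line or pasted as one long run.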

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

nvidia-smi

NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0

dmesg

traps: cuda_plot_k32[54256] trap divide error ip:564954e4ee84 sp:7ffd612e5650 error:0 in cuda_plot_k32[564954e04000+1a6000]
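On x86 Linux, `trap divide error` is the CPU's integer divide fault, which the kernel delivers as SIGFPE and the shell prints as "Floating point exception", so this crash is in host code and not a floating-point problem as such. The `ip` and the `[base+size]` fields locate the faulting instruction inside the mapped region of the binary. A small Python sketch for extracting that offset is below; the helper name `fault_offset` is illustrative, and turning the offset into a symbol would additionally need the mapping's file offset and an unstripped binary, neither of which is available here.

```python
import re

# Layout of a kernel "traps:" line as it appears in the dmesg output above.
TRAP_LINE = re.compile(
    r"traps: (?P<name>\S+)\[(?P<pid>\d+)\] trap (?P<kind>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>[0-9a-f]+)"
    r" in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offset(line):
    """Return (module, trap kind, offset of ip within the reported mapping)."""
    m = TRAP_LINE.search(line)
    if m is None:
        raise ValueError("not a kernel 'traps:' line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m["module"], m["kind"], offset

line = (
    "traps: cuda_plot_k32[54256] trap divide error ip:564954e4ee84 "
    "sp:7ffd612e5650 error:0 in cuda_plot_k32[564954e04000+1a6000]"
)
module, kind, offset = fault_offset(line)
print(f"{kind} in {module} at offset {offset:#x}")
```

For the line above this gives offset 0x4ae84 within the `cuda_plot_k32` mapping, a value that stays stable across runs even though the absolute addresses change with address-space randomization.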

I also tested with CUDA_FORCE_PTX_JIT=1, but it didn't change the outcome.

madMAx43v3r commented 1 year ago

yeah H100 support is not included (yet)

madMAx43v3r commented 1 year ago

A100 is supported