preda / gpuowl

GPU Mersenne primality test.
GNU General Public License v3.0

Fixed v6 branch for CUDA 12.0 Nvidia driver #267

Open tdulcet opened 1 year ago

tdulcet commented 1 year ago

Note that this was my attempt to fix the v6 branch, but it does NOT actually fix it:

2023-03-11 11:30:34 gpuowl v6.11-384-gb51191f-dirty
2023-03-11 11:30:34 Note: not found 'config.txt'
2023-03-11 11:30:34 config: -prp 106928347 -iters 100000 -device 0 -cleanup -log 10000 -maxAlloc 13590M 
2023-03-11 11:30:34 device 0, unique id ''
2023-03-11 11:30:35 Tesla T4-0 106928347 FFT: 6M 1K:12:256 (17.00 bpw)
2023-03-11 11:30:35 Tesla T4-0 Expected maximum carry32: 24DB0000
2023-03-11 11:30:36 Tesla T4-0 OpenCL args "-DEXP=106928347u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DWEIGHT_STEP_MINUS_1=0x1.7ddbbaacae2cep-9 -DIWEIGHT_STEP_MINUS_1=-0x1.7cbfc2938b93dp-9  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
1 warning generated.
2023-03-11 11:30:39 Tesla T4-0 

2023-03-11 11:30:39 Tesla T4-0 OpenCL compilation in 3.29 s
2023-03-11 11:30:42 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:30:42 Tesla T4-0 validating proof residues for power 8
2023-03-11 11:30:42 Tesla T4-0 Proof using power 8
2023-03-11 11:30:49 Tesla T4-0 106928347 EE      800   0.00%; 5864 us/it; ETA 7d 06:10; e539a11374d52057 (check 2.49s)
2023-03-11 11:30:52 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:30:59 Tesla T4-0 106928347 EE      800   0.00%; 5953 us/it; ETA 7d 08:48; e539a11374d52057 (check 2.55s) 1 errors
2023-03-11 11:31:02 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:31:09 Tesla T4-0 106928347 EE      800   0.00%; 6035 us/it; ETA 7d 11:15; e539a11374d52057 (check 2.57s) 2 errors
2023-03-11 11:31:09 Tesla T4-0 3 sequential errors, will stop.
2023-03-11 11:31:09 Tesla T4-0 Exiting because "too many errors"
2023-03-11 11:31:09 Tesla T4-0 Bye
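
All three EE lines in the log above report the same iteration (800) and the same res64, which suggests the check fails reproducibly rather than from a one-off hardware error. Below is a minimal sketch for extracting this from a log, assuming the status-line layout shown above. The regex and the helper names `failed_checks` and `is_deterministic` are illustrative and not part of gpuowl.

```python
import re

# Assumed layout of a gpuowl v6 progress line, taken from the log above:
# "<date> <time> <device> <exponent> <OK|EE> <iteration> <pct>%; ... ; <res64> (check ...)"
STATUS = re.compile(
    r"(?P<exponent>\d+) (?P<status>OK|EE)\s+(?P<iteration>\d+)\s+[\d.]+%;"
    r".*?; (?P<res64>[0-9a-f]{16}) "
)

def failed_checks(log_text):
    """Return (iteration, res64) for every progress line whose status is EE."""
    failures = []
    for line in log_text.splitlines():
        m = STATUS.search(line)
        if m and m.group("status") == "EE":
            failures.append((int(m.group("iteration")), m.group("res64")))
    return failures

def is_deterministic(failures):
    """True if there are repeated failures and all share one iteration and res64."""
    return len(failures) > 1 and len(set(failures)) == 1

# Sample input: lines copied from the log above.
LOG = """\
2023-03-11 11:30:42 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:30:49 Tesla T4-0 106928347 EE      800   0.00%; 5864 us/it; ETA 7d 06:10; e539a11374d52057 (check 2.49s)
2023-03-11 11:30:59 Tesla T4-0 106928347 EE      800   0.00%; 5953 us/it; ETA 7d 08:48; e539a11374d52057 (check 2.55s) 1 errors
2023-03-11 11:31:09 Tesla T4-0 106928347 EE      800   0.00%; 6035 us/it; ETA 7d 11:15; e539a11374d52057 (check 2.57s) 2 errors
"""

failures = failed_checks(LOG)
print(failures)
print(is_deterministic(failures))
```

The "loaded" lines are skipped because they carry no percentage field, so only real progress reports are counted.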

I am not an OpenCL programmer, so I obviously did not correctly resolve the merge conflicts. Any help to finish this PR by correctly applying https://github.com/preda/gpuowl/commit/677f43a2ef299f0b8cc9885284fbaa086e917ce2 to the v6 branch would be greatly appreciated by Colab users and likely other people with Nvidia GPUs. Thanks in advance.

preda commented 7 months ago

Is this still relevant/useful for merging?

tdulcet commented 7 months ago

The main change in this PR, fixing the v6 branch on Nvidia GPUs, may still be useful if you or someone with OpenCL experience were able to finish it. However, OpenCL is completely busted with the latest Nvidia driver, so I am unable to test anything to confirm whether it fixes the issue. If/when they do fix their driver, they could fix the original issue as well, eliminating the need to make this change to the v6 branch. We are still patiently waiting to see what Nvidia does...

The other minor changes in the PR, notably fixing Clang support and enabling LTO, are still very useful and should be made to the master branch as well. I was planning to make a separate PR after this was merged.

preda commented 1 week ago

It looks like there are too many unrelated changes in this PR; I'm not inclined to merge it as-is.

Maybe some small individual fixes can be extracted as separate PRs.

tdulcet commented 1 week ago

Yeah, as explained above, this PR is unfinished and does not currently work, which is why it is marked as a draft. Finishing it has not been a priority either, because OpenCL is still busted with recent Nvidia drivers on Linux.

I could remove the 3d073e09961cedefeb397484f43ab863ac37e824 commit if you were interested in merging the other fixes. Otherwise, I suppose this PR could be closed while we wait for Nvidia...