Open RomanKondratovich opened 2 years ago
This crash might be because I made a mistake, which only the RX580 compiler seems to tolerate. See: https://github.com/llvm/llvm-project/issues/53436#issuecomment-1071618469 for further details. I'm working on a fixed version.
Ok! I can test after the fix :)
This should be fixed now on https://github.com/sudden6/m-queens/tree/gpu_recursive
Can you compile this on MSYS2 yourself? Otherwise I'll provide binaries in a few days.
make: *** No rule to make target '../../static_boinc/lib64/libboinc.a', needed by 'm-queens-boinc'. Stop.
Win 10 + MSYS2
I think I was overeager with this. I need to clean up the build system and merge it with some other patches. Sorry for the confusion.
Thanks sudden6 for your work! Now I can compile and test boinc-ocl on the RX 6600 XT and RTX 3060. I see an error somewhere in the generation: sometimes an output file of 10M is generated, sometimes 2.5M, and the first time it generated 38M :)
out1.zip out2.zip stderr.zip stderr_big.zip out_big.zip
@RomanKondratovich can you try to build the standalone m-queens program without boinc? It lets you solve different board sizes on their own, which makes things more deterministic.
I checked the output you delivered, and there doesn't seem to be any obvious problem. However, I rebuilt my dev environment and found and fixed a bug that could affect you. I also added a new option to the m-queens2 program, so that you can enable and disable the debug printf from the OpenCL kernel at run time. So maybe give it another try. I'm also experiencing problems for board sizes >= 13 without debug printf enabled, so maybe you can do runs with N=11, 12, 13, 14 and we can compare the outputs.
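For comparing outputs across N=11, 12, 13, 14, a CPU reference count can help. The following is a minimal sketch of a plain bitmask backtracking counter; it is not the m-queens algorithm, and the names `count_solutions` and `KNOWN_COUNTS` are made up for illustration. The table holds the well-known total solution counts for these board sizes.

```python
# Reference solution counts for the N-queens problem (all solutions,
# not reduced by symmetry) for the board sizes mentioned above.
KNOWN_COUNTS = {11: 2680, 12: 14200, 13: 73712, 14: 365596}


def count_solutions(n: int) -> int:
    """Count placements of n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # one bit per column

    def place(cols: int, diag_l: int, diag_r: int) -> int:
        if cols == full:  # every column holds a queen, so every row is filled
            return 1
        total = 0
        # Columns in the current row not attacked by any earlier queen.
        free = full & ~(cols | diag_l | diag_r)
        while free:
            bit = free & -free  # lowest free column
            free ^= bit
            total += place(
                cols | bit,
                ((diag_l | bit) << 1) & full,  # diagonals shift one column per row
                (diag_r | bit) >> 1,
            )
        return total

    return place(0, 0, 0)
```

Calling `count_solutions(11)` and comparing against `KNOWN_COUNTS[11]` gives a quick sanity check; pure Python gets slow from N=14 upwards, so this is only meant for small boards.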
Yess!! :) works great standalone and ocl version! now stderr:
Platform name: AMD Accelerated Parallel Processing
Platform version: OpenCL 2.1 AMD-APP (3354.13)
Device: gfx1032
Device version: OpenCL 2.0 AMD-APP (3354.13)
OpenCL 2.x supported
Device memory: 8176MB
Allocation limit: 6732MB
Aligning to SUM_REDUCTION_FACTOR: 20971520
OPTIONS: -cl-std=CL2.0 -DBOARDSIZE=27 -DGPU_DEPTH=11 -DWORKSPACE_SIZE=20971520 -DWORKGROUP_SIZE=64 -DSUM_REDUCTION_FACTOR=32768 -DDEBUG_PRINT=0
OpenCL build log:
OCL Kernel memory: 2560MB
Threads: 1
Starting cleanup
Launching reduction sum kernels, count: 640, workspace_size: 20971520, SUM_REDUCTION_FACTOR: 32768
Starting cleanup
Launching reduction sum kernels, count: 640, workspace_size: 20971520, SUM_REDUCTION_FACTOR: 32768
Starting cleanup
Launching reduction sum kernels, count: 640, workspace_size: 20971520, SUM_REDUCTION_FACTOR: 32768
Starting cleanup
Launching reduction sum kernels, count: 640, workspace_size: 20971520, SUM_REDUCTION_FACTOR: 32768
Starting cleanup
Launching reduction sum kernels, count: 640, workspace_size: 20971520, SUM_REDUCTION_FACTOR: 32768
Starting cleanup
Nvidia test: m-queens2.exe -s 11 -m ocl -p 0 -d 0
Platform name: NVIDIA CUDA
Platform version: OpenCL 3.0 CUDA 11.4.243
Device: NVIDIA GeForce RTX 3060
Device version: OpenCL 3.0 CUDA
OpenCL 3.x supported
Device memory: 12288MB
Allocation limit: 3072MB
Allocation limit reached, truncating to: 26843545
Aligning to SUM_REDUCTION_FACTOR: 26836992
OPTIONS: -cl-std=CL2.0 -DBOARDSIZE=11 -DGPU_DEPTH=8 -DWORKSPACE_SIZE=26836992 -D WORKGROUP_SIZE=64 -DSUM_REDUCTION_FACTOR=32768 -DDEBUG_PRINT=0
OpenCL build log:
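The numbers in these logs are consistent with rounding the workspace size down to a multiple of SUM_REDUCTION_FACTOR, and with the reduction kernel count being the workspace size divided by that factor. A small sketch of that arithmetic follows; whether m-queens computes it exactly this way is an assumption, and `align_down` is a hypothetical helper name.

```python
SUM_REDUCTION_FACTOR = 32768  # value taken from the OPTIONS lines in the logs


def align_down(value: int, factor: int) -> int:
    """Round value down to the nearest multiple of factor."""
    return (value // factor) * factor


# Nvidia log: "truncating to: 26843545" followed by "Aligning ...: 26836992"
nvidia_workspace = align_down(26843545, SUM_REDUCTION_FACTOR)

# AMD log: workspace_size 20971520 with "reduction sum kernels, count: 640"
amd_reduction_count = 20971520 // SUM_REDUCTION_FACTOR

print(nvidia_workspace)     # 26836992
print(amd_reduction_count)  # 640
```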
Hmmm, you can probably work around the Nvidia problem by manually commenting out all printf lines. Maybe I'll find a workaround for that ^^
Yep, but next error: m-queens2.exe -s 12 works great. m-queens2.exe -s 13:
Platform name: NVIDIA CUDA
Platform version: OpenCL 3.0 CUDA 11.4.243
Device: NVIDIA GeForce RTX 3060
Device version: OpenCL 3.0 CUDA
OpenCL 3.x supported
Device memory: 12288MB
Allocation limit: 3072MB
Allocation limit reached, truncating to: 26843545
Aligning to SUM_REDUCTION_FACTOR: 26836992
OPTIONS: -cl-std=CL2.0 -DBOARDSIZE=13 -DGPU_DEPTH=10 -DWORKSPACE_SIZE=26836992 -DWORKGROUP_SIZE=64 -DSUM_REDUCTION_FACTOR=32768 -DDEBUG_PRINT=0
OpenCL build log:
ptxas error : Entry function 'relaunch_kernel' uses too much shared data (0xc038 bytes, 0xc000 max)
ptxas error : Entry function '__kernel___relaunch_kernel_block_invoke$34' uses too much shared data (0xc038 bytes, 0xc000 max)
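To put the ptxas message in decimal terms, the hex values can be decoded as below. This is only a worked conversion of the two numbers printed in the error; `shared_overrun` is a hypothetical helper name.

```python
def shared_overrun(used: int, limit: int) -> int:
    """Bytes of shared data requested beyond the limit (negative if it fits)."""
    return used - limit


used_bytes = 0xC038   # "uses too much shared data (0xc038 bytes ..."
limit_bytes = 0xC000  # "... 0xc000 max)"

print(limit_bytes, limit_bytes // 1024)         # 49152 48  (a 48 KiB limit)
print(used_bytes)                               # 49208
print(shared_overrun(used_bytes, limit_bytes))  # 56
```

So the kernel for board size 13 asks for 56 bytes more shared data than the 48 KiB that ptxas allows here.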
Unfortunately I don't think that's something I can fix :(
I guess there's better luck with Intel and AMD. Is m-queens2 now fully working on your RX6600, with no errors for all board sizes?
Yes, AMD RX 6600 XT works great. 90 tasks completed, estimated time ~1h per BOINC task.
1h/boinc task seems quite long to me, my old RX580 does one task in ~10-20min, but I optimized for that card, soo....
Are the tasks confirmed by the server to return the correct results?
Can you share a screenshot of GPU-Z with the sensors page while running a task?
Yes, all tasks are confirmed. GPU-Z screenshot is attached.
Thank you for the info!
It seems something is limiting the performance, because your card only draws 53W, but the GPU shows maximum load. Might need some fine tuning, maybe the workgroup size is not optimal for RDNA2 cards.
@RomanKondratovich You could try setting this constant: https://github.com/sudden6/m-queens/blob/gpu_recursive/clsolver.cpp#L31 to 32. I think that could help on your GPU.
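Going by the OPTIONS lines posted in this thread, the workgroup size reaches the OpenCL compiler as a -DWORKGROUP_SIZE define. The sketch below only mirrors the format of those log lines; `build_options` is a hypothetical helper and is not taken from clsolver.cpp.

```python
def build_options(boardsize: int, gpu_depth: int, workspace_size: int,
                  workgroup_size: int, sum_reduction_factor: int = 32768,
                  debug_print: int = 0, cl_std: str = "CL2.0") -> str:
    """Assemble an OpenCL build option string in the format seen in the logs."""
    return " ".join([
        f"-cl-std={cl_std}",
        f"-DBOARDSIZE={boardsize}",
        f"-DGPU_DEPTH={gpu_depth}",
        f"-DWORKSPACE_SIZE={workspace_size}",
        f"-DWORKGROUP_SIZE={workgroup_size}",
        f"-DSUM_REDUCTION_FACTOR={sum_reduction_factor}",
        f"-DDEBUG_PRINT={debug_print}",
    ])


# Same parameters as the RX 6600 XT run, with the workgroup size set to 32.
print(build_options(boardsize=27, gpu_depth=11,
                    workspace_size=20971520, workgroup_size=32))
```

Checking the OPTIONS line in stderr after a rebuild is an easy way to confirm that the changed constant was actually picked up.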
Changed WORKGROUP_SIZE from 64 to 32:
Allocation limit: 6732MB
Aligning to SUM_REDUCTION_FACTOR: 20971520
OPTIONS: -cl-std=CL2.0 -DBOARDSIZE=27 -DGPU_DEPTH=11 -DWORKSPACE_SIZE=20971520 -DWORKGROUP_SIZE=32 -DSUM_REDUCTION_FACTOR=32768 -DDEBUG_PRINT=0
OpenCL build log:
OCL Kernel memory: 2560MB
Threads: 1
Unhandled Exception Detected...
Huh, that's not supposed to happen, need to investigate...
I can share this PC with you over TeamViewer or AnyDesk.
Thank you for the offer, but developing via TeamViewer or similar will probably not be efficient, as GPU hangs tend to crash the whole system... Also I don't really have time for a long development session :(
I'll try reproducing this on my systems first; let's see whether it's reproducible.
@RomanKondratovich I finally found some time to look into this further. I found some race conditions in the OpenCL code, maybe that fixes it?
Great news! Can you build the exe file?
I'll do as soon as possible, might be one or two days though.
@RomanKondratovich a release is published, please tell me if you detect any regressions.
Hello, how about RX 6600 XT?
error log: