preda / gpuowl

GPU Mersenne primality test.
GNU General Public License v3.0

KERNEL_INVALID - Kriesel's mingw64 guide from mersenneforum.org for Windows Compile Version: "v7.2-91-g9c22195" #250

Closed nullcure closed 2 years ago

nullcure commented 2 years ago

#######################################################

Here is the compile output.

#######################################################

nullc@OPS71 MINGW64 /d/GIMPS/gpuowl
$ make gpuowl-win.exe
echo \"git describe --tags --long --dirty --always\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: cat version.inc
Version: "v7.2-91-g9c22195"
g++ -MT ProofCache.o -MMD -MP -MF .d/ProofCache.Td -Wall -g -O3 -std=gnu++17 -c -o ProofCache.o ProofCache.cpp
g++ -MT Proof.o -MMD -MP -MF .d/Proof.Td -Wall -g -O3 -std=gnu++17 -c -o Proof.o Proof.cpp
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -g -O3 -std=gnu++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT B1Accumulator.o -MMD -MP -MF .d/B1Accumulator.Td -Wall -g -O3 -std=gnu++17 -c -o B1Accumulator.o B1Accumulator.cpp
g++ -MT Memlock.o -MMD -MP -MF .d/Memlock.Td -Wall -g -O3 -std=gnu++17 -c -o Memlock.o Memlock.cpp
g++ -MT log.o -MMD -MP -MF .d/log.Td -Wall -g -O3 -std=gnu++17 -c -o log.o log.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -g -O3 -std=gnu++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -g -O3 -std=gnu++17 -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -g -O3 -std=gnu++17 -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -g -O3 -std=gnu++17 -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -g -O3 -std=gnu++17 -c -o Gpu.o Gpu.cpp
Gpu.cpp: In member function 'PRPResult Gpu::isPrimePRP(const Args&, const Task&)':
Gpu.cpp:1823:29: warning: '(long long unsigned int)((char)&lastFailedRes64 + offsetof(std::optional,std::optional::.std::_Optional_base<long long unsigned int, true, true>::))' may be used uninitialized in this function [-Wmaybe-uninitialized]
 1823 | if (lastFailedRes64 && (res == lastFailedRes64)) {
      | ~~~~^~~~~~~~
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -g -O3 -std=gnu++17 -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -g -O3 -std=gnu++17 -c -o Task.o Task.cpp
g++ -MT Saver.o -MMD -MP -MF .d/Saver.Td -Wall -g -O3 -std=gnu++17 -c -o Saver.o Saver.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -g -O3 -std=gnu++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -g -O3 -std=gnu++17 -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -g -O3 -std=gnu++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -g -O3 -std=gnu++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -g -O3 -std=gnu++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -g -O3 -std=gnu++17 -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -g -O3 -std=gnu++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -MT sha3.o -MMD -MP -MF .d/sha3.Td -Wall -g -O3 -std=gnu++17 -c -o sha3.o sha3.cpp
g++ -MT md5.o -MMD -MP -MF .d/md5.Td -Wall -g -O3 -std=gnu++17 -c -o md5.o md5.cpp
g++ -Wall -g -O3 -std=gnu++17 -o gpuowl-win.exe ProofCache.o Proof.o Pm1Plan.o B1Accumulator.o Memlock.o log.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o Saver.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o sha3.o md5.o -lstdc++fs -lOpenCL -lgmp -pthread -lquadmath -L/opt/rocm-4.0.0/opencl/lib -L/opt/rocm-3.3.0/opencl/lib/x86_64 -L/opt/rocm/opencl/lib -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L. -static
strip gpuowl-win.exe

nullc@OPS71 MINGW64 /d/GIMPS/gpuowl

#########################################################

Here's what happens when I try to run a PRP test with the binary.

#########################################################

nullc@OPS71 MINGW64 /d/GIMPS/gpuowl
$ ./gpuowl-win.exe
20220316 17:38:00 GpuOwl VERSION v7.2-91-g9c22195
20220316 17:38:00 GpuOwl VERSION v7.2-91-g9c22195
20220316 17:38:00 config: -cpu OPS71-GPU_RTX2080
20220316 17:38:00 config: -user nullcure
20220316 17:38:00 config:
20220316 17:38:00 device 0, unique id ''
20220316 17:38:00 OPS71-GPU_RTX2080 113205811 FFT: 6M 1K:12:256 (17.99 bpw)
20220316 17:38:00 OPS71-GPU_RTX2080 113205811 OpenCL args "-DEXP=113205811u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.0044605685341225021 -DIWEIGHT_STEP=-0.0044407602188228376 -DIWEIGHTS={0,-0.0088618000863245963,-0.017645068671879212,-0.026350501687124144,-0.03497878889532309,-0.043530613947195582,-0.052006654435085155,-0.060407581946647464,} -DFWEIGHTS={0,0.0089410337398926083,0.017962009564123114,0.027063642237564821,0.036246652915827876,0.045511769202399077,0.054859725206292544,0.064291261600215852,} -cl-std=CL2.0 -cl-finite-math-only "
20220316 17:38:00 OPS71-GPU_RTX2080 113205811

20220316 17:38:00 OPS71-GPU_RTX2080 113205811 OpenCL compilation in 0.02 s
20220316 17:38:01 OPS71-GPU_RTX2080 Exception gpu_error: INVALID_KERNEL clSetKernelArg(k, pos, sizeof(value), &value) at clwrap.h:77 setArg
20220316 17:38:01 OPS71-GPU_RTX2080 Bye

nullc@OPS71 MINGW64 /d/GIMPS/gpuowl $

#######################################################

END DIAGNOSTIC OUTPUT

#######################################################

AMD APP SDK 3.0 was not available, so I used the AMD OCL Light package and copied lib and include to the following msys32/mingw64 directories.

I do have the latest CUDA SDK installed, but I don't think that it was linked.

If anyone can help with this, I'd be grateful. Excuse me if I come off as a n00b; I kind of am when it comes to source code compile issues.

nullcure commented 2 years ago

Kriesel's Guide

https://www.mersenneforum.org/showpost.php?p=532454&postcount=21

nullcure commented 2 years ago

FIXED:

The problem occurs during the compile process.

gpuowl-expanded.cl is about 130 KB in size, but for some reason it gets cut down to 0 KB.

So I opened the file and kept it open in that process during the build, which kept it from being overwritten.

gpuowl-expanded.cl kept its 130 KB size at the end of the compile, and gpuowl now works.
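A quick way to catch this before running the binary is to check that the generated OpenCL source is not empty. The following is only a minimal sketch for the MSYS2/MINGW64 bash shell used above; `check_nonempty` is a hypothetical helper name and is not part of gpuowl or its Makefile.

```shell
# Hypothetical helper (not part of gpuowl): succeeds only if the given file
# exists and is larger than zero bytes; otherwise prints an error and fails.
check_nonempty() {
  if [ -s "$1" ]; then
    echo "ok: $1 is $(wc -c < "$1") bytes"
  else
    echo "error: $1 is missing or empty" >&2
    return 1
  fi
}
```

After defining the function, it could be run from the build directory against the generated .cl file, for example `check_nonempty gpuowl-expanded.cl`, before `make` links the executable. A failure here would point to the truncated file rather than to the OpenCL driver.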

###########################################################################

PowerShell 7.2.2
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Windows\System32> D:
PS D:\> cd .\GIMPS\gpuowl-v7.2-91-g9c22195\
PS D:\GIMPS\gpuowl-v7.2-91-g9c22195> .\gpuowl-win.exe
20220319 16:02:10 GpuOwl VERSION v7.2-91-g9c22195
20220319 16:02:10 GpuOwl VERSION v7.2-91-g9c22195
20220319 16:02:10 config: -cpu OPS71-GPU_RTX2080
20220319 16:02:10 config: -user nullcure
20220319 16:02:10 config:
20220319 16:02:10 device 0, unique id ''
20220319 16:02:10 OPS71-GPU_RTX2080 113205811 FFT: 6M 1K:12:256 (17.99 bpw)
20220319 16:02:11 OPS71-GPU_RTX2080 113205811 OpenCL args "-DEXP=113205811u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.0044605685341225021 -DIWEIGHT_STEP=-0.0044407602188228376 -DIWEIGHTS={0,-0.0088618000863245963,-0.017645068671879212,-0.026350501687124144,-0.03497878889532309,-0.043530613947195582,-0.052006654435085155,-0.060407581946647464,} -DFWEIGHTS={0,0.0089410337398926083,0.017962009564123114,0.027063642237564821,0.036246652915827876,0.045511769202399077,0.054859725206292544,0.064291261600215852,} -cl-std=CL2.0 -cl-finite-math-only "
20220319 16:02:11 OPS71-GPU_RTX2080 113205811

20220319 16:02:11 OPS71-GPU_RTX2080 113205811 OpenCL compilation in 0.28 s
20220319 16:02:12 OPS71-GPU_RTX2080 113205811 maxAlloc: 0.0 GB
20220319 16:02:12 OPS71-GPU_RTX2080 113205811 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
20220319 16:02:12 OPS71-GPU_RTX2080 113205811 P1(5.5M) 7935851 bits
20220319 16:02:14 OPS71-GPU_RTX2080 113205811 OK 61300000 on-load: blockSize 400, 9efdd88eca9d7073
20220319 16:02:14 OPS71-GPU_RTX2080 113205811 validating proof residues for power 8
20220319 16:02:29 OPS71-GPU_RTX2080 113205811 Proof using power 8
20220319 16:02:35 OPS71-GPU_RTX2080 113205811 OK 61300800 54.15% e527893c7927db03 4405 us/it + check 1.87s + save 0.30s; ETA 2d 15:30

############################################################################################

preda commented 2 years ago

Thanks, glad you could fix it!

I see you're running on Nvidia. There was, presumably, a performance regression affecting Nvidia GPUs that was just fixed, so you may want to try a fresh build.