Kalkasas opened this issue 7 years ago.
Hmm, 218 is an invalid PTX error (CUDA_ERROR_INVALID_PTX). I have added a new verbosity level in the latest branch: if you pull, recompile, and set Opt_InitializationParameters.verbosityLevel to 3, it should dump a lot of extra information. If you post the output here, I might be able to guess what is going wrong.
After pulling the latest branch, I realized that I had modified the Opt code to get it running. This may be the cause of the error.
Running the minimal_graph_only test with single precision led to:
src/tcuda.cpp:139: nvvm error reported (7)
libnvvm : error: -arch=compute_61 is an unsupported option
stack traceback:
[C]: in function 'toptximpl'
src/cudalib.lua:84: in function 'toptx'
[string "<string>"]:238: in function 'cudacompile'
/home/dl/projects/Optlang/Opt/API/src/util.t:873: in function 'makeGPUFunctions'
...e/dl/projects/Optlang/Opt/API/src/solverGPUGaussNewton.t:751: in function 'compilePlan'
/home/dl/projects/Optlang/Opt/API/src/o.t:870: in function </home/dl/projects/Optlang/Opt/API/src/o.t:862>
[C]: in function 'xpcall'
/home/dl/projects/Optlang/Opt/API/src/o.t:862: in function </home/dl/projects/Optlang/Opt/API/src/o.t:861>
0 terra (JIT) 0x00007f1fb163200d $opt.ProblemSolve + 13
./dense() [0x4275bc]
Segmentation fault
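I modified the cudacompile call in util.t to pin the target architecture:
-- Pin the NVVM target to compute_30; CUDA 7.5's libnvvm rejects compute_61.
local kernels = terralib.cudacompile(kernelFunctions, verbosePTX, 30)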
With this change, single precision works for all examples, but enabling double precision still produces error 218.
The output is attached to this post (out.txt). /usr/local/cuda points to a CUDA 7.5 installation.
I believe this is a bad interaction between double-precision atomic adds and your version of NVVM. One workaround for now is to set pascalOrBetterGPU = false on line 127 of util.t, but this makes double-precision atomics slow.
The other, better workaround is to switch to CUDA 8.0, as in #99, though I don't know what errors you might run into there.
At the moment I am using the Terra binaries mentioned in the README. I assume I would also need a different Terra build to compile with CUDA 8.0? For the time being, I think I will stick with single precision.
When enabling double precision, I get:
What can I do to fix this problem?