ret instruction for global entry kernel not supported in llvm backend (cuda 4.1)

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Use CUDA toolkit 4.1.
2. Compile the attached helloWorld.cu with nvcc, then link against libocelot.
3. Set llvm as the backend in configure.ocelot.
4. Run helloWorld.

What is the expected output? What do you see instead?
Error information:
helloWorld: ocelot/executive/implementation/LLVMCooperativeThreadArray.cpp:495: 
bool executive::LLVMCooperativeThreadArray::_finishContext(unsigned int): 
Assertion `nextFunction < _queuedThreads.size()' failed.

Please use labels and text to provide additional information.
This problem exists for all apps compiled with CUDA toolkit 4.1. I traced the
problem a little and found that CUDA 4.1 emits the PTX instruction "ret" instead
of "exit" (which CUDA 4.0 and earlier emit) at the exit point of a kernel with a
__global__ entry. However, Ocelot's LLVM backend does not detect this situation
and simply assumes that "ret" means the subkernel is returning to its caller. It
fails when it tries to pop the stack to get the caller function, which does not
exist in this case.
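
To make the distinction concrete, here is a minimal sketch of how the two terminators could be classified. It is not Ocelot code: `Terminator`, `Effect`, and `effectOf` are hypothetical names made up for illustration, and the only assumption is the one described above, that a "ret" reached with no caller on the stack has to end the thread just as "exit" does.

```cpp
#include <cstddef>

// Hypothetical model of the two PTX terminators discussed in this issue.
enum class Terminator { Ret, Exit };

// What the backend should do when a thread reaches the terminator.
enum class Effect { ResumeCaller, ThreadDone };

// callDepth is the number of pending callers for this thread;
// 0 means the thread is still inside the __global__ entry itself.
Effect effectOf(Terminator op, std::size_t callDepth) {
    if (op == Terminator::Exit) {
        return Effect::ThreadDone;          // "exit" always ends the thread
    }
    // "ret" only resumes a caller if one exists; at depth 0 there is
    // nothing to pop, so it has to be handled like "exit".
    return callDepth == 0 ? Effect::ThreadDone : Effect::ResumeCaller;
}
```

With this model, `effectOf(Terminator::Ret, 0)` gives `Effect::ThreadDone`, which is the case the kernels built with CUDA 4.1 run into.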

Original issue reported on code.google.com by wangjin....@gmail.com on 22 Jan 2012 at 12:42

Attachments:

helloWorld.cu

GoogleCodeExporter commented 8 years ago

Added guard conditionals to exit a thread when it returns with an empty stack.
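
For readers who want a picture of what such a guard looks like, a rough sketch follows. It is an illustration only and not the actual change: `ThreadContext` and `returnFromFunction` are hypothetical names, and the real logic in LLVMCooperativeThreadArray.cpp may be organized quite differently.

```cpp
#include <vector>

// Hypothetical per-thread state: ids of the functions waiting to be resumed.
struct ThreadContext {
    std::vector<unsigned int> callStack;
    bool finished = false;
};

// Handles a "ret" for one thread. Returns true and sets nextFunction when
// there is a caller to resume; returns false and marks the thread finished
// when the stack is empty (i.e. the "ret" came from the kernel entry).
bool returnFromFunction(ThreadContext& thread, unsigned int& nextFunction) {
    if (thread.callStack.empty()) {
        thread.finished = true;   // guard: no caller, so the thread exits
        return false;
    }
    nextFunction = thread.callStack.back();
    thread.callStack.pop_back();
    return true;
}
```

The point of the guard is that the empty-stack check happens before any pop, so a "ret" from the entry kernel ends the thread instead of producing an invalid next-function index.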

Original comment by arkerr@gmail.com on 31 Jan 2012 at 6:14

Changed state: Fixed
