What steps will reproduce the problem?
1. Use CUDA toolkit 4.1.
2. Compile the attached helloWorld.cu with nvcc, then link against libocelot.
3. Set llvm as the backend in configure.ocelot.
4. Run helloWorld.

What is the expected output? What do you see instead?
Error information:
helloWorld: ocelot/executive/implementation/LLVMCooperativeThreadArray.cpp:495: bool executive::LLVMCooperativeThreadArray::_finishContext(unsigned int): Assertion `nextFunction < _queuedThreads.size()' failed.

Please use labels and text to provide additional information.
This problem exists for all apps compiled with CUDA toolkit 4.1. I traced the problem a little and found that CUDA 4.1 emits the PTX instruction "ret" instead of "exit" (which CUDA 4.0 and earlier emit) at the exit point of a kernel with a global entry. However, Ocelot's LLVM backend does not detect this situation and simply assumes that "ret" means the subkernel returns to its caller. It fails when it tries to pop the stack to get the caller function, which does not exist in this case.
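
To make the failure mode concrete, here is a minimal C++ sketch of one way the situation could be handled: treat "ret" as "exit" whenever there is no caller left on the stack. This is not Ocelot's actual code. `Opcode`, `Action`, `ThreadContext`, `callStack`, and `resolveTerminator` are hypothetical names invented for illustration, and the real data structures in LLVMCooperativeThreadArray.cpp may look quite different.

```cpp
#include <cstddef>
#include <vector>

// All names below are hypothetical and exist only for this sketch.
enum class Opcode { Ret, Exit };
enum class Action { ReturnToCaller, TerminateThread };

struct ThreadContext {
    // Indices of functions waiting to resume; empty while the thread
    // is executing the global entry kernel itself.
    std::vector<std::size_t> callStack;
};

// Decide what a terminating instruction should do for one thread.
// On ReturnToCaller, nextFunction receives the caller's index.
// On TerminateThread, nextFunction is left untouched.
Action resolveTerminator(Opcode op, ThreadContext& ctx,
                         std::size_t& nextFunction) {
    // "exit" always ends the thread. A "ret" with no caller on the
    // stack is a return from the global entry, so it ends the thread too.
    if (op == Opcode::Exit || ctx.callStack.empty()) {
        return Action::TerminateThread;
    }

    // Otherwise "ret" is an ordinary return to the calling function.
    nextFunction = ctx.callStack.back();
    ctx.callStack.pop_back();
    return Action::ReturnToCaller;
}
```

The point of the sketch is only the emptiness check before the pop: without it, a "ret" issued from the global entry would look up a caller that was never pushed, which matches the assertion failure reported above.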
From wangjin....@gmail.com on January 21, 2012 19:42:26
Attachment: helloWorld.cu
Original issue: http://code.google.com/p/gpuocelot/issues/detail?id=62