What steps will reproduce the problem?
1. Use CUDA Toolkit 4.1.
2. Compile the attached helloWorld.cu with nvcc, then link it against libocelot.
3. Set llvm as the backend in configure.ocelot.
4. Run helloWorld.
What is the expected output? What do you see instead?
Error information:
helloWorld: ocelot/executive/implementation/LLVMCooperativeThreadArray.cpp:495:
bool executive::LLVMCooperativeThreadArray::_finishContext(unsigned int):
Assertion `nextFunction < _queuedThreads.size()' failed.
Please use labels and text to provide additional information.
This problem affects every application compiled with CUDA Toolkit 4.1. I traced
it a little and found that CUDA 4.1 emits the PTX instruction "ret" instead of
"exit" (as CUDA 4.0 and earlier do) at the exit point of a __global__ kernel
entry. Ocelot's LLVM backend does not detect this case and simply assumes that
"ret" means a subkernel is returning to its caller. It then fails when it pops
the stack to find the caller function, which does not exist in this case.
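A minimal sketch of the handling the analysis above implies, assuming each thread keeps a stack of return targets that "call" pushes and "ret" pops. ThreadContext, Opcode and execute are hypothetical names for illustration only. They are not Ocelot's API and do not reproduce LLVMCooperativeThreadArray. The point is that a "ret" seen with an empty call stack comes from the kernel entry itself and should be treated like "exit" rather than popping a caller that does not exist.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread state (not Ocelot's data structures).
struct ThreadContext {
    std::vector<int> callStack;  // return targets pushed by "call"
    int pc = 0;                  // current instruction index
    bool exited = false;         // set once the thread has finished
};

enum class Opcode { Call, Ret, Exit };

// Executes one control-flow instruction; returns true if the thread
// is still running afterwards.
bool execute(ThreadContext& t, Opcode op, int target = 0) {
    switch (op) {
    case Opcode::Call:
        t.callStack.push_back(t.pc + 1);  // remember where to resume
        t.pc = target;
        return true;
    case Opcode::Ret:
        if (t.callStack.empty()) {
            // "ret" from the __global__ entry (CUDA 4.1 style epilogue):
            // there is no caller, so finish the thread like "exit".
            t.exited = true;
            return false;
        }
        t.pc = t.callStack.back();  // resume in the caller
        t.callStack.pop_back();
        return true;
    case Opcode::Exit:
        t.exited = true;
        return false;
    }
    return false;
}
```

With this rule, a thread that executes "ret" at the end of the kernel simply finishes, while a "ret" inside a called subkernel still resumes at the instruction after the "call".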
Original issue reported on code.google.com by wangjin....@gmail.com on 22 Jan 2012 at 12:42