gtcasl / gpuocelot

GPUOCelot: A dynamic compilation framework for PTX
http://gpuocelot.gatech.edu/
BSD 3-Clause "New" or "Revised" License

ret instruction for global entry kernel not supported in llvm backend (cuda 4.1) #61

Closed jwang323 closed 9 years ago

jwang323 commented 9 years ago

From wangjin....@gmail.com on January 21, 2012 19:42:26

What steps will reproduce the problem?
1. Use CUDA toolkit 4.1.
2. Compile the attached helloWorld.cu with nvcc, then link against libocelot.
3. Set llvm as the backend in configure.ocelot.
4. Run helloWorld.

What is the expected output? What do you see instead?
Error information:
helloWorld: ocelot/executive/implementation/LLVMCooperativeThreadArray.cpp:495: bool executive::LLVMCooperativeThreadArray::_finishContext(unsigned int): Assertion `nextFunction < _queuedThreads.size()' failed.

Please use labels and text to provide additional information.
This problem exists for all apps compiled with CUDA toolkit 4.1. I traced the problem a little and found that CUDA 4.1 emits the PTX instruction "ret" instead of "exit" (which CUDA 4.0 and earlier emit) at the exit point of a kernel with a global entry. However, Ocelot's llvm backend does not detect this situation and simply assumes that "ret" means the subkernel returns to its caller. It fails when it tries to pop the stack to get the caller function, which does not exist in this case.

Attachment: helloWorld.cu

Original issue: http://code.google.com/p/gpuocelot/issues/detail?id=62

jwang323 commented 9 years ago

From arkerr@gmail.com on January 31, 2012 10:14:17

Added guard conditionals to exit a thread when it returns with an empty stack.

Status: Fixed
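
For readers following the thread, the kind of guard described above might look roughly like the sketch below. It is a simplified model and not Ocelot's actual code: `ThreadContext`, `Action`, and `handleRet` are hypothetical names made up for illustration, and the real fix lives in the llvm backend's thread scheduling. The idea is only that a "ret" executed with an empty call stack is treated as a thread exit rather than a return to a caller.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a thread's state; these are NOT Ocelot's real types.
enum class Action { ResumeCaller, ExitThread };

struct ThreadContext {
    // Ids of the functions to resume when the current one returns.
    // Empty while the thread is executing the global entry kernel.
    std::vector<std::size_t> callStack;
};

// Handle a PTX `ret` for one thread.
// If a caller exists, pop it and report which function to resume.
// If the stack is empty, the `ret` came from the global entry kernel,
// so it is treated like `exit` and nextFunction is left untouched.
Action handleRet(ThreadContext& thread, std::size_t& nextFunction) {
    if (thread.callStack.empty()) {
        return Action::ExitThread;  // guard: no caller to return to
    }
    nextFunction = thread.callStack.back();
    thread.callStack.pop_back();
    return Action::ResumeCaller;
}
```

Without the empty-stack check, the pop would have no caller to yield, which is consistent with the failed `nextFunction < _queuedThreads.size()` assertion reported above.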