Open GoogleCodeExporter opened 9 years ago
For object linking, NVCC has so far only announced support for CUBIN
linking. NVIDIA plans to support PTX linking in the future. At that point,
Ocelot should be able to support it with only minor changes, so I plan to wait
for that feature. In the meantime, it is possible to 'link' PTX files together
by simply concatenating them.
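As a rough illustration of the concatenation workaround, here is a minimal C++ sketch that joins the text of several PTX modules in order. The function name `concatenatePtx` is hypothetical and not part of Ocelot or the CUDA toolkit. It performs plain text concatenation only. Whether repeated module headers or duplicate symbols in the combined file are accepted is not covered here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an Ocelot or CUDA API): joins the text of several
// PTX modules in the order given, making sure each one ends with a newline so
// that the last line of one module does not run into the first of the next.
inline std::string concatenatePtx(const std::vector<std::string>& ptxTexts) {
    std::string combined;
    for (const std::string& ptx : ptxTexts) {
        combined += ptx;
        if (!ptx.empty() && ptx.back() != '\n') {
            combined += '\n';
        }
    }
    return combined;
}
```

The result can be written to a single .ptx file and loaded as one module.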
Dynamic parallelism should be supported by default on NVIDIA devices, since
the device code itself contains the kernel launch and interacts directly with
the GPU driver.
There is experimental support for asynchronous dynamic parallelism in the
emulator and LLVM backend via a user-level library that simply calls cudaLaunch
from a different user pthread. We plan to move this functionality into the
CUDA runtime. We also need support for synchronous dynamic parallelism, which
should be relatively easy but still needs some implementation work. Of course,
both of these need unit tests.
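To make the asynchronous versus synchronous distinction concrete, here is a small host-side C++ sketch of the general idea of running a child launch on a separate thread. It is not Ocelot's implementation: `ChildKernel`, `launchChildAsync`, and `launchChildSync` are hypothetical names, and a plain callable stands in for the real cudaLaunch call.

```cpp
#include <functional>
#include <future>
#include <utility>

// Hypothetical stand-in for a child kernel launch. A real implementation
// would call into the CUDA runtime here instead of invoking a callable.
using ChildKernel = std::function<void()>;

// Asynchronous launch: run the child on a separate host thread and hand the
// parent a future it can wait on later.
inline std::future<void> launchChildAsync(ChildKernel kernel) {
    return std::async(std::launch::async, std::move(kernel));
}

// Synchronous launch: same mechanism, but block until the child has finished
// before returning to the parent.
inline void launchChildSync(ChildKernel kernel) {
    launchChildAsync(std::move(kernel)).get();
}
```

In this sketch the synchronous case is just the asynchronous one followed by an immediate wait, which is one way to read "relatively easy" above.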
I'm not sure what the status is on the AMD backend.
Original comment by SolusStu...@gmail.com on 30 May 2012 at 3:57
Original issue reported on code.google.com by rtf...@gmail.com on 17 May 2012 at 1:59