zeratax / yacx

Yet Another CudaExecutor - wrapper to easily compile and execute cuda kernels
https://zeratax.github.io/yacx
MIT License

support streams #46

Open zeratax opened 4 years ago

zeratax commented 4 years ago

Add stream support so that kernels can be executed asynchronously. Device should probably take care of the context?

zeratax commented 4 years ago

You should also consider #47, since both are async.

Device device;
CudaStream Sone, Stwo, Sthree;

CudaStream streams[] = {Sone, Stwo, Sthree};
Kernel kernels[] = {kernel1, kernel2, kernel3};

for (size_t i{0}; i < 3; ++i) {
   kernels[i].queueupload(args, device, streams[i]);
   kernels[i].queuelaunch(args, device, streams[i]); // I guess args could be implicitly known here?
   kernels[i].queuedownload(device, streams[i]);
}
// non-blocking: the CPU can keep executing while the GPU is busy (but not during download and upload??)
kernel.sync(); // blocking: the GPU is done after this
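
The intended semantics above (queue calls return immediately, sync blocks until the work is done) can be illustrated without a GPU. What follows is only a minimal CPU-only sketch under stated assumptions: `MockStream` and `run_demo` are hypothetical names that exist in neither yacx nor CUDA, and each stream is modeled as an in-order task queue drained by its own worker thread. It models ordering and blocking only, not real transfers or kernel launches.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical CPU-only stand-in for a CUDA stream (not part of yacx or CUDA):
// tasks queued on one stream run in order on that stream's worker thread.
class MockStream {
 public:
  MockStream() : worker_([this] { run(); }) {}
  ~MockStream() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    wake_.notify_all();
    worker_.join();  // the worker drains any remaining tasks before exiting
  }
  MockStream(const MockStream&) = delete;
  MockStream& operator=(const MockStream&) = delete;

  // Non-blocking: only records the task and returns.
  void enqueue(std::function<void()> task) {
    assert(task && "task must be callable");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
      ++pending_;
    }
    wake_.notify_all();
  }

  // Blocking: returns once every task queued so far has finished.
  void sync() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutdown requested and nothing left
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so enqueue() never waits on a task
      {
        std::lock_guard<std::mutex> lock(mutex_);
        --pending_;
      }
      idle_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_, idle_;
  std::queue<std::function<void()>> tasks_;
  int pending_ = 0;
  bool done_ = false;
  std::thread worker_;  // declared last so the members above exist first
};

// Mirrors the loop from the proposal: "upload", "kernel" and "download" are
// queued per stream, then the host waits for every stream.
std::vector<int> run_demo() {
  std::vector<int> host_in = {1, 2, 3};
  std::vector<int> device(3, 0);  // stands in for device memory
  std::vector<int> host_out(3, 0);
  MockStream streams[3];
  for (std::size_t i = 0; i < 3; ++i) {
    streams[i].enqueue([&, i] { device[i] = host_in[i]; });   // "upload"
    streams[i].enqueue([&, i] { device[i] *= 10; });          // "kernel"
    streams[i].enqueue([&, i] { host_out[i] = device[i]; });  // "download"
  }
  // The host thread is free to do other work here.
  for (auto& s : streams) s.sync();  // blocking
  return host_out;
}
```

Calling `run_demo()` returns `{10, 20, 30}`: within one stream the three steps run in order, while the three streams are independent of each other. A real implementation would presumably map `enqueue` to asynchronous copy and launch calls on a CUDA stream, and `sync` to a stream or context synchronization.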

execution order

This should be equivalent to async version 1.

I'm not sure to what extent upload and download need the device. We need to be more explicit about the context for this.

more info: https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/

LukasSiefke commented 4 years ago

I've made a start on streams on the branch. As it stands, everything is executed asynchronously on a single stream. But for some reason the upload and download operations are still performed synchronously (I don't know why), so it doesn't really gain much.