Hello! I have a weird environment which I am having difficulty implementing in warp-drive. Essentially, the environment has N agents each place M units on their own boards. After all agents are done placing units, the boards are matched against each other and intensive computations are performed to determine per-agent rewards.
I was thinking I could have a CUDA `Step` function with N agents (threads) per environment (1 block per env) which would handle the overall `state`/`action`. When the agents are done performing actions, a CUDA `BoardStep` function with M units (threads) per board (1 block per board) would run, fed the mapped `state` -> `board_state` input (the mapping would be done by a separate CUDA function). I am essentially attempting the below:
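Something like the following (rough sketch only; kernel names, signatures, and the placeholder bodies are illustrative, not my actual implementation):

```cuda
// Rough sketch -- names, signatures, and placeholder bodies are illustrative.

// One block per environment, N agent threads per block.
__global__ void Step(float *state, const int *actions, int num_agents) {
    const int env_id = blockIdx.x;
    const int agent_id = threadIdx.x;
    if (agent_id < num_agents) {
        // Placeholder "place a unit" update for this agent's slot.
        state[env_id * num_agents + agent_id] =
            (float)actions[env_id * num_agents + agent_id];
    }
}

// Separate kernel that remaps per-environment state into per-board state.
__global__ void MapStateToBoards(const float *state, float *board_state,
                                 int num_agents) {
    const int env_id = blockIdx.x;
    const int agent_id = threadIdx.x;
    if (agent_id < num_agents) {
        // Placeholder state -> board_state mapping.
        board_state[env_id * num_agents + agent_id] =
            state[env_id * num_agents + agent_id];
    }
}

// One block per board, M unit threads per block.
__global__ void BoardStep(const float *board_state, float *rewards,
                          int num_units) {
    const int board_id = blockIdx.x;
    const int unit_id = threadIdx.x;
    if (unit_id < num_units) {
        // Placeholder per-unit matchup computation feeding per-board rewards.
        atomicAdd(&rewards[board_id],
                  board_state[board_id * num_units + unit_id]);
    }
}

// Host side: back-to-back launches on the same stream run in order,
// even though Step/MapStateToBoards use N threads per block and
// BoardStep uses M.
void step(float *state, int *actions, float *board_state, float *rewards,
          int num_envs, int num_boards, int N, int M) {
    Step<<<num_envs, N>>>(state, actions, N);
    MapStateToBoards<<<num_envs, N>>>(state, board_state, N);
    BoardStep<<<num_boards, M>>>(board_state, rewards, M);
}
```

Each operation is its own kernel, launched back-to-back on one stream, with N threads per block for `Step` and M threads per block for `BoardStep`.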
I have implemented `CudaBoardStep()`. I am not sure whether Warp-Drive's `Trainer` can handle multiple `CUDAFunctionManager`s with different threads/block, or whether this impacts Warp-Drive's performance. Looking at the example environments, I do not see a mixed-thread or chained-CUDA-kernels environment.

Questions:
1. Does warp-drive support chained CUDA kernels? Can I make every operation in my step a separate CUDA kernel if necessary, and have warp-drive chain them together, similar to CUDA Graphs?
2. Can I have CUDA functions with a different number of threads per block (i.e., a different number of "agents" per environment) mixed within a `step()` without expecting a significant performance loss?
3. Would branch/loop operations like `if`/`while` run on the GPU? I am not sure whether the `if`/`while` operations are running within the PyTorch GPU context or not.

Thank you so much for the quick response! I just found the Slack link, so I will use that for any future questions I have; sorry for creating this issue. Closing this issue.