Thanks for your great work!
I found that post-processing is too slow, and the bottleneck appears to be the torch::masked_select() function, as the picture shows.
I then set the environment variable CUDA_LAUNCH_BLOCKING=1 as described in #3, and found that the inference speed is too slow.
Could you give me any advice on how to solve this problem? Thank you very much!