ghost opened this issue 3 years ago
What I do know is that this problem can be solved by isolating TensorRT from kmeans_cuda.
Here's how I've hackily fixed it:
I simply run the TensorRT inference (with all its pagelocked allocations, engine, stream, context, etc.) in one thread and run kmeans_cuda in a separate thread. A thread-safe queue passes the inference results through to the thread that runs kmeans. There: isolation! No more errors.
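A minimal sketch of the pattern, where `load_batches` and `run_trt_inference` are hypothetical placeholders for your own data loading and TensorRT execution code:

```python
import queue
import threading

import numpy as np
from libKMCUDA import kmeans_cuda

results = queue.Queue()

def inference_worker():
    # Build the engine, context, stream, and pagelocked buffers inside this
    # thread, so every TensorRT call happens on the same thread-local state.
    for batch in load_batches():              # hypothetical data source
        features = run_trt_inference(batch)   # hypothetical TensorRT call
        results.put(features)
    results.put(None)  # sentinel: no more work

def clustering_worker():
    while True:
        features = results.get()
        if features is None:
            break
        # kmcuda expects a 2D float32 (or float16) array of samples
        centroids, assignments = kmeans_cuda(
            np.ascontiguousarray(features, dtype=np.float32), 8)

t_infer = threading.Thread(target=inference_worker)
t_kmeans = threading.Thread(target=clustering_worker)
t_infer.start(); t_kmeans.start()
t_infer.join(); t_kmeans.join()
```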
But I have no idea why this works, and it feels extremely hacky. Would the devs be willing to comment on best practices and caveats for running kmeans_cuda synchronously with other calls to the GPU (via TensorRT or otherwise)?
I also encountered the same problem, but I was loading two TRT models at the same time. My method: first, map the torch2trt include and lib paths to the include and lib paths of the matching TensorRT version (e.g., TensorRT 8.2.3, TensorRT 7.1); then initialize the two TRT models in two separate classes; finally, use one wrapper class that calls the two model classes. Note: before every forward call you need to add torch.cuda.set_device('cuda:0'). My problem was solved, and the stress test also passed. This method succeeded in these environments: TensorRT 7.1.2 with torch2trt 0.3.0, and TensorRT 8.2.3 with torch2trt 0.4.0.
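A rough sketch of that structure (class names and weight paths here are made up for illustration; the key detail is the torch.cuda.set_device('cuda:0') call before every forward):

```python
import torch
from torch2trt import TRTModule

class DetectorTRT:  # hypothetical first model class
    def __init__(self, weights_path):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')  # pin the device before each call
        return self.model(x)

class ClassifierTRT:  # hypothetical second model class
    def __init__(self, weights_path):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')
        return self.model(x)

class Pipeline:
    """One wrapper class that uses the two model classes."""
    def __init__(self):
        self.detector = DetectorTRT('detector_trt.pth')      # made-up path
        self.classifier = ClassifierTRT('classifier_trt.pth')  # made-up path

    def run(self, image):
        boxes = self.detector(image)
        return self.classifier(boxes)
```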
I think this is a resource and GPU contention issue.
How can I run kmcuda synchronously after a TensorRT model performs inference on the same GPU (in a loop)?
For instance, I am already allocating pagelocked buffers for my TensorRT model, but I don't explicitly allocate anything upfront for kmeans_cuda to run on. Doesn't that mean there might be a conflict if both processes access the GPU and don't totally "clean up" after themselves? The error I get the next time TensorRT runs (only after kmcuda runs):
So I guess my question in general is: how should/can I clean up after kmcuda runs? The reason I think preallocating buffers might somehow help is that a very similar SO issue reported that as the solution (for TensorFlow and TensorRT on the same GPU).
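For context, the upfront allocation on my TensorRT side follows the usual pycuda pattern, roughly like this (the sizes below are assumed for illustration; the real ones come from the engine's bindings). There is no equivalent step for kmcuda:

```python
import numpy as np
import pycuda.autoinit  # creates and manages the CUDA context
import pycuda.driver as cuda

# Illustrative sizes; in practice these come from the engine's I/O bindings.
INPUT_VOLUME = 3 * 224 * 224
OUTPUT_VOLUME = 1000

# Pinned host buffers plus matching device buffers, allocated once up front
# and reused for every inference. Nothing comparable is done for kmeans_cuda.
h_input = cuda.pagelocked_empty(INPUT_VOLUME, dtype=np.float32)
h_output = cuda.pagelocked_empty(OUTPUT_VOLUME, dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()
```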
Environment: