spacewalk01 / tensorrt-yolov9

C++ and Python implementation of YOLOv9 using the TensorRT API
https://github.com/WongKinYiu/yolov9
MIT License

How to run inference on multiple GPUs? #12

Closed · fungtion closed this 6 months ago

fungtion commented 6 months ago

Hi, can the engine model run inference on multiple GPUs?

spacewalk01 commented 6 months ago

Hi,

From the FAQ: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#faq

Q: How do I use TensorRT on multiple GPUs?

A: Each ICudaEngine object is bound to a specific GPU when it is instantiated, either by the builder or on deserialization. To select the GPU, use cudaSetDevice() before calling the builder or deserializing the engine. Each IExecutionContext is bound to the same GPU as the engine from which it was created. When calling execute() or enqueue(), ensure that the thread is associated with the correct device by calling cudaSetDevice() if necessary.

from: https://github.com/NVIDIA/TensorRT/issues/322
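
For reference, a minimal sketch of that guidance in C++ (the helper name, logger, and engine buffer are illustrative placeholders, not part of this repo):

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Hypothetical helper: deserialize a serialized engine onto a chosen GPU.
// The engine, and every IExecutionContext later created from it, stays
// bound to that device.
nvinfer1::ICudaEngine* loadEngineOnGpu(int gpuId, const void* engineData,
                                       size_t engineSize, nvinfer1::ILogger& logger)
{
    cudaSetDevice(gpuId);  // must happen before building or deserializing
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    // Note: the runtime is intentionally kept alive (leaked) here for brevity.
    return runtime->deserializeCudaEngine(engineData, engineSize);
}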

spacewalk01 commented 6 months ago

Create an inference engine instance for each GPU and call cudaSetDevice(gpu_id) for each device.
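
A sketch of that pattern with one worker thread per GPU (the worker function and engine file name are placeholders, not from this repo):

#include <cuda_runtime_api.h>
#include <string>
#include <thread>
#include <vector>

// Illustrative worker: everything that touches the GPU runs in this thread
// after cudaSetDevice, so the engine and its context are created on, and
// stay bound to, gpuId.
void runInferenceOnGpu(int gpuId, const std::string& enginePath)
{
    cudaSetDevice(gpuId);
    // ... deserialize the engine, create the execution context, and run the
    // preprocessing / inference / postprocessing loop for this device.
}

int main()
{
    std::vector<std::thread> workers;
    for (int gpuId : {0, 1})  // one worker per GPU
        workers.emplace_back(runInferenceOnGpu, gpuId, "yolov9-c.engine");
    for (auto& w : workers)
        w.join();
    return 0;
}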

fungtion commented 6 months ago

Can you give an example? I tried setting cudaSetDevice(1), but it always runs on gpu:0, not gpu:1.

spacewalk01 commented 6 months ago

Make sure to call it at the beginning of each function that uses CUDA/the GPU, for example:

Yolov9::Yolov9(string engine_path)
{
    cudaSetDevice(1);
    // Read the engine file
    ifstream engineStream(engine_path, ios::binary);
    ...

void Yolov9::predict(Mat& image, vector<Detection> &output)
{
    cudaSetDevice(1);
    // Preprocessing data on gpu
    cuda_preprocess(image.ptr(), image.cols, image.rows, gpu_buffers[0],
                    model_input_w, model_input_h, cuda_stream);
    ...
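
If the device id should not be hardcoded, one possible variation (not how the repo is written, just a sketch) is to pass it to the constructor, store it in a member such as gpu_id_, and call cudaSetDevice(gpu_id_) at the top of every method that touches the GPU:

// Hypothetical variation: take the device id as a constructor parameter
// (gpu_id_ would also need to be declared as a member in the header).
Yolov9::Yolov9(string engine_path, int gpu_id) : gpu_id_(gpu_id)
{
    cudaSetDevice(gpu_id_);
    // Read the engine file
    ifstream engineStream(engine_path, ios::binary);
    // ... rest of the constructor unchanged
}

void Yolov9::predict(Mat& image, vector<Detection> &output)
{
    cudaSetDevice(gpu_id_);
    // Preprocessing data on gpu
    cuda_preprocess(image.ptr(), image.cols, image.rows, gpu_buffers[0],
                    model_input_w, model_input_h, cuda_stream);
    // ... rest of predict unchanged
}

Two instances constructed with gpu_id 0 and 1 can then run side by side, one per device.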
fungtion commented 6 months ago

Thanks, I will try it.