jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

RuntimeError: fail to allocate CUDA resources #535

Closed · Terizian closed this issue 2 years ago

Terizian commented 2 years ago

I've followed the YOLOv3 tutorial in the README. Here is a list of the package versions:

  1. pycuda 2019.1.2
  2. onnx 1.4.1
  3. tensorrt 8.0.1.6

I converted custom-trained YOLOv3 models using the code in the yolo folder. I then try to load a model for inference as follows:

model1_trt = TrtYOLO(model1, (model1Height, model1Width), model1_category_num)

This is the output from my code:

[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +346, GPU +0, now: CPU 383, GPU 11577 (MiB)
[TensorRT] INFO: Loaded engine size: 122 MB
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine begin: CPU 506 MiB, GPU 11700 MiB
[TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +295, now: CPU 740, GPU 12111 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +307, GPU +395, now: CPU 1047, GPU 12506 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1047, GPU 12506 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine end: CPU 1047 MiB, GPU 12506 MiB
<tensorrt.tensorrt.ICudaEngine object at 0x7f911acae8>
[TensorRT] INFO: Loaded engine size: 0 MB
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine begin: CPU 917 MiB, GPU 12383 MiB
[TensorRT] ERROR: 3: Cannot deserialize with an empty memory buffer.
[TensorRT] ERROR: 4: [runtime.cpp::deserializeCudaEngine::76] Error Code 4: Internal Error (Engine deserialization failed.)
None
Traceback (most recent call last):
  File "/media/nvidia/XAVIER-1TB-SSD/0-PROD_CODE/utils/yolo_with_plugins.py", line 255, in __init__
    self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "trt_get_all_detections.py", line 81, in <module>
    car_trt = TrtYOLO(car_model, (carHeight, carWidth), car_category_num)
  File "/media/nvidia/XAVIER-1TB-SSD/0-PROD_CODE/utils/yolo_with_plugins.py", line 259, in __init__
    raise RuntimeError('fail to allocate CUDA resources') from e
RuntimeError: fail to allocate CUDA resources
Exception ignored in: <bound method TrtYOLO.__del__ of <utils.yolo_with_plugins.TrtYOLO object at 0x7f911acba8>>
Traceback (most recent call last):
  File "/media/nvidia/XAVIER-1TB-SSD/0-PROD_CODE/utils/yolo_with_plugins.py", line 266, in __del__
    del self.outputs
AttributeError: outputs
jkjung-avt commented 2 years ago
model1_trt = TrtYOLO(model1, (model1Height, model1Width), model1_category_num)

In my latest code, you don't need to specify the "height" and "width" of the model input; they are determined automatically by the code. Please pull the latest code from this repo and modify your code accordingly.

Source code for reference: https://github.com/jkjung-avt/tensorrt_demos/blob/67c4eb0f5e8a46656eb65073e5685dc7654498d8/utils/yolo_with_plugins.py#L275
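The idea behind the updated API is that the input resolution can be read from the deserialized engine's binding dimensions, so the caller no longer passes it in. A rough sketch of that lookup, tested here with a stand-in engine object (the real code uses TensorRT's `engine.get_binding_shape()`; the exact handling is in the linked source):

```python
def get_input_shape(engine):
    """Infer (height, width) of the model input from the engine's first
    binding, so callers need not pass them explicitly (sketch of the
    approach used by the updated repo code)."""
    dims = engine.get_binding_shape(0)
    if len(dims) == 4:    # explicit-batch engine: (N, C, H, W)
        return dims[2], dims[3]
    elif len(dims) == 3:  # implicit-batch engine: (C, H, W)
        return dims[1], dims[2]
    raise ValueError('bad binding dims: %s' % str(dims))
```

With this in place, constructing the detector only needs the model name and category count, e.g. `TrtYOLO(model1, model1_category_num)`.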

Terizian commented 2 years ago

Thanks for pointing it out! It seems I was using the latest version of everything except the inference code. I can confirm that it works after modifying my code to match trt_yolo.py.

Thanks again