ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Author, do you have a complete Python script that loads a TensorRT .engine model and runs segmentation inference — a simplified version of the official inference code that can run from a single file without calling many other Python files or libraries? #13055

Closed yxl23 closed 2 months ago

yxl23 commented 4 months ago

Search before asking

Question

Author, do you have a complete Python script that loads a TensorRT .engine model and runs segmentation inference — a simplified version of the official inference code that can run from a single file without calling many other Python files or libraries?

Additional

No response

glenn-jocher commented 4 months ago

Hello,

Thank you for reaching out with your query. Currently, we don't have a single-file Python script specifically for running inference with a TensorRT .engine model that minimizes external dependencies. However, you can achieve this by using the tensorrt library in Python to load and run inference with the .engine file.

Here’s a basic outline of what the code could look like:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
engine_file_path = 'path_to_your_engine_file.engine'

def load_engine(engine_file_path):
    """Deserialize a TensorRT engine from disk."""
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def main():
    engine = load_engine(engine_file_path)
    context = engine.create_execution_context()

    # Allocate host/device buffers for every binding and create a CUDA stream.
    inputs, outputs, bindings, stream = [], [], [], cuda.Stream()
    for binding in engine:
        shape = tuple(engine.get_binding_shape(binding))
        size = trt.volume(shape)
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate page-locked host memory and a matching device buffer.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Record the device buffer address for execute_async_v2.
        bindings.append(int(device_mem))
        # Keep input and output buffers in separate lists.
        if engine.binding_is_input(binding):
            inputs.append({'host': host_mem, 'device': device_mem, 'shape': shape, 'dtype': dtype})
        else:
            outputs.append({'host': host_mem, 'device': device_mem, 'shape': shape, 'dtype': dtype})

    # Replace this dummy tensor with your preprocessed image
    # (e.g. a 1x3x640x640 float array, depending on how the engine was exported).
    input_data = np.zeros(inputs[0]['shape'], dtype=inputs[0]['dtype'])
    np.copyto(inputs[0]['host'], input_data.ravel())
    cuda.memcpy_htod_async(inputs[0]['device'], inputs[0]['host'], stream)

    # Run inference asynchronously, then copy all outputs back to the host.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for out in outputs:
        cuda.memcpy_dtoh_async(out['host'], out['device'], stream)
    stream.synchronize()

    # A YOLOv5 segmentation engine typically has two outputs: detections and mask prototypes.
    for i, out in enumerate(outputs):
        print(f"Output {i} with shape {out['shape']}:", out['host'])

if __name__ == '__main__':
    main()

This script is a simplified example and assumes you have the necessary setup for TensorRT and PyCUDA. You might need to adjust the data handling and buffer management based on your specific model inputs and outputs.
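For the data handling part, here is a minimal sketch of the kind of preprocessing a YOLOv5 engine usually expects (letterbox resize, BGR to RGB, HWC to CHW, normalization to 0-1). The preprocess function name, the image path, and the 640x640 input size are assumptions for illustration — match the size to whatever you used when exporting the engine, and note this requires opencv-python:

import cv2
import numpy as np

def preprocess(image_path, input_size=(640, 640)):
    """Letterbox-resize an image and convert it to a 1x3xHxW float32 tensor in 0-1."""
    img = cv2.imread(image_path)                      # HWC, BGR, uint8
    h0, w0 = img.shape[:2]
    r = min(input_size[0] / h0, input_size[1] / w0)   # scale ratio
    new_h, new_w = int(round(h0 * r)), int(round(w0 * r))
    resized = cv2.resize(img, (new_w, new_h))
    # Pad to the target size with the gray value YOLOv5 uses (114).
    canvas = np.full((input_size[0], input_size[1], 3), 114, dtype=np.uint8)
    top, left = (input_size[0] - new_h) // 2, (input_size[1] - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    x = canvas[:, :, ::-1].transpose(2, 0, 1)         # BGR -> RGB, HWC -> CHW
    x = np.ascontiguousarray(x, dtype=np.float32) / 255.0
    return x[None]                                    # add batch dimension: 1x3xHxW

You would then feed the result of preprocess(...) in as input_data in the script above. Keep in mind that the raw segmentation outputs still need non-maximum suppression and mask-prototype processing before they are usable, which is the part the repository's own utilities normally handle.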

For more comprehensive guidance on exporting and running YOLOv5 models with TensorRT, please refer to our documentation on model export and inference.
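For reference, a YOLOv5 segmentation model is exported to a TensorRT engine with export.py, for example (adjust the weights, image size, and device to your setup; --half for FP16 is optional):

python export.py --weights yolov5s-seg.pt --include engine --device 0 --half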

github-actions[bot] commented 3 months ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐