isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Add shared memory support to python client #57

Closed t-wata closed 2 years ago

t-wata commented 2 years ago

Thanks to this repo, I was able to deploy YOLOv4 on Triton Inference Server very easily.

I've implemented shared memory support for the Python client; please merge it if you'd like.

Of course, you can only use it if the client runs on the same host as the Triton Inference Server. The --ipc="host" option is also required when starting the Triton Inference Server with docker run, but it is already included in the command listed in README.md.

Despite these limitations, a rough measurement in my environment showed that the time required per inference improved by about 20 ms.
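
For anyone wondering what the shared memory path looks like, here is a minimal sketch of the flow using the Triton gRPC Python client. It is not the exact code in this PR: the model name ("yolov4"), the output tensor name ("detections"), and the fixed 608x608 input are assumptions based on this repo's defaults, while the region names, keys, and byte size mirror the log output further below.

# Minimal sketch of the system shared memory flow (not the exact PR code).
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.shared_memory as shm
from tritonclient.utils import triton_to_np_dtype

triton = grpcclient.InferenceServerClient(url="localhost:8001")
byte_size = 1 * 3 * 608 * 608 * 4  # 4435968 bytes, one FP32 input tensor

# Create local POSIX shared memory regions and register them with the server.
input_shm = shm.create_shared_memory_region("input_data", "/input_simple", byte_size)
output_shm = shm.create_shared_memory_region("output_data", "/output_simple", byte_size)
triton.register_system_shared_memory("input_data", "/input_simple", byte_size)
triton.register_system_shared_memory("output_data", "/output_simple", byte_size)

# Copy the preprocessed image into the input region (placeholder data here).
input_image_buffer = np.zeros([1, 3, 608, 608], dtype=np.float32)
shm.set_shared_memory_region(input_shm, [input_image_buffer])

# Point the request at the regions instead of sending the tensors over gRPC.
infer_input = grpcclient.InferInput("input", [1, 3, 608, 608], "FP32")
infer_input.set_shared_memory("input_data", byte_size)
infer_output = grpcclient.InferRequestedOutput("detections")  # output name is an assumption
infer_output.set_shared_memory("output_data", byte_size)

results = triton.infer(model_name="yolov4", inputs=[infer_input], outputs=[infer_output])

# The response only carries metadata; read the tensor out of the output region.
out_meta = results.get_output("detections")
result_buffer = shm.get_contents_as_numpy(
    output_shm, triton_to_np_dtype(out_meta.datatype), out_meta.shape)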

### For debug ###

$ git diff
diff --git a/clients/python/client.py b/clients/python/client.py
index 3c6cced..8b9714c 100644
--- a/clients/python/client.py
+++ b/clients/python/client.py
@@ -235,6 +235,8 @@ if __name__ == '__main__':
             print("FAILED: no input image")
             sys.exit(1)

+        debug_start_time = time.perf_counter_ns()
+
         inputs = []
         outputs = []
         inputs.append(grpcclient.InferInput('input', [1, 3, FLAGS.width, FLAGS.height], "FP32"))
@@ -291,6 +293,10 @@ if __name__ == '__main__':
         detected_objects = postprocess(result, input_image.shape[1], input_image.shape[0], [FLAGS.width, FLAGS.height], FLAGS.confidence, FLAGS.nms)
         print(f"Detected objects: {len(detected_objects)}")

+        # Display Inference processing time
+        debug_infer_time = time.perf_counter_ns() - debug_start_time
+        print('Inference processing time: {} ms'.format(debug_infer_time / 1000000))
+
         for box in detected_objects:
             print(f"{COCOLabels(box.classID).name}: {box.confidence}")
             input_image = render_box(input_image, box.box(), color=tuple(RAND_COLORS[box.classID % 64].tolist()))

### Test without shared memory ###

$ python client.py -o data/dog_inferred.jpg image data/dog.jpg
Running in 'image' mode
Creating buffer from image file...
Invoking inference...
Done
Received result buffer of size (1, 159201, 1, 1)
Naive buffer sum: 565762.875
Detected objects: 3
Inference processing time: 53.121086 ms  # <- BEFORE
DOG: 0.9786704182624817
BICYCLE: 0.9221425652503967
TRUCK: 0.9161325097084045
Saved result to data/dog_inferred.jpg

### Test with shared memory ###

$ python client.py --shm -o data/dog_inferred.jpg image data/dog.jpg
Running in 'image' mode
shared_memory_status: regions {
  key: "input_data"
  value {
    name: "input_data"
    key: "/input_simple"
    byte_size: 4435968
  }
}
regions {
  key: "output_data"
  value {
    name: "output_data"
    key: "/output_simple"
    byte_size: 4435968
  }
}

Creating buffer from image file...
Invoking inference...
Done
Detected objects: 3
Inference processing time: 33.28672 ms   # <- AFTER
DOG: 0.9786704182624817
BICYCLE: 0.9221425652503967
TRUCK: 0.9161325097084045
Saved result to data/dog_inferred.jpg
Cleanup shared memory...
shared_memory_status:
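
The empty shared_memory_status at the end corresponds to the cleanup step; continuing the sketch above (reusing triton, shm, and the two region handles), it boils down to:

# Cleanup: unregister both regions on the server and free the local POSIX
# segments created earlier; the status query then comes back empty.
triton.unregister_system_shared_memory()      # drops "input_data" and "output_data"
shm.destroy_shared_memory_region(input_shm)
shm.destroy_shared_memory_region(output_shm)
print("shared_memory_status:", triton.get_system_shared_memory_status())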

There is a related issue, https://github.com/isarsoft/yolov4-triton-tensorrt/issues/31, but it was closed without an implementation.

philipp-schmidt commented 2 years ago

Looks good, thanks for the contribution!

philipp-schmidt commented 2 years ago

Now in the list of contributions