isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Using FP16 #26

Closed · IsraelLencina closed this 3 years ago

IsraelLencina commented 3 years ago

Hi, I have an engine compiled with FP16. If I haven't misunderstood, I should alter the pbtxt on the Triton server. This is my pbtxt:

platform: "tensorrt_plan"
max_batch_size: 1
input {
  name: "data"
  data_type: TYPE_FP16
  dims: 3
  dims: 608
  dims: 608
}
output {
  name: "prob"
  data_type: TYPE_FP16
  dims: 7001
  dims: 1
  dims: 1
}
default_model_filename: "model.plan"

Because the input is now FP16, the client tells me that the input doesn't fit the model. I've changed the mode in the client to FP16, but the preprocess function casts the data to FP32. I've tried changing image = np.transpose(np.array(image, dtype=**np.float32**, order='C'), (2, 0, 1)) to image = np.transpose(np.array(image, dtype=**np.float16**, order='C'), (2, 0, 1)), but the loss of precision leads to strange values in the bboxes and makes inf values appear... Do you know how to make the repository work with FP16? Thanks.

philipp-schmidt commented 3 years ago

You don't have to change anything in the examples to make it work with FP16.

The #define USE_FP16 flag in the code is a simple on/off switch for FP16 optimization. There is no need to change the client code or the Triton startup. Input tensors will still be in 32-bit precision; FP16 is only relevant for execution.
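For illustration, the switch roughly corresponds to the following sketch using the TensorRT Python API (the repo itself does this in C++ during engine building, and the names here are only illustrative):

```python
import tensorrt as trt

USE_FP16 = True  # plays the role of the #define in the repo's C++ code (illustrative)

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Enabling FP16 only changes how layers are executed internally;
# the network's input/output bindings keep their FP32 data types.
if USE_FP16 and builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)

# ... define the network, then build the engine with this config ...
```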

You can follow the examples in this repo step by step and try toggling the above flag to compare FP16 speed (the default) and FP32 speed (not the default, and slower).
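As a minimal sketch of the client side (assuming the gRPC endpoint on localhost:8001, the tensor names from the pbtxt above, and a model named "yolov4"; image loading and normalization are omitted), note that the input stays float32 even with an FP16-optimized engine:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Preprocess exactly as for an FP32 engine: HWC image -> CHW float32.
image = np.zeros((608, 608, 3), dtype=np.uint8)  # placeholder for a real image
image = np.transpose(np.array(image, dtype=np.float32, order='C'), (2, 0, 1))
batch = np.expand_dims(image, axis=0)  # shape (1, 3, 608, 608)

inputs = [grpcclient.InferInput("data", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [grpcclient.InferRequestedOutput("prob")]

result = client.infer(model_name="yolov4", inputs=inputs, outputs=outputs)
prob = result.as_numpy("prob")  # float32 on the client side
```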

philipp-schmidt commented 3 years ago

Benchmark results in the README are with FP16 as well.

IsraelLencina commented 3 years ago

I have an engine serialized in FP16. Even if I don't provide a pbtxt, the server started with --strict-model-config=false generates the pbtxt for me with the input in FP16...

philipp-schmidt commented 3 years ago

Is it the engine from this repo?

IsraelLencina commented 3 years ago

Using your engine, changing the width and height in yololayer.h to (416, 416) and using a custom batch_size makes inf values appear in the 'postprocess' function, in: bboxes = buffer[0, 1 : (num_bboxes * 7 + 1), 0, 0].reshape(-1, 7)

Edit: Actually, the inf appears when the server gives me the results, or when they are converted to numpy with result.as_numpy('prob').

philipp-schmidt commented 3 years ago

Hello, yes, there might be a problem with custom batch sizes in the yolo layer plugin. @jkjung-avt fixed this in this commit. You can try copying his plugin and replacing the one in this repo.

philipp-schmidt commented 3 years ago

If you get those errors with batch size > 1 but not with batch size == 1, it is almost certainly that error.
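If you want to narrow it down, a quick check along these lines (assuming the same output layout as your postprocess line, i.e. a 4-dimensional 'prob' tensor with the batch dimension first) should show whether only batch entries beyond the first are affected:

```python
import numpy as np

prob = result.as_numpy('prob')  # expected shape: (batch, 7001, 1, 1)

for b in range(prob.shape[0]):
    raw = prob[b, :, 0, 0]
    print(f"batch {b}: inf={np.isinf(raw).any()}, nan={np.isnan(raw).any()}, "
          f"detection count field={raw[0]}")
```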

IsraelLencina commented 3 years ago

I've been working on it. In the end, I moved to the repo you linked (to generate the engine) and used your client, adapting it. Thank you!