isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Are smaller input sizes possible? [Jetson Nano] #11

Closed chull434 closed 3 years ago

chull434 commented 3 years ago

Hi,

Thanks for this repo and the very easy-to-follow readme. I was able to get this working fine on my Jetson Nano with the JetPack Triton packages.

The YOLOv4 model that gets built is 608/FP32, which is maybe a bit heavy for the Jetson Nano. How do I reduce the input size to something like 320 or 416, and get FP16 and INT8? Do I need a separate model build for each combination of input size and precision mode?

In other, non-TensorRT YOLOv4 implementations you can change the input size and mode with just variables; maybe I am not understanding what TensorRT is doing under the hood.

Thanks ~whiskers434

philipp-schmidt commented 3 years ago

Hi, you're welcome. Yes, changing the input size should be possible by editing the following lines:

https://github.com/isarsoft/yolov4-triton-tensorrt/blob/d7387046effdb757ec8bbd1de05cc991181b995c/layers/yololayer.h#L19-L20
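For reference, those lines set the network input resolution as compile-time constants. A minimal sketch of the edit, assuming the constants are named INPUT_H and INPUT_W as in the linked file (the 416 values are just an example):

```cpp
// layers/yololayer.h (sketch of the relevant lines)
// 608 is the default; 416 or 320 should also work, but keep the size a multiple of 32.
static constexpr int INPUT_H = 416;
static constexpr int INPUT_W = 416;
```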

I haven't tried this before, so just check it out, and if you run into issues write them down here.

Regarding FP16 and INT8: FP16 is the default in this codebase:

https://github.com/isarsoft/yolov4-triton-tensorrt/blob/d7387046effdb757ec8bbd1de05cc991181b995c/networks/yolov4.h#L10
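That define is only a compile-time switch: when it is present, the engine builder is asked to use FP16 kernels. A hedged sketch of the usual TensorRT 7-style pattern (the function name setPrecision is hypothetical and the exact builder call used in this repo may differ):

```cpp
// networks/yolov4.h (sketch): comment this line out to build a plain FP32 engine instead
#define USE_FP16

#include "NvInfer.h"

// Later, while configuring the engine build:
void setPrecision(nvinfer1::IBuilderConfig* config) {
#ifdef USE_FP16
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // ask TensorRT to pick FP16 kernels
#endif
}
```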

For INT8, TensorRT requires additional calibration code to work out the scaling of the weights and activations in the network. I tried it before but haven't had time to finish it; it's marked as a TODO in this repo.
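For context, here is a hedged sketch of what enabling INT8 generally involves on the TensorRT side. None of this is in the repo yet; enableInt8 and the calibrator are placeholders you would have to implement (typically as an nvinfer1::IInt8EntropyCalibrator2 that feeds representative input batches):

```cpp
#include "NvInfer.h"

// Sketch only: "calibrator" is your own implementation that supplies calibration batches.
void enableInt8(nvinfer1::IBuilder* builder,
                nvinfer1::IBuilderConfig* config,
                nvinfer1::IInt8Calibrator* calibrator) {
    if (!builder->platformHasFastInt8()) {
        return;  // GPU has no fast INT8 path, stay with FP16/FP32
    }
    config->setFlag(nvinfer1::BuilderFlag::kINT8);  // request INT8 kernels
    config->setInt8Calibrator(calibrator);          // scaling factors come from calibration
}
```

Worth noting for this issue: the Jetson Nano's Maxwell GPU most likely fails the platformHasFastInt8() check, so INT8 may not bring any benefit there anyway.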

philipp-schmidt commented 3 years ago

P.S.: The postprocessing in the client has to be changed as well:

https://github.com/isarsoft/yolov4-triton-tensorrt/blob/d7387046effdb757ec8bbd1de05cc991181b995c/clients/python/processing.py#L6-L7

philipp-schmidt commented 3 years ago

Is there any chance you could provide a few benchmarks on the Jetson Nano, similar to the benchmarking section in this repo?

It would be nice to have results on hardware that is so widely used and easy to compare against.

chull434 commented 3 years ago

Thanks, I can see those width and height variables now, and the models build OK with 320 and 416.

Though I don't think I understand what's happening with the USE_FP16 define (sorry, my C++ is poor). My models are all coming out as FP32, even by default with no code changes.

What should I be doing to toggle between FP32 and FP16?

Yes, I'd be more than happy to provide some benchmarks for the Jetson Nano.

chull434 commented 3 years ago

I tried running those models, but they throw runtime errors. Initial debugging suggests there is at least one infinity value in the results array, which is causing the runtime error in the postprocessing maths.

philipp-schmidt commented 3 years ago

my models are all coming out as FP32

How do you determine that they are FP32? The default is FP16 optimization. To turn FP16 off, comment out #define USE_FP16 like so: //#define USE_FP16. Note that even in FP16 mode, the input and output bindings will still show up as FP32, in case that's how you are determining it.
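A minimal sketch of one way to check what the engine reports, assuming the TensorRT 7-style C++ API (printBindingTypes is a hypothetical helper). With an FP16-optimized engine you would typically still see FP32 bindings here, because only the internal layers run in FP16:

```cpp
#include <iostream>
#include "NvInfer.h"

// Print the data type of every input/output binding of a deserialized engine.
void printBindingTypes(const nvinfer1::ICudaEngine& engine) {
    for (int i = 0; i < engine.getNbBindings(); ++i) {
        bool isHalf = engine.getBindingDataType(i) == nvinfer1::DataType::kHALF;
        std::cout << engine.getBindingName(i) << ": "
                  << (isHalf ? "FP16" : "FP32 (or other)") << std::endl;
    }
}
```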

I tried running those models, but they throw runtime errors. Initial debugging suggests there is at least one infinity value in the results array, which is causing the runtime error in the postprocessing maths.

Can you provide more details?

philipp-schmidt commented 3 years ago

Were you able to resolve the issue? @chull434

chull434 commented 3 years ago

No, sorry, I haven't got back round to having another look at this yet. The default model has been working fine for now.

I've been having fun trying to hack the Deep SORT object tracking over to Triton inference, as well as other bits and pieces of my app code. Once that's all good, I'll swing back round to this and take another look at optimizing the model settings for the Jetson.