lewes6369 / TensorRT-Yolov3

TensorRT for Yolov3

Very small speed improvement going from float32->float16->int8 #78

Open ttdd11 opened 4 years ago

ttdd11 commented 4 years ago

Typically when using TensorRT I see significant speed improvements, but on my card (a 2080 Ti) I haven't really seen any going from float32 to float16 to int8.

The bounding boxes look correct at every precision, so I can't figure out why there is no speed improvement.

Do you have any suggestions for troubleshooting this?
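
One variable worth ruling out is the measurement itself: if the timed loop includes image pre/post-processing or skips warm-up runs, host-side overhead can hide the GPU-side gain from lower precision. A minimal sketch that times only the enqueue call with CUDA events (hypothetical helper; `context`, `buffers`, and `stream` are assumed to come from the usual engine setup):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Times only the TensorRT enqueue with CUDA events, excluding
// pre/post-processing and host<->device copies from the measurement.
float averageInferMs(nvinfer1::IExecutionContext* context, void** buffers,
                     cudaStream_t stream, int batchSize, int iters = 100)
{
    // Warm-up runs so clock ramp-up and lazy allocations happen first.
    for (int i = 0; i < 10; ++i)
        context->enqueue(batchSize, buffers, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, stream);
    for (int i = 0; i < iters; ++i)
        context->enqueue(batchSize, buffers, stream, nullptr);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);  // GPU time for the whole loop
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / iters;                       // average per batch, in ms
}
```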

lewes6369 commented 4 years ago

Does the 2080 Ti support fast FP16 and fast INT8? Try the model with a few different input sizes to compare the performance improvement at each precision.
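
As a quick check, the TensorRT builder can report whether the platform exposes fast FP16/INT8 paths; a minimal sketch (the logger argument is whatever `ILogger` the sample already uses):

```cpp
#include <NvInfer.h>
#include <iostream>

// Ask the builder whether this GPU reports native fast FP16 / INT8
// before spending time comparing precision builds.
void checkPrecisionSupport(nvinfer1::ILogger& logger)
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    std::cout << "fast FP16: " << builder->platformHasFastFp16() << "\n"
              << "fast INT8: " << builder->platformHasFastInt8() << std::endl;
    builder->destroy();
}
```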

ttdd11 commented 4 years ago

@lewes6369 Thanks for the reply. The 2080 Ti has that capability. Do you think it could be a problem that I'm using TRT 6 and CUDA 10.1? How would I try different model input sizes when I only have the Caffe model from this repo?

lewes6369 commented 4 years ago

> Do you think it could be a problem that I'm using TRT 6 and CUDA 10.1?

The TRT/CUDA version may contribute somewhat to the result. If you use TRT 6, you can also check for support such as NVIDIA DLA for faster FP16 and INT8; you just need to enable the config flags: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_topic
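
For reference, with the TRT 6 builder-config API those flags look roughly like this (a sketch, not this repo's actual build code; INT8 additionally needs a calibrator):

```cpp
#include <NvInfer.h>

// Hypothetical build helper showing the TRT 6 precision flags. The DLA
// lines from the linked guide are left commented out, since they only
// apply to devices that actually have a DLA (Jetson-class hardware),
// not a 2080 Ti.
nvinfer1::ICudaEngine* buildEngineFp16(nvinfer1::IBuilder* builder,
                                       nvinfer1::INetworkDefinition* network)
{
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 30);            // 1 GiB scratch

    if (builder->platformHasFastFp16())
        config->setFlag(nvinfer1::BuilderFlag::kFP16);  // allow FP16 kernels
    // INT8 would additionally need:
    // config->setFlag(nvinfer1::BuilderFlag::kINT8);
    // config->setInt8Calibrator(calibrator);
    // For DLA, per the guide:
    // config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    // config->setDLACore(0);
    // config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);

    nvinfer1::ICudaEngine* engine =
        builder->buildEngineWithConfig(*network, *config);
    config->destroy();
    return engine;
}
```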

> How would I try different model input sizes when I only have the Caffe model from this repo?

You can modify the network structure for other input sizes, but you will need to fine-tune your model with Caffe or Darknet.
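
For instance, running at 608×608 instead of 416×416 means changing the input block of the deploy prototxt roughly like this (a sketch assuming the common `input`/`input_shape` form; YOLOv3 input sizes should stay multiples of 32):

```
input: "data"
input_shape {
  dim: 1    # batch size
  dim: 3    # channels
  dim: 608  # height (416 in the stock model)
  dim: 608  # width  (416 in the stock model)
}
```

Since YOLOv3 is fully convolutional the parser accepts a new resolution, but as noted above the weights should be fine-tuned at that resolution to keep accuracy.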