meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.
GNU General Public License v3.0

How can I confirm whether the weight dtype of a given layer in a TensorRT-exported engine is INT8 or HALF? If the model contains layers that do not support INT8, will TensorRT automatically set up mixed precision? #581

Open songkq opened 2 years ago

songkq commented 2 years ago

Before Asking

Search before asking

Question

How can I confirm whether the weight dtype of a given layer in a TensorRT-exported engine is INT8 or HALF? If the model contains layers that do not support INT8, will TensorRT automatically set up mixed precision?

Additional

No response

lippman1125 commented 2 years ago

@songkq These are actually two questions. First, to check the type of a given layer in an exported TensorRT engine, use the tool https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer, which can generate a graph of the engine. Second, yes. TensorRT PTQ selects operator types on a "speed first" basis: even if an operator supports INT8, FP16 will be used when it is faster, so a PTQ-generated model does not always consist entirely of quantized operators. For details, see https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#intro-quantization
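
Not part of the original reply, but as a minimal sketch of the same idea: besides trt-engine-explorer, TensorRT 8.2+ also exposes an IEngineInspector API that dumps per-layer information, including the precision the builder chose. The engine path below is a placeholder, and per-layer details only appear if the engine was built with `config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED`.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# "yolov6.engine" is a placeholder path; the engine should have been built
# with detailed profiling verbosity for full per-layer precision info.
with open("yolov6.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
# Print per-layer information (layer type, tactic, chosen precision such as
# Int8 or Half) as JSON; available in TensorRT 8.2 and later.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```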

songkq commented 2 years ago


@lippman1125 Got it, thanks! One more question: can TensorRT manually set the operator type of a particular layer to INT8/FP16?

lippman1125 commented 2 years ago

@songkq Yes, please refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#layer-level-control
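
A rough sketch of what that layer-level control looks like with the TensorRT Python builder API, not taken from this repo: the layer name "stem.conv" is hypothetical, and `OBEY_PRECISION_CONSTRAINTS` assumes TensorRT >= 8.4 (older releases use `STRICT_TYPES` instead).

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# ... populate `network`, e.g. by parsing an ONNX model with trt.OnnxParser ...

# Allow both INT8 and FP16 kernels so the builder can mix precisions.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder honor the per-layer constraints set below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # "stem.conv" is a hypothetical layer name: pin this layer to FP16
    # instead of letting the builder pick INT8 for it.
    if layer.name == "stem.conv":
        layer.precision = trt.float16
        layer.set_output_type(0, trt.float16)
```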

zahidzqj commented 1 year ago

When converting ONNX to an engine, is TensorRT 8 recommended? The conversion keeps failing with TensorRT 7.2.1.6:
[12/12/2022-08:43:04] [E] [TRT] QuantizeLinear_7_quantize_scale_node: shift weights has count 32 but 3 was expected
[12/12/2022-08:43:04] [E] [TRT] QuantizeLinear_7_quantize_scale_node: shift weights has count 32 but 3 was expected
[12/12/2022-08:43:04] [E] [TRT] QuantizeLinear_7_quantize_scale_node: shift weights has count 32 but 3 was expected
While parsing node number 8 [DequantizeLinear]:
ERROR: /data/TensorRT/parsers/onnx/builtin_op_importers.cpp:852 In function importDequantizeLinear:
[6] Assertion failed: K == scale.count()