mikel-brostrom / boxmot

BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models

How to get int8 ReID models #584

Closed berkay-karlik closed 1 year ago

berkay-karlik commented 1 year ago

Search before asking

Yolov5_StrongSORT_OSNet Component: Other

Bug

The following export works:

python3 reid_export.py --weights ./weights/osnet_x0_25_msmt17.pt --include onnx engine --device 0 --dynamic --batch-size 8 

However, I want to use the --half parameter to gain further performance. When I try, I get the following error:

orin@ubuntu:~/berkay_monitor/Yolov5_StrongSORT_OSNet$ python3 reid_export.py --weights ./weights/osnet_x0_25_msmt17.pt --include onnx engine --device 0 --half --batch-size 8 
/home/orin/.local/lib/python3.8/site-packages/torchvision-0.13.0-py3.8-linux-aarch64.egg/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 
  warn(f"Failed to load image Python extension: {e}")
/home/orin/berkay_monitor/Yolov5_StrongSORT_OSNet/strong_sort/deep/reid/torchreid/metrics/rank.py:11: UserWarning: Cython evaluation (very fast so highly recommended) is unavailable, now use python evaluation.
  warnings.warn(
YOLOv5 🚀 2022-11-1 Python-3.8.10 torch-1.12.0a0+2c916ef.nv22.3 CUDA:0 (Orin, 30536MiB)

weights/osnet_x0_25_msmt17.pt
Model: osnet_x0_25
- params: 203,568
- flops: 82,316,000
Successfully loaded pretrained weights from "weights/osnet_x0_25_msmt17.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']

PyTorch: starting from weights/osnet_x0_25_msmt17.pt with output shape (8, 512) (9.3 MB)

starting export with onnx 1.12.0...
export success, saved as weights/osnet_x0_25_msmt17.onnx (0.5 MB)
run --dynamic ONNX model inference with: 'python detect.py --weights weights/osnet_x0_25_msmt17.onnx'

starting export with onnx 1.12.0...
export success, saved as weights/osnet_x0_25_msmt17.onnx (0.5 MB)
run --dynamic ONNX model inference with: 'python detect.py --weights weights/osnet_x0_25_msmt17.onnx'

TensorRT: starting export with TensorRT 8.4.1.5...
[11/02/2022-18:50:45] [TRT] [I] [MemUsageChange] Init CUDA: CPU +213, GPU +0, now: CPU 2056, GPU 8458 (MiB)
[11/02/2022-18:50:51] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +351, GPU +411, now: CPU 2426, GPU 8879 (MiB)
reid_export.py:270: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = workspace * 1 << 30
[11/02/2022-18:50:51] [TRT] [I] ----------------------------------------------------------------
[11/02/2022-18:50:51] [TRT] [I] Input filename:   weights/osnet_x0_25_msmt17.onnx
[11/02/2022-18:50:51] [TRT] [I] ONNX IR version:  0.0.7
[11/02/2022-18:50:51] [TRT] [I] Opset version:    13
[11/02/2022-18:50:51] [TRT] [I] Producer name:    pytorch
[11/02/2022-18:50:51] [TRT] [I] Producer version: 1.12.0
[11/02/2022-18:50:51] [TRT] [I] Domain:           
[11/02/2022-18:50:51] [TRT] [I] Model version:    0
[11/02/2022-18:50:51] [TRT] [I] Doc string:       
[11/02/2022-18:50:51] [TRT] [I] ----------------------------------------------------------------
[11/02/2022-18:50:51] [TRT] [W] onnx2trt_utils.cpp:367: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
TensorRT: Network Description:
TensorRT:       input "images" with shape (8, 3, 256, 128) and dtype DataType.HALF
TensorRT:       output "output" with shape (8, 512) and dtype DataType.HALF
TensorRT: building FP16 engine in weights/osnet_x0_25_msmt17.engine
reid_export.py:298: DeprecationWarning: Use build_serialized_network instead.
  with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
[11/02/2022-18:50:51] [TRT] [E] 4: [network.cpp::operator()::3018] Error Code 4: Internal Error (images: kMIN dimensions in profile 0 are [1,3,256,128] but input has static dimensions [8,3,256,128].)

TensorRT: export failure: __enter__

If I use --dynamic, it complains that the CPU does not support half precision. However, I am trying to generate the engine for the GPU of the device, so the CPU should not be involved at all.
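For reference, the kMIN mismatch above is the kind of error TensorRT raises when the optimization profile still describes dynamic shapes while the network input is static. Below is a minimal sketch of a profile whose min/opt/max all match the static batch; it is illustrative only and not the exact reid_export.py code, with the input name "images" and the shapes taken from the log above:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()

# With a static (8, 3, 256, 128) input, min/opt/max must all cover exactly
# that shape, otherwise the kMIN-dimensions error above appears at build time.
profile = builder.create_optimization_profile()
profile.set_shape("images",
                  (8, 3, 256, 128),   # min
                  (8, 3, 256, 128),   # opt
                  (8, 3, 256, 128))   # max
config.add_optimization_profile(profile)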

Environment

YOLOv5 🚀 2022-11-1 Python-3.8.10 torch-1.12.0a0+2c916ef.nv22.3 CUDA:0 (Orin, 30536MiB)
Weights: osnet_x0_25_msmt17.pt
OS: Linux ubuntu 5.10.104-tegra
Python: 3.8.10

Minimal Reproducible Example

 python3 reid_export.py --weights ./weights/osnet_x0_25_msmt17.pt --include onnx engine --device 0 --half --batch-size 8
mikel-brostrom commented 1 year ago

--half is not compatible with --dynamic, i.e. use either --half or --dynamic, but not both. I am adding an assert there. Thanks for pointing this out :smile:
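Something along these lines (a sketch only; the actual argument handling in reid_export.py may differ):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--half', action='store_true')
parser.add_argument('--dynamic', action='store_true')
args = parser.parse_args()

# fail fast when both flags are requested, since FP16 + dynamic shapes is not supported here
assert not (args.half and args.dynamic), '--half is not compatible with --dynamic, use one or the other'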

berkay-karlik commented 1 year ago

Thank you for pointing that out as well. However, the stack trace I shared is from --half only. I think that is different from the --dynamic and --half compatibility issue, right?

mikel-brostrom commented 1 year ago

I cannot reproduce this error, @berkay-karlik, when copy-pasting your command. Notice that my TRT builder has no FP16 support, so the export falls back to FP32.

(export) ➜  Yolov5_StrongSORT_OSNet git:(master) ✗  python3 reid_export.py --weights ./weights/osnet_x0_25_msmt17.pt --include onnx engine --device 0 --half --batch-size 8
/home/mikel.brostrom/Yolov5_StrongSORT_OSNet
YOLOv5 🚀 2022-10-21 Python-3.8.13 torch-1.9.0+cu102 CUDA:0 (Quadro P2000, 4032MiB)

Successfully loaded pretrained weights from "weights/osnet_x0_25_msmt17.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
/home/mikel.brostrom/venvs/import_test2/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

PyTorch: starting from weights/osnet_x0_25_msmt17.pt with output shape (8, 512) (9.3 MB)

ONNX: starting export with onnx 1.12.0...
ONNX: export success, saved as weights/osnet_x0_25_msmt17.onnx (0.4 MB)

TensorRT: starting export with TensorRT 8.4.3.1...
[11/03/2022-07:59:57] [TRT] [I] [MemUsageChange] Init CUDA: CPU +191, GPU +0, now: CPU 1913, GPU 1786 (MiB)
[11/03/2022-07:59:58] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +6, GPU +0, now: CPU 1936, GPU 1786 (MiB)
reid_export.py:201: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = workspace * 1 << 30
[11/03/2022-07:59:58] [TRT] [I] ----------------------------------------------------------------
[11/03/2022-07:59:58] [TRT] [I] Input filename:   weights/osnet_x0_25_msmt17.onnx
[11/03/2022-07:59:58] [TRT] [I] ONNX IR version:  0.0.6
[11/03/2022-07:59:58] [TRT] [I] Opset version:    12
[11/03/2022-07:59:58] [TRT] [I] Producer name:    pytorch
[11/03/2022-07:59:58] [TRT] [I] Producer version: 1.9
[11/03/2022-07:59:58] [TRT] [I] Domain:           
[11/03/2022-07:59:58] [TRT] [I] Model version:    0
[11/03/2022-07:59:58] [TRT] [I] Doc string:       
[11/03/2022-07:59:58] [TRT] [I] ----------------------------------------------------------------
[11/03/2022-07:59:58] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
TensorRT: Network Description:
TensorRT:       input "images" with shape (8, 3, 256, 128) and dtype DataType.HALF
TensorRT:       output "output" with shape (8, 512) and dtype DataType.HALF
TensorRT: building FP32 engine in weights/osnet_x0_25_msmt17.engine
reid_export.py:229: DeprecationWarning: Use build_serialized_network instead.
  with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
[11/03/2022-07:59:59] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +271, GPU +112, now: CPU 2210, GPU 1907 (MiB)
[11/03/2022-07:59:59] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +114, GPU +42, now: CPU 2324, GPU 1949 (MiB)
[11/03/2022-07:59:59] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
[11/03/2022-07:59:59] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/03/2022-08:00:17] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[11/03/2022-08:00:17] [TRT] [I] Total Host Persistent Memory: 226544
[11/03/2022-08:00:17] [TRT] [I] Total Device Persistent Memory: 376320
[11/03/2022-08:00:17] [TRT] [I] Total Scratch Memory: 17920
[11/03/2022-08:00:17] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 0 MiB
[11/03/2022-08:00:17] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 78.0308ms to assign 11 blocks to 211 nodes requiring 13635072 bytes.
[11/03/2022-08:00:17] [TRT] [I] Total Activation Memory: 13635072
[11/03/2022-08:00:17] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2635, GPU 2075 (MiB)
[11/03/2022-08:00:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 2636, GPU 2085 (MiB)
[11/03/2022-08:00:17] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
[11/03/2022-08:00:17] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[11/03/2022-08:00:17] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/03/2022-08:00:17] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
TensorRT: export success, saved as weights/osnet_x0_25_msmt17.engine (1.9 MB)

ONNX: starting export with onnx 1.12.0...
ONNX: export success, saved as weights/osnet_x0_25_msmt17.onnx (0.4 MB)

Export complete (37.3s)
Results saved to /home/mikel.brostrom/Yolov5_StrongSORT_OSNet/weights
Visualize:       https://netron.app
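For context, whether the engine ends up FP16 or FP32 is normally decided by a builder capability check. A sketch of the usual TensorRT pattern (not the exact reid_export.py code):

import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.INFO))
config = builder.create_builder_config()

# On GPUs without fast FP16 (e.g. an old Quadro) the flag is simply not set,
# so the engine is built in FP32 even though --half was requested.
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)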
mikel-brostrom commented 1 year ago

This seems to be resolved @berkay-karlik?

berkay-karlik commented 1 year ago

The issue still exists on the platform I'm working on. I'm trying this on an Nvidia Jetson Orin with Linux ubuntu 5.10.104-tegra. As you pointed out, my platform seems to use FP16 by default. If we could go as low as INT8, that would be great for performance. It's okay to close the issue if nothing can be done. If you have any pointers or guidance on how I can fix this myself, I can try to fix it and submit a PR. Thanks for the help so far.

mikel-brostrom commented 1 year ago

So the system is half-ing the models automatically? FP16 is not an issue? I don't think I am following.

berkay-karlik commented 1 year ago

https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix The Jetson Orin supports INT8 precision too, so on this platform I expected --half to produce INT8, since the reid_export prints suggest that FP16 is already the default.
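A quick way to check what the local builder reports on the Orin (a small sketch using the TensorRT Python API):

import tensorrt as trt

# query which reduced precisions this GPU/TensorRT build considers fast
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print('fast FP16 supported:', builder.platform_has_fast_fp16)
print('fast INT8 supported:', builder.platform_has_fast_int8)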

mikel-brostrom commented 1 year ago

Aha, now I get it.

I am on a laptop with an old Quadro GPU which only supports FP32, so when half-ing the models I get FP32 again, as it has no FP16 support. The fact that your GPU supports FP16 makes it possible for you to export to FP16, since your TensorRT builder can handle it. This does not mean that you get INT8 when using --half for export. To achieve that, you need other methods: QAT or PTQ. Check them out here: https://developer.nvidia.com/blog/achieving-fp32-accuracy-for-int8-inference-using-quantization-aware-training-with-tensorrt/. Feel free to leave a PR :smile:
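For anyone who wants to try PTQ with the TensorRT Python API directly, here is a minimal sketch of the INT8 calibration hook. Names, batch size and data loading are assumed for illustration; this is not part of reid_export.py:

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class ReIDCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds batches of pre-processed person crops to TensorRT for INT8 calibration."""

    def __init__(self, batches, cache_file='calibration.cache'):
        super().__init__()
        self.batches = iter(batches)   # iterable of (8, 3, 256, 128) float32 arrays (assumed)
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return 8

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                # no more calibration data
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, 'rb') as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, 'wb') as f:
            f.write(cache)


# hooked into the builder config before building the engine:
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = ReIDCalibrator(calibration_batches)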