StarryAzure opened this issue 5 months ago
There are some files I created just as drafts for experimenting. This compare.py is not complete. I suggest you use the code from Ultralytics for convenience.
Thanks for your answer. I also have a question about ptq.py. It goes wrong when execution reaches `metrics = model.val(data=args.data, batch=args.batch, device='cuda:0', split=task)`: there is no error traceback, the process just stops there and exits with code -1073741819 (0xC0000005). I have searched for this on GitHub but found no useful answer.
I saw that in that file it is `metrics = model.val(data=args.data, batch=args.batch, device='cpu', split='test')`. When the model is quantized to INT8 in ONNX format, it cannot run on the GPU. When I tested inference time on the CPU, the quantized ONNX model was somehow slower than the original model. I suggest you convert the model to TensorRT format with the latest Ultralytics; it supports running quantized models on the GPU, so it is much faster.
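To make the device constraint concrete, here is a minimal sketch that validates the FP32 base model on GPU and an INT8 ONNX export on CPU using the Ultralytics API; the file names and dataset config are placeholders, not paths from this repo.

```python
# Minimal sketch, assuming an INT8 ONNX export "yolov8l_int8.onnx" and a
# dataset config "coco.yaml" (both placeholders). The INT8 ONNX model must be
# validated on CPU; requesting 'cuda:0' is what triggers the 0xC0000005 crash.
from ultralytics import YOLO

base = YOLO("yolov8l.pt")              # original FP32 model
quant = YOLO("yolov8l_int8.onnx")      # hypothetical PTQ export

base_metrics = base.val(data="coco.yaml", batch=1, device="cuda:0", split="test")
quant_metrics = quant.val(data="coco.yaml", batch=1, device="cpu", split="test")

print("FP32 mAP50-95:", base_metrics.box.map)
print("INT8 mAP50-95:", quant_metrics.box.map)
```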
Right, it can run on the CPU but not on the GPU. Do you have any thoughts on this? Might it be caused by the way the INT8 quantization is done?
@StarryAzure The major purpose of quantization is to run models on edge devices, which mostly have only an integrated CPU. So the PyTorch quantization I used supports CPU only; see https://pytorch.org/docs/stable/quantization.html#backend-hardware-support. As for how INT8 quantization is done, it can vary between frameworks (how they calibrate, symmetric vs. asymmetric quantization, layer fusion, and so on).
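For reference, here is a minimal, self-contained sketch of the eager-mode post-training static quantization flow described above, shown on a toy model rather than YOLOv8; `TinyNet`, the calibration loop, and the input sizes are illustrative stand-ins, not code from this repo.

```python
# Eager-mode PTQ sketch: the 'fbgemm' backend is x86-CPU-only, which is why
# the resulting INT8 model cannot be run on CUDA.
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> int8 at the input
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # int8 -> float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyNet().eval()
torch.backends.quantized.engine = "fbgemm"        # CPU backend
model.qconfig = tq.get_default_qconfig("fbgemm")

prepared = tq.prepare(model)                      # insert observers
with torch.no_grad():
    for _ in range(8):                            # calibration passes
        prepared(torch.randn(1, 3, 32, 32))
model_int8 = tq.convert(prepared)                 # swap in int8 kernels

print(model_int8(torch.randn(1, 3, 32, 32)).shape)
# Running model_int8 on CUDA fails: the quantized kernels are CPU-only.
```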
If you want to use PyTorch to quantize a model so it runs on the GPU, you can try torchao: https://github.com/pytorch/ao (I haven't tried this).
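An untested sketch of that torchao route, assuming the `quantize_`/`int8_weight_only` API from the repo linked above (names may differ between torchao versions):

```python
# Untested sketch: torchao weight-only INT8 quantization, which, unlike the
# eager-mode PTQ above, targets GPU inference. Toy model, not YOLOv8.
import torch
import torch.nn as nn
from torchao.quantization.quant_api import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).cuda().eval()
quantize_(model, int8_weight_only())            # in place: Linear weights -> int8
out = model(torch.randn(8, 64, device="cuda"))  # runs on the GPU
print(out.shape)
```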
However, as I said above, you can use TensorRT to quantize your model to INT8 and run it on the GPU. I think this is the most optimized option for your model.
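A hedged sketch of that TensorRT route with the current Ultralytics export API (the model path and dataset config are placeholders; this requires an NVIDIA GPU with TensorRT installed):

```python
# Export YOLOv8 to a TensorRT engine with INT8 calibration, then validate it
# on the GPU. "coco.yaml" stands in for your calibration/validation dataset.
from ultralytics import YOLO

model = YOLO("yolov8l.pt")
# int8=True enables INT8 calibration during export; 'data' supplies the
# calibration images.
model.export(format="engine", int8=True, data="coco.yaml")

trt_model = YOLO("yolov8l.engine")   # load the exported engine
metrics = trt_model.val(data="coco.yaml", batch=1, device=0, split="test")
print("TensorRT INT8 mAP50-95:", metrics.box.map)
```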
Search before asking
Question
I have already saved yolov8l.pth, which was produced by yolov8_QT/ptq.py. I want to use compare.py to compare the PTQ model against the base yolov8l model, but it goes wrong (the attached error screenshot did not render).
Additional
No response