minhhotboy9x / ultralytics_YOLOv8_custom


How to use yolov8_QT/compare.py? #8

Open StarryAzure opened 5 months ago

StarryAzure commented 5 months ago

Search before asking

Question

I have already saved yolov8l.pth, which was produced by yolov8_QT/ptq.py. I want to use compare.py to compare the PTQ model against the base yolov8l model, but it goes wrong:

[error screenshots attached]

Additional

No response

minhhotboy9x commented 5 months ago

Some of the files in this repo were created just as drafts for playing around, and compare.py is not complete. I suggest you use the code from Ultralytics for convenience.
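For reference, a minimal sketch of validating the baseline model with the Ultralytics API (the checkpoint and dataset yaml here are placeholders; a model saved by ptq.py would need that script's own loading logic rather than a plain YOLO(...) call):

```python
from ultralytics import YOLO

# Placeholder paths -- point these at your own checkpoint and dataset yaml
base = YOLO("yolov8l.pt")
metrics = base.val(data="coco128.yaml", batch=8, device="cpu", split="val")

# mAP of the FP32 baseline, to compare against the PTQ model's numbers
print("baseline mAP50-95:", metrics.box.map)
print("baseline mAP50:   ", metrics.box.map50)
```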

StarryAzure commented 5 months ago

Some of the files in this repo were created just as drafts for playing around, and compare.py is not complete. I suggest you use the code from Ultralytics for convenience.

Thanks for your answer. I also have a question about ptq.py: it goes wrong when the code reaches metrics = model.val(data=args.data, batch=args.batch, device='cuda:0', split=task). There is no traceback; the code just stops there and exits with code -1073741819 (0xC0000005). I have searched for this issue on GitHub but haven't found a useful answer.

minhhotboy9x commented 5 months ago

I saw that in that file it is metrics = model.val(data=args.data, batch=args.batch, device='cpu', split='test'). When the model is quantized to INT8 in ONNX format, it cannot run on the GPU. When I tested the inference time on CPU, the quantized ONNX model was somehow slower than the original model. I suggest you convert the model to TensorRT format with the latest Ultralytics; it supports running quantized models on the GPU, so it is much faster.
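For example, a minimal sketch of the TensorRT INT8 route with a recent Ultralytics release (argument names may differ between versions; the dataset yaml is only a placeholder for the calibration data):

```python
from ultralytics import YOLO

# Export an INT8 TensorRT engine; 'data' supplies images used for calibration
model = YOLO("yolov8l.pt")
model.export(format="engine", int8=True, data="coco128.yaml", device=0)

# Load the resulting .engine file and validate it on the GPU
trt_model = YOLO("yolov8l.engine")
metrics = trt_model.val(data="coco128.yaml", device=0)
print("INT8 TensorRT mAP50-95:", metrics.box.map)
```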

StarryAzure commented 5 months ago

I saw that in that file it is metrics = model.val(data=args.data, batch=args.batch, device='cpu', split='test'). When the model is quantized to INT8 in ONNX format, it cannot run on the GPU. When I tested the inference time on CPU, the quantized ONNX model was somehow slower than the original model. I suggest you convert the model to TensorRT format with the latest Ultralytics; it supports running quantized models on the GPU, so it is much faster.

Right, it can run on CPU but not on the GPU. Do you have any thoughts on this? Could it be caused by the way the INT8 quantization is done?

minhhotboy9x commented 5 months ago

@StarryAzure The main purpose of quantization is to run models on edge devices, which are mostly CPU-only. So the PyTorch quantization I used supports CPU only; you can see https://pytorch.org/docs/stable/quantization.html#backend-hardware-support. As for how INT8 quantization is done, I think it varies between frameworks (how they calibrate, symmetric/asymmetric quantization, layer fusion, ...). If you want to use PyTorch to quantize a model and run it on the GPU, you can try Torchao https://github.com/pytorch/ao (I haven't tried this). However, as I said above, you can use TensorRT to quantize your model to INT8 and run it on the GPU. I think that is the most optimized option for your model.
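To illustrate why the quantized model is CPU-bound, here is a minimal sketch of PyTorch eager-mode post-training static quantization on a toy module (this is not the repo's ptq.py; the 'fbgemm'/'qnnpack' backends it relies on are CPU-only, so the converted model only accepts CPU tensors):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 at the model input
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # int8 -> fp32 at the model output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
# 'fbgemm' (x86) and 'qnnpack' (ARM) are the available backends -- both CPU-only
model.qconfig = get_default_qconfig("fbgemm")
prepared = prepare(model)

# Calibration pass on representative data (random tensors here as a stand-in)
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 64, 64))

quantized = convert(prepared)
# Runs on CPU tensors; moving the model/inputs to CUDA is not supported here
print(quantized(torch.randn(1, 3, 64, 64)).shape)
```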