openvinotoolkit / openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™
Apache License 2.0

yolov8-pose quantized openvino model #1141

Closed saim212 closed 1 year ago

saim212 commented 1 year ago

I converted the yolov8-pose model with 4 keypoints into OpenVINO INT8 format using this. The model gives me an output of shape (1, 17, 33600), which I can't interpret. How can I get my 4 detected keypoints along with the bounding box? Model input: 1280x1280x3

Iffa-Intel commented 1 year ago

@saim212 to clarify this, could you share:

  1. Your original model (files or source link), did you use this model?
  2. Your IR files
  3. The commands you used up to the point where you hit the issue
  4. Details on your use case (what you are trying to achieve)
saim212 commented 1 year ago

Hi, I was working on optimizing the yolov8s-pose model using the NNCF post-training quantization API to speed up inference. I successfully converted the model.

here is the inference code:

import numpy as np
from PIL import Image

# preprocess_image, image_to_tensor and det_compiled_model are defined earlier in the notebook
image = np.array(Image.open("datasets/version_2/images/val/frame_8.jpg").resize((1280, 1280)))
preprocessed_image = preprocess_image(image)
input_tensor = image_to_tensor(preprocessed_image)
result = det_compiled_model(input_tensor)

The output shape after inference was result[0].shape == (1, 17, 33600).
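For reference, both numbers fall out of the YOLOv8-pose head layout (this is a reading of the export, not stated in the thread): 17 channels = 4 box values + 1 class score + 4 keypoints × (x, y, confidence), and 33600 is the total number of grid cells across the three detection strides (8, 16, 32) at a 1280x1280 input:

```python
# Sketch: where (1, 17, 33600) comes from for a 1-class, 4-keypoint
# YOLOv8-pose model exported at 1280x1280 (assumed head layout).
num_classes = 1
num_keypoints = 4
channels = 4 + num_classes + num_keypoints * 3  # box(4) + class score + (x, y, conf) per keypoint
print(channels)  # 17

input_size = 1280
strides = (8, 16, 32)
num_preds = sum((input_size // s) ** 2 for s in strides)  # 160^2 + 80^2 + 40^2
print(num_preds)  # 33600
```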

I didn't know how to get my keypoints, but this Colab post-process function resolved the issue.

boxes = result[det_compiled_model.output(0)]
input_hw = input_tensor.shape[2:]
detections = postprocess(pred_boxes=boxes, input_hw=input_hw, orig_img=image, pred_masks=None)
np_array = detections[0]['det'].detach().cpu().numpy()
print(np_array)

here is the output:

[561 581 681 861 0.93645 (bbox_conf) 0 (class)
 653.99 612.02 0.99978 (p1_conf)
 613.75 726.99 0.99996 (p2_conf)
 562.01 847.71 0.99979 (p3_conf)
 573.51 841.96 0.99996 (p4_conf)]

Generally, the issue is resolved.
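For anyone hitting the same raw output, here is a minimal sketch of decoding the (1, 17, 33600) tensor directly with NumPy, assuming the standard YOLOv8-pose head layout (4 box values as cx, cy, w, h in input-image pixels, 1 class score, then x, y, conf per keypoint); NMS is intentionally omitted, so use the notebook's postprocess for real images:

```python
import numpy as np

def decode_pose(preds, conf_thres=0.5, num_keypoints=4):
    """Decode a raw (1, 17, N) YOLOv8-pose output (1 class, 4 keypoints).

    Assumes boxes are already decoded to input-image pixels (cx, cy, w, h),
    as in the Ultralytics ONNX export. Returns (boxes_xyxy, scores, keypoints)
    for predictions above the confidence threshold. NMS is omitted for brevity.
    """
    p = preds[0].T                       # (N, 17): one row per prediction
    scores = p[:, 4]                     # class score
    keep = scores > conf_thres
    p, scores = p[keep], scores[keep]

    cx, cy, w, h = p[:, 0], p[:, 1], p[:, 2], p[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    kpts = p[:, 5:5 + num_keypoints * 3].reshape(-1, num_keypoints, 3)  # (x, y, conf)
    return boxes, scores, kpts

# Dummy check: one confident prediction matching the box reported above
dummy = np.zeros((1, 17, 33600), dtype=np.float32)
dummy[0, :, 0] = [621, 721, 120, 280, 0.94,
                  654, 612, 1, 614, 727, 1, 562, 848, 1, 574, 842, 1]
boxes, scores, kpts = decode_pose(dummy)
print(boxes[0])   # [561. 581. 681. 861.]
print(kpts.shape) # (1, 4, 3)
```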

On a 12th Gen Intel® Core™ i9-12900K × 24 it gave 27 FPS. I want more than 100 FPS, but the current OpenVINO version does not support the NCS2, and previous versions have some issues with the yolov8s-pose model. Any suggestions?

Iffa-Intel commented 1 year ago

If your main objective is to improve performance (get better FPS), there are a number of things you could try:

  1. Post-training Optimization tool (POT)
  2. Neural Network Compression Framework (NNCF) (which you mentioned you are already using)
  3. Model Optimizer

These are optimizations you can apply during development. There are also ways to improve performance at deployment time; you may refer here for a detailed explanation.

avitial commented 1 year ago

@saim212 not sure a single NCS2 will give you the 100 FPS that you are aiming for. Although this model (yolov8s-pose) is not officially supported by the Myriad plugin, it seems to run on the NCS2 stick, with some tweaks when exporting to ONNX (using onnx==1.8.1) and in the Model Optimizer command.

I tested this on the 2022.1 version with benchmark_app, but it should apply to the 2022.3.1 LTS release as well (since both releases support MYRIAD/HDDL devices).

$ pip install onnx==1.8.1 ultralytics
$ yolo export model=yolov8s-pose.pt format=onnx
$ mo --input_model yolov8s-pose.onnx --static_shape --input_shape [1,3,640,640] --data_type FP16

1x NCS2 with MYRIAD plugin:

$ benchmark_app -m yolov8s-pose.xml --time 10 -d MYRIAD -hint throughput
Throughput: 6.13 FPS

2x NCS2 with MULTI device plugin:

$ benchmark_app -m yolov8s-pose.xml --time 10 -d MULTI:MYRIAD.3.1-ma2480,MYRIAD.3.2-ma2480 -hint throughput
Throughput: 12.22 FPS

Note that quantized models are not supported by the Myriad plugin, so you won't be able to benefit from that optimization while running on the NCS2.

In addition, to get more FPS you can try the MULTI device plugin and run the model across CPU, GPU, and MYRIAD devices if that fits your needs. You can also try your already-quantized model with MULTI device on CPU + GPU and see if you get better FPS. For example, on an i7-9850H with UHD Graphics 630 and a single NCS2 stick, the following FPS can be observed. Hope this helps.

$ benchmark_app -m yolov8s-pose.xml --time 10 -d MULTI:CPU,GPU,MYRIAD -hint throughput
Throughput: 36.19 FPS
saim212 commented 1 year ago

Thanks @avitial it's helpful.

Iffa-Intel commented 1 year ago

Closing issue, feel free to re-open or start a new issue if additional assistance is needed.