saim212 closed this issue 1 year ago
@saim212 to clarify this, could you share:
Hi, I was working on optimizing the yolov8s-pose model using the NNCF post-training quantization API to speed up inference. I successfully converted the model.
Here is the inference code:
import numpy as np
from PIL import Image

# preprocess_image, image_to_tensor and det_compiled_model (the compiled
# quantized model) come from the OpenVINO YOLOv8 notebook helpers
image = np.array(Image.open("datasets/version_2/images/val/frame_8.jpg").resize((1280, 1280)))
preprocessed_image = preprocess_image(image)
input_tensor = image_to_tensor(preprocessed_image)
result = det_compiled_model(input_tensor)
The output shape after inference was result[0].shape == (1, 17, 33600).
I did not know how to extract my keypoints from this, but the post-processing function from this Colab resolved the issue.
boxes = result[det_compiled_model.output(0)]
input_hw = input_tensor.shape[2:]
detections = postprocess(pred_boxes=boxes, input_hw=input_hw, orig_img=image, pred_masks=None)
np_array = detections[0]['det'].detach().cpu().numpy()
print(np_array)
here is the output:
[561 581 681 861 0.93645(bbox_conf) 0 (class) 653.99 612.02 0.99978(p1_conf) 613.75 726.99 0.99996(p2_conf) 562.01 847.71 0.99979(p3_conf) 573.51 841.96 0.99996(p4_conf)]
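Going by the annotations in the output above, each flat detection row appears to be laid out as 4 bbox coordinates, a box confidence, a class id, and then an (x, y, conf) triplet per keypoint. A small sketch of splitting one such row with NumPy (the `split_detection` helper name and the assumption of 4 keypoints are mine, not from the notebook):

```python
import numpy as np

def split_detection(row, n_kpts=4):
    """Split one flat detection row into bbox, confidence, class and keypoints.

    Assumes the layout shown above: x1, y1, x2, y2, box confidence,
    class id, then (x, y, conf) per keypoint.
    """
    row = np.asarray(row, dtype=float)
    bbox = row[:4]                                    # x1, y1, x2, y2
    box_conf = row[4]
    cls = int(row[5])
    kpts = row[6:6 + n_kpts * 3].reshape(n_kpts, 3)   # one (x, y, conf) per keypoint
    return bbox, box_conf, cls, kpts

# The example row from the output above
row = [561, 581, 681, 861, 0.93645, 0,
       653.99, 612.02, 0.99978,
       613.75, 726.99, 0.99996,
       562.01, 847.71, 0.99979,
       573.51, 841.96, 0.99996]
bbox, box_conf, cls, kpts = split_detection(row)
print(bbox)      # bounding box corners
print(kpts)      # (4, 3) array: x, y, confidence per keypoint
```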
So the issue is essentially resolved.
On a 12th Gen Intel® Core™ i9-12900K × 24 it gives 27 FPS. I want more than 100 FPS, but the current OpenVINO version does not support the NCS2, and the previous versions have some issues with the yolov8s-pose model. Any suggestions?
If your main objective is to improve performance (get a better fps), there are a number of things that you could try:
These are things that can be done during development. There are also ways to improve performance at deployment time; you may refer here for a detailed explanation.
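One deployment-side technique usually suggested for OpenVINO is to overlap preprocessing with device inference using asynchronous requests (OpenVINO exposes this as `AsyncInferQueue`). A stdlib-only sketch of the pipelining idea, with stand-in `preprocess`/`infer` functions instead of real OpenVINO calls (the function names and timings here are illustrative assumptions, not the actual API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame_id):
    """Stand-in for resize/normalize work on one frame."""
    time.sleep(0.01)
    return f"tensor_{frame_id}"

def infer(tensor):
    """Stand-in for a blocking device inference call."""
    time.sleep(0.02)
    return f"result_{tensor}"

frames = list(range(8))

# Sequential baseline: the CPU idles while the device infers, and vice versa.
t0 = time.perf_counter()
seq = [infer(preprocess(f)) for f in frames]
sequential = time.perf_counter() - t0

# Pipelined: preprocessing of later frames overlaps with inference of
# earlier ones, so total time approaches the cost of the slower stage.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    piped = [infer(t) for t in pool.map(preprocess, frames)]
pipelined = time.perf_counter() - t0

print(f"sequential {sequential:.3f}s vs pipelined {pipelined:.3f}s")
```

The same results come out of both loops; only the wall-clock time differs, which is why async request queues help throughput without changing accuracy.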
@saim212 not sure a single NCS2 will give you the 100 FPS that you are aiming to get. Although this model (yolov8s-pose) is not officially supported by Myriad plugin it seems to run on the NCS2 stick, with some tweaks in exporting to ONNX (using onnx==1.8.1) and tweaks in the Model Optimizer command.
I tested this on the 2022.1 version with benchmark_app, but it should apply to the 2022.3.1 LTS release as well (since these releases support MYRIAD/HDDL devices).
$ pip install onnx==1.8.1 ultralytics
$ yolo export model=yolov8s-pose.pt format=onnx
$ mo --input_model yolov8s-pose.onnx --static_shape --input_shape [1,3,640,640] --data_type FP16
1x NCS2 with MYRIAD plugin:
$ benchmark_app -m yolov8s-pose.xml --time 10 -d MYRIAD -hint throughput
Throughput: 6.13 FPS
2x NCS2 with MULTI device plugin:
$ benchmark_app -m yolov8s-pose.xml --time 10 -d MULTI:MYRIAD.3.1-ma2480,MYRIAD.3.2-ma2480 -hint throughput
Throughput: 12.22 FPS
Note quantized models are not supported by the Myriad plugin, so you won't be able to benefit from this optimization while running on the NCS2.
In addition, to get more FPS you can try the MULTI device plugin and run the model across CPU, GPU and MYRIAD devices if that fits your needs. You can also try your already-quantized model with MULTI device on CPU + GPU and see if you get better FPS. For example, on an i7-9850H with UHD Graphics 630 and a single NCS2 stick, the following FPS can be observed. Hope this helps.
$ benchmark_app -m yolov8s-pose.xml --time 10 -d MULTI:CPU,GPU,MYRIAD -hint throughput
Throughput: 36.19 FPS
Thanks @avitial, that's helpful.
Closing issue, feel free to re-open or start a new issue if additional assistance is needed.
I converted the yolov8-pose model with 4 keypoints into OpenVINO INT8 format using this. The model gives me an output of shape (1, 17, 33600). I can't understand this output. How can I get my 4 detected keypoints with the bbox? Model input: 1280x1280x3.
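Not a confirmed answer from the thread, but assuming the standard Ultralytics YOLOv8-pose head with one class and 4 keypoints, the shape adds up as follows: 17 channels = 4 box values (cx, cy, w, h) + 1 class score + 4 keypoints × 3 (x, y, conf), and 33600 anchors = 160² + 80² + 40² for strides 8/16/32 on a 1280×1280 input. Under those assumptions, a minimal NumPy decode sketch (confidence filtering only; NMS, as done by the notebook's postprocess, is still needed afterwards):

```python
import numpy as np

def decode_pose(pred, n_kpts=4, conf_thres=0.5):
    """Decode a raw (1, 4 + 1 + n_kpts*3, n_anchors) YOLOv8-pose tensor.

    Assumes one class: channels are cx, cy, w, h, class score,
    then (x, y, conf) per keypoint, all in input-image pixels.
    """
    pred = pred[0].T                       # -> (n_anchors, 17)
    scores = pred[:, 4]
    keep = scores > conf_thres
    pred = pred[keep]
    cx, cy, w, h = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2,
                      cx + w / 2, cy + h / 2], axis=1)   # x1, y1, x2, y2
    kpts = pred[:, 5:5 + n_kpts * 3].reshape(-1, n_kpts, 3)
    return boxes, scores[keep], kpts

# 33600 anchors = 160*160 + 80*80 + 40*40 (strides 8/16/32 on a 1280x1280 input)
n_anchors = 160 * 160 + 80 * 80 + 40 * 40
assert n_anchors == 33600

# Synthetic output with a single confident detection in anchor 0
raw = np.zeros((1, 17, n_anchors), dtype=np.float32)
raw[0, :, 0] = [620, 720, 120, 280, 0.94,         # cx, cy, w, h, class score
                654, 612, 0.99, 614, 727, 0.99,   # 4 keypoints as (x, y, conf)
                562, 848, 0.99, 574, 842, 0.99]
boxes, scores, kpts = decode_pose(raw)
print(boxes[0])   # [x1 y1 x2 y2] in 1280x1280 pixel coordinates
print(kpts[0])    # (4, 3) array: x, y, confidence per keypoint
```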