ultralytics / ultralytics

NEW - YOLOv8 šŸš€ in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com

TensorFlow Lite model runs at almost half the speed of the PyTorch one! šŸ¤·ā€ā™‚ļø #1293

Closed · MohamedAtef321 closed 1 year ago

MohamedAtef321 commented 1 year ago

Search before asking

YOLOv8 Component

Detection

Bug

I worked on Google Colab and tried to run yolov8n trained on custom data. After exporting the model to .tflite format, I ran it with both the PyTorch and TensorFlow Lite (float16 and float32) versions.

I expected the TensorFlow Lite model to be faster than the PyTorch one, but the result was a surprise to me. The PyTorch model has an inference time of 230 ms, while the TensorFlow Lite model takes almost 410 ms with both float16 and float32.

I have detailed it all in a PDF file šŸ“˜ at this Drive link: https://drive.google.com/file/d/1sGayaf3E5YAR1dZlKXp2T_eHPmb840M4/view?usp=sharing

Could you explain why this happens? And is there any way to enhance the performance or FPS of the yolov8n model using TensorFlow Lite (or any other software method)?

Any advice will be appreciated. šŸ™
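(For reference, the workflow described above is roughly the following; a minimal sketch using the standard ultralytics Python API, with a placeholder filename for the custom-trained weights.)

```python
from ultralytics import YOLO

# Custom-trained detection model (placeholder filename)
model = YOLO("yolov8n_custom.pt")

# Export to TensorFlow Lite; half=True would produce the float16 variant instead
model.export(format="tflite")

# PyTorch inference
model("bus.jpg")

# TFLite inference (the export writes into a <name>_saved_model/ directory)
tflite_model = YOLO("yolov8n_custom_saved_model/yolov8n_custom_float32.tflite")
tflite_model("bus.jpg")
```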

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

glenn-jocher commented 1 year ago

@MohamedAtef321 there's no bug here, just a mistaken assumption that TFLite models should be faster than PyTorch models on CPU. If you want speed ideas, I'd suggest you take a look at the daily export benchmarks: https://github.com/ultralytics/ultralytics/actions/runs/4338716714/jobs/7575625589

[Screenshot: export benchmark results, 2023-03-06]
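(To reproduce these benchmark numbers locally rather than reading the CI output, the benchmark utility can be run directly. A sketch; note that in older 8.0.x releases this helper lived under ultralytics.yolo.utils.benchmarks, so the import may need adjusting for your version.)

```python
from ultralytics.utils.benchmarks import benchmark

# Benchmark yolov8n across all export formats on CPU at imgsz=640;
# prints a table of model size, mAP, and inference time per format
benchmark(model="yolov8n.pt", imgsz=640, half=False, device="cpu")
```
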
MohamedAtef321 commented 1 year ago

> @MohamedAtef321 there's no bug here, just a mistaken assumption that TFLite models should be faster than PyTorch models on CPU. If you want speed ideas, I'd suggest you take a look at the daily export benchmarks: https://github.com/ultralytics/ultralytics/actions/runs/4338716714/jobs/7575625589
>
> [Screenshot: export benchmark results, 2023-03-06]

But in the benchmark here, the TensorFlow Lite inference time is lower than the PyTorch one!

[Screenshot: benchmark results table]

Any reasons for that!? šŸ¤·ā€ā™‚ļø

zhangsamson commented 1 year ago

Hello @glenn-jocher,

I observe the same issue as @MohamedAtef321, except the gap is even greater: the TFLite version is 8 times slower than the original PyTorch model on several of my computers with different OSes (Mac, Ubuntu). I tested the yolov8n-seg model with the default imgsz=640.

I checked the latest benchmark: https://github.com/ultralytics/ultralytics/actions/runs/5249812787/jobs/9488947907. It suggests that TFLite can be faster than the PyTorch version on CPU (or at least not that far behind), but I cannot reproduce similar results at all.

Computer specs: Python 3.10.11, Ubuntu 20.04, CPU: Ryzen 9 5900X, ultralytics version: 8.0.117

With yolov8n-seg:

PyTorch:

```
Found https://ultralytics.com/images/bus.jpg locally at bus.jpg
image 1/1 /home/user/git/ultralytics/bus.jpg: 640x480 4 persons, 1 bus, 1 skateboard, 38.6ms
Speed: 1.4ms preprocess, 38.6ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)
```

TFLite:

```
/home/user/anaconda3/envs/ultralytics/bin/python /home/user/git/ultralytics/predict.py
Loading /home/user/git/ultralytics/yolov8m-seg_saved_model/yolov8m-seg_float32.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Found https://ultralytics.com/images/bus.jpg locally at bus.jpg
image 1/1 /home/user/git/ultralytics/bus.jpg: 640x640 4 persons, 1 bus, 1 skateboard, 306.0ms
Speed: 1.6ms preprocess, 306.0ms inference, 3.8ms postprocess per image at shape (1, 3, 640, 640)
```
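(For context, the predict.py referenced in the log above presumably does something like the following; this is a reconstruction, not the actual script. The per-image timings printed in the log are also available programmatically via the Results.speed dict.)

```python
from ultralytics import YOLO

# Exported TFLite segmentation model, path pattern as printed in the log above
model = YOLO("yolov8m-seg_saved_model/yolov8m-seg_float32.tflite")
results = model("https://ultralytics.com/images/bus.jpg")

# speed holds 'preprocess', 'inference', and 'postprocess' times in ms
print(results[0].speed)
```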

Do you have an idea why this is happening? How can it be fixed/improved?

Thanks.