ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Cannot export to edgetpu model #8842

Closed: walterwangimagr closed this issue 1 year ago

walterwangimagr commented 1 year ago

Search before asking

YOLOv5 Component

Export

Bug

I was following the tutorial to train a model and export it to an Edge TPU model. When I use coco128.yaml as the dataset, everything is fine: I can train and export. But when I use a custom dataset with only one class, it fails at the TFLite -> Edge TPU step. I also tried the similar dataset GlobalWheat2020.yaml and got the same issue. Is there an extra step I need to do, or is it a bug? I use the provided Docker image and installed TensorFlow 2.9.1 for export.

Training script:

python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --name coco128

Export script:

python export.py --img 640 --data data/coco128.yaml --weights runs/train/coco128/weights/best.pt --include edgetpu

This works fine.

Training script:

python train.py --img 640 --batch 16 --epochs 3 --data GlobalWheat2020.yaml --weights yolov5s.pt

Export script:

python export.py --img 640 --data data/GlobalWheat2020.yaml --weights runs/train/exp19/weights/best.pt --include edgetpu

This runs into an error: [screenshot]

Environment

- Provided Docker image (ultralytics/yolov5)
- TensorFlow 2.9.1 installed for export

Minimal Reproducible Example

# start docker 
sudo docker run -it --gpus '"device=1,2,3"' -v `pwd`:/workspace --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ultralytics/yolov5 /bin/bash

# install TensorFlow (2.9.1 was used in this report)
pip install tensorflow==2.9.1

# train 
python train.py --img 640 --batch 16 --epochs 3 --data data/GlobalWheat2020.yaml --weights yolov5s.pt

# export 
python export.py --img 640 --data data/GlobalWheat2020.yaml --weights runs/train/exp17/weights/best.pt --include edgetpu

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 year ago

👋 Hello @walterwangimagr, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 1 year ago

@walterwangimagr thanks for the bug report! I'll try to reproduce using your commands.

glenn-jocher commented 1 year ago

@walterwangimagr yes, I'm able to reproduce this. Very strange; I'm not sure what the problem might be.

!python train.py --img 640 --batch 16 --epochs 3 --data GlobalWheat2020.yaml --weights yolov5s.pt
!python export.py --img 640 --weights runs/train/exp2/weights/best.pt --include edgetpu

VOC (20 classes) also fails. Edge TPU export seems to fail for every dataset that does not have 80 classes. Perhaps 80 is hardcoded somewhere in the conversion process. I'll investigate tomorrow.

!python train.py --img 640 --batch 16 --epochs 3 --data VOC.yaml --weights yolov5s.pt
!python export.py --img 640 --data VOC.yaml --weights runs/train/exp3/weights/best.pt --include edgetpu
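For reference, here is what actually changes in the exported graph when only the class count changes. A minimal sketch, assuming the standard YOLOv5 Detect head with 3 anchors per output level (the yolov5s default, not stated in this thread): each anchor predicts 4 box values, 1 objectness score and nc class scores, so each output conv has 3 * (nc + 5) channels. This only shows which tensor widths differ from the 80-class case; it does not establish why the compiler fails.

```python
def head_channels(nc: int, na: int = 3) -> int:
    """Output channels of each YOLOv5 Detect conv: na anchors x (4 box + 1 obj + nc classes)."""
    return na * (nc + 5)


# Class counts of the datasets mentioned in this thread
for name, nc in [("coco128", 80), ("VOC", 20), ("GlobalWheat2020", 1)]:
    print(f"{name:16s} nc={nc:3d} -> {head_channels(nc)} channels, {nc + 5} values per prediction")
```

COCO gives the 255-channel head, while VOC and GlobalWheat2020 give 75 and 18 channels.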

glenn-jocher commented 1 year ago

@zldrobit I might need some advice here. Edge TPU export is failing for non-COCO models. I'm not sure what the cause is. Export works for coco128 but fails for VOC and GlobalWheat2020 trained models (see above).

TFLite INT8 export works correctly for both; the failure only happens at the Edge TPU step. What do you think?

[Screenshot 2022-08-04, 2:32:37 AM]

zldrobit commented 1 year ago

@glenn-jocher I can confirm that the default setting (edgetpu_compiler -s -o) cannot export an Edge TPU model for VOC (20 classes) or GlobalWheat (1 class). I searched for a compilable partition of the exported model with edgetpu_compiler -s -d (as suggested in https://github.com/google-coral/edgetpu/issues/450#issuecomment-905480561), and the best model I could get still has almost 130 ops running on CPU: [screenshot]

You can reproduce this result immediately with:

edgetpu_compiler -s -i "model/tf_detect/Reshape_1,model/tf_detect/Reshape_3,model/tf_conv_33/mul_1,model/tf_conv_51/mul_1" globalwheat-int8.tflite

or run the search from scratch with:

edgetpu_compiler -s -d globalwheat-int8.tflite

The VOC model can be exported to Edge TPU format in the same way.

walterwangimagr commented 1 year ago

I had a look at some posts in https://github.com/google-coral/edgetpu/issues/449 and https://github.com/google-coral/edgetpu/issues/405, and it looks like this could be caused by the resolution limitation. But the weird thing is that coco128 works at --img 640 with 80 classes, while any other number of classes fails. I will try training some models at a lower image size to confirm.
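To compare image sizes before retraining, a small sketch can count how many predictions the detection head emits for a given --img. It assumes the default YOLOv5 P3-P5 output strides of 8, 16 and 32 with 3 anchors per grid cell (standard for yolov5s, not stated in this thread); each prediction row is nc + 5 values wide.

```python
def num_predictions(img_size: int, strides=(8, 16, 32), na: int = 3) -> int:
    """Total anchor predictions for a square input: na per cell, summed over every output grid."""
    if any(img_size % s for s in strides):
        raise ValueError(f"img_size {img_size} must be a multiple of {max(strides)}")
    return sum(na * (img_size // s) ** 2 for s in strides)


for img in (320, 416, 640):
    print(img, num_predictions(img))
```

At 640 this gives 25,200 rows, at 416 it gives 10,647 and at 320 it gives 6,300, so halving the input size cuts the output roughly fourfold, independent of the class count.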

walterwangimagr commented 1 year ago

I experienced another weird behavior. If I change the class names in coco128.yaml [screenshot], then train and export to edgetpu.tflite, running detect.py with the Edge TPU model still predicts 'person' [screenshot].

glenn-jocher commented 1 year ago

@walterwangimagr class names are embedded as model attributes after training finishes, i.e. model.names.

@zldrobit OK, thanks for the results! Do you think we should update Edge TPU export to edgetpu_compiler -s -i or edgetpu_compiler -s -d? Let me know what the best option is and I will create a PR to resolve this.

glenn-jocher commented 1 year ago

@zldrobit I don't think we can use the -i argument, as we don't know the output layer names ahead of time for the different model sizes. I tested edgetpu_compiler -s -d with COCO128 and VOC. The COCO128 results are the same (and take the same time); the VOC export works but takes a lot longer. The important thing is that it works, so I'll create a PR to add the -d argument to all Edge TPU exports.
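The two compiler invocations discussed above can be sketched in Python as follows. This is a hypothetical helper, not the code from PR #8902; it uses only the edgetpu_compiler flags quoted in this thread (-s, -d, -i, -o) and assumes the compiler is installed and on PATH. The -i variant needs explicit tensor names, which differ between model sizes, so -d is used by default.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def edgetpu_compile_cmd(tflite: str, out_dir: Optional[str] = None,
                        intermediate_tensors: Optional[Sequence[str]] = None) -> List[str]:
    """Build an edgetpu_compiler command line (hypothetical helper).

    Default: -s -d, letting the compiler search for a compilable partition.
    If intermediate_tensors is given: -s -i name1,name2,... instead of -d.
    """
    cmd = ["edgetpu_compiler", "-s"]  # -s: show per-op mapping (Edge TPU vs CPU)
    if intermediate_tensors:
        cmd += ["-i", ",".join(intermediate_tensors)]  # model-specific tensor names
    else:
        cmd.append("-d")  # search delegate: works without knowing layer names
    if out_dir is not None:
        cmd += ["-o", out_dir]  # output directory for the compiled model
    cmd.append(tflite)
    return cmd


def compile_edgetpu(tflite: str) -> None:
    # Compile into the input file's directory; raises CalledProcessError if compilation fails
    subprocess.run(edgetpu_compile_cmd(tflite, out_dir=str(Path(tflite).parent)), check=True)
```

For example, edgetpu_compile_cmd("globalwheat-int8.tflite") yields the search command above, while passing the four tensor names from @zldrobit's comment as intermediate_tensors reproduces the -i command.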

glenn-jocher commented 1 year ago

@walterwangimagr good news 😃! Your original issue may now be fixed ✅ in PR #8902 by adding the --search_delegate (-d) argument to Edge TPU model compilation, per @zldrobit's solution above.

To receive this update:

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

walterwangimagr commented 1 year ago

Thank you very much

glenn-jocher commented 8 months ago

@walterwangimagr you're welcome! If you need further assistance or have other questions, feel free to ask. Happy to help!