ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.45k stars 16.28k forks

CoreML Export Error: export failure: 'torch._C.Node' object has no attribute 'ival' #2961

Closed samygarg closed 3 years ago

samygarg commented 3 years ago

🐛 Bug

I am trying to export the default pretrained YOLOv5 model to CoreML as described here, but I get an error on both Colab and my laptop:

CoreML: export failure: 'torch._C.Node' object has no attribute 'ival'

[Screenshot: 2021-04-28 at 13:55:39]

To Reproduce (REQUIRED)

Follow the steps mentioned here.

Expected behavior

Export the CoreML model successfully.

Environment

Colab and a MacBook Pro 13-inch (2019).

github-actions[bot] commented 3 years ago

👋 Hello @samygarg, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

meng1994412 commented 3 years ago

I have the same issue.

My environment:
OS: Ubuntu 18.04
Packages (installed via pip install -r requirements.txt): torch==1.8.1, torchvision==0.9.1, coremltools==4.1, onnx==1.9.0, scikit-learn==0.19.2

haynec commented 3 years ago

I also get the same issue.

My environment:
OS: Ubuntu 20.04
Packages (installed via pip install -r requirements.txt): torch==1.8.1, torchvision==0.9.1, coremltools==4.1, onnx==1.9.0, scikit-learn==0.19.2

JorgeCeja commented 3 years ago

Hey everyone, it appears that optimize_for_mobile from torch is what causes the incompatibility with coremltools.

The solution is to comment out the line just before export. Ideally it should be a CLI option (pull request, anyone?): https://github.com/ultralytics/yolov5/blob/33712d6dd0cc54e28b97d56cb999aa050a1c94ef/models/export.py#L72
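To illustrate the workaround, here is a hypothetical sketch of the TorchScript export step with the mobile-optimization pass made an optional flag instead of always-on. The function name, the `optimize` parameter, and the default filename are my own for illustration; this is not the actual export.py API.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

def export_torchscript(model, img, f="model.torchscript.pt", optimize=False):
    """Trace the model and save TorchScript, optionally optimizing for mobile."""
    ts = torch.jit.trace(model, img, strict=False)
    if optimize:
        # optimize_for_mobile rewrites the graph in ways that coremltools 4.1
        # reportedly cannot consume ("'torch._C.Node' object has no attribute
        # 'ival'"), so leave it off when the traced module feeds CoreML export.
        ts = optimize_for_mobile(ts)
    ts.save(f)
    return ts
```

With `optimize=False` the saved TorchScript module should convert with coremltools; with `optimize=True` you get the mobile-optimized module for PyTorch Mobile deployment instead.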

haynec commented 3 years ago

Thanks @JorgeCeja, that solution worked for me.

samygarg commented 3 years ago

@JorgeCeja That solved the original error, but export still fails. Tried on Colab as well as a MacBook Pro.

Here's what I am getting:


```
CoreML: starting export with coremltools 4.1...
Tuple detected at graph output. This will be flattened in the converted model.
Converting graph.
Adding op '1' of type const
Adding op '2' of type const
...
Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops:  78% 545/695 [00:00<00:00, 933.85 ops/s]
CoreML: export failure:
```

meng1994412 commented 3 years ago

Thanks @JorgeCeja, the solution you provided worked for me.

glemarivero commented 3 years ago

> Hey everyone, it appears that optimize_for_mobile from torch is what causes the incompatibility issue with coremltools.
>
> The solution is to comment the line before export. Optimally it should be an arg option, pull request anyone? https://github.com/ultralytics/yolov5/blob/33712d6dd0cc54e28b97d56cb999aa050a1c94ef/models/export.py#L72

@JorgeCeja Can you share your environment? This change fixed the original error, but I still hit the same failure as @samygarg.

pocketpixels commented 3 years ago

CoreML export currently seems broken in multiple ways. Until very recently the export did work with the above change, but only when not specifying --grid, which meant the Detect module did not get exported. When exporting with --grid you would get the same export failure at op 730. Commit b292837 from issue #2982 (May 3rd) changed the export implementation to export the Detect module by default.

pocketpixels commented 3 years ago

After trying many previous commits (and different PyTorch versions) today, my impression is that exporting the whole network, including the Detect module, to CoreML has probably never worked. If anyone knows of a version/commit (and environment) where it did work, I would love to know.

zhedahe commented 3 years ago

> @JorgeCeja It solved the original issue but it still doesn't work. Tried on colab as well as macbook pro. Here's what I am getting:
>
> ```
> CoreML: starting export with coremltools 4.1...
> Tuple detected at graph output. This will be flattened in the converted model.
> Converting graph.
> Adding op '1' of type const
> Adding op '2' of type const
> ...
> Converting op 728 : sub
> Adding op '728' of type sub
> Converting op 729 : add
> Adding op '729' of type add
> Converting op 730 : select
> Converting Frontend ==> MIL Ops:  78% 545/695 [00:00<00:00, 933.85 ops/s]
> CoreML: export failure:
> ```

I'm hitting the same issue. My environment:
OS: Ubuntu 16.04
Packages (installed via pip install -r requirements.txt): torch==1.8.1, torchvision==0.9.1, coremltools==4.1, onnx==1.9.0, scikit-learn==0.19.2

Error message:

```
...
Converting op 725 : constant
Adding op '725' of type const
Converting op 726 : mul
Adding op '726' of type mul
Converting op 727 : constant
Adding op '727' of type const
Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops: 87%|███▍| 604/695 [00:00<00:00, 1149.09 ops/s]
CoreML: export failure:
```

Hope somebody can give advice, thanks!

glenn-jocher commented 3 years ago

@meng1994412 @haynec @JorgeCeja @samygarg good news 😃! Your original issue may now be fixed ✅ in PR #3055. Note that this does not solve CoreML export completely, but it should resolve the original error message in this issue.

To receive this update you can:

  • git pull from within your yolov5/ directory
  • git clone https://github.com/ultralytics/yolov5 again
  • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • View our updated notebooks: Open In Colab Open In Kaggle

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

jedikim commented 3 years ago

> @JorgeCeja It solved the original issue but it still doesn't work. Tried on colab as well as macbook pro. Here's what I am getting:
>
> ```
> CoreML: starting export with coremltools 4.1...
> Tuple detected at graph output. This will be flattened in the converted model.
> Converting graph.
> Adding op '1' of type const
> Adding op '2' of type const
> ...
> Converting op 728 : sub
> Adding op '728' of type sub
> Converting op 729 : add
> Adding op '729' of type add
> Converting op 730 : select
> Converting Frontend ==> MIL Ops:  78% 545/695 [00:00<00:00, 933.85 ops/s]
> CoreML: export failure:
> ```

> @meng1994412 @haynec @JorgeCeja @samygarg good news 😃! Your original issue may now be fixed ✅ in PR #3055. Note that this does not solve CoreML export completely, but it should resolve the original error message in this issue.
>
> To receive this update you can:
>
>   • git pull from within your yolov5/ directory
>   • git clone https://github.com/ultralytics/yolov5 again
>   • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
>   • View our updated notebooks: Open In Colab Open In Kaggle

@glenn-jocher Using the current (hotfixed) version, this issue still happens for me.

OS: Ubuntu 20.04
Packages (installed via pip install -r requirements.txt): torch==1.8.1, torchvision==0.9.1, coremltools==4.1, onnx==1.9.0, scikit-learn==0.19.2

```
Adding op '724' of type slice_by_index
Adding op '724_begin_0' of type const
Adding op '724_end_0' of type const
Adding op '724_end_mask_0' of type const
Converting op 725 : constant
Adding op '725' of type const
Converting op 726 : mul
Adding op '726' of type mul
Converting op 727 : constant
Adding op '727' of type const
Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops: 87%|████▎| 604/695 [00:00<00:00, 970.22 ops/s]
CoreML: export failure:
```

pocketpixels commented 3 years ago

> Still this issue is happen with me.

@jedikim He mentioned that this does not (yet?) fix CoreML export; it only fixes the particular issue reported in this bug report (the first post at the top).

zhedahe commented 3 years ago

> @JorgeCeja It solved the original issue but it still doesn't work. Tried on colab as well as macbook pro. Here's what I am getting:
>
> ```
> CoreML: starting export with coremltools 4.1...
> Tuple detected at graph output. This will be flattened in the converted model.
> Converting graph.
> Adding op '1' of type const
> Adding op '2' of type const
> ...
> Converting op 728 : sub
> Adding op '728' of type sub
> Converting op 729 : add
> Adding op '729' of type add
> Converting op 730 : select
> Converting Frontend ==> MIL Ops:  78% 545/695 [00:00<00:00, 933.85 ops/s]
> CoreML: export failure:
> ```

I tried again after updating, but it gives the same error as yesterday. So, as glenn-jocher mentioned above, this does not solve CoreML export completely.

glemarivero commented 3 years ago

Here are my two cents on this: you can check out a previous commit such as 33712d6dd0cc54e28b97d56cb999aa050a1c94ef and comment out line https://github.com/ultralytics/yolov5/blob/33712d6dd0cc54e28b97d56cb999aa050a1c94ef/models/export.py#L72 as they said above. However, as @pocketpixels said, this will not export the complete model. Instead the outputs will be the nl outputs given by: https://github.com/ultralytics/yolov5/blob/33712d6dd0cc54e28b97d56cb999aa050a1c94ef/models/yolo.py#L48 This means you have to do the grid scaling operations on the CoreML side and concatenate the nl results to obtain an [n_anchors x (nc+5)] matrix. Then you will need to adapt this to the input format of the Non-Maximum Suppression layer.

I'm pretty new to using the CoreML builder. So far I'm using this as my guideline: https://github.com/hollance/coreml-survival-guide/blob/master/MobileNetV2%2BSSDLite/ssdlite.py If anyone knows how to do it and could post the complete solution, that would be great. Otherwise I'll keep working on it, and once I finish (if I do) I'll post it here.
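The grid scaling and concatenation described above can be sketched in plain NumPy. This is an illustrative standalone sketch mirroring the Detect math in yolo.py, not the actual CoreML builder code; the function and variable names are mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_head(raw, grid, anchors, stride):
    """Decode one Detect output of shape (na, ny, nx, nc + 5) into an
    (na * ny * nx, nc + 5) matrix of absolute-pixel boxes."""
    y = sigmoid(raw)
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # box centers in pixels
    wh = (y[..., 2:4] * 2.0) ** 2 * anchors         # box sizes in pixels
    out = np.concatenate([xy, wh, y[..., 4:]], axis=-1)
    return out.reshape(-1, out.shape[-1])

# Example: the stride-8 level of a 640x640 input, 3 anchors, nc = 80
ny = nx = 80
gy, gx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
grid = np.stack([gx, gy], axis=-1)[None].astype(np.float32)    # (1, ny, nx, 2)
anchors = np.array([[10, 13], [16, 30], [33, 23]], np.float32).reshape(3, 1, 1, 2)
raw = np.zeros((3, ny, nx, 85), np.float32)                    # dummy logits
boxes = decode_head(raw, grid, anchors, 8.0)                   # (19200, 85)
```

Repeating this for each of the nl output levels and stacking the results gives the [n_anchors x (nc+5)] matrix, which then still has to be split into the confidence and coordinate inputs that the NMS layer expects.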

glenn-jocher commented 3 years ago

@meng1994412 @haynec @JorgeCeja @samygarg @glemarivero good news 😃! Outstanding CoreML export issues may now be fixed ✅ in a second PR #3066. This adds a --train option suitable for CoreML model export, which exports the model in .train() mode rather than .eval() mode, avoiding the grid construction code that causes CoreML export to fail:

python models/export.py --train

All batchnorm fusion ops have already occurred at the new model.train() point, so the only difference should be in the Detect layer.

To receive this update you can:

  • git pull from within your yolov5/ directory
  • git clone https://github.com/ultralytics/yolov5 again
  • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • View our updated notebooks: Open In Colab Open In Kaggle

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

zhedahe commented 3 years ago

> @meng1994412 @haynec @JorgeCeja @samygarg @glemarivero good news 😃! Outstanding CoreML export issues may now be fixed ✅ in a second PR #3066. This adds a --train option suitable for CoreML model export, which exports the model in .train() mode rather than .eval() mode, avoiding the grid construction code that causes CoreML export to fail:
>
> python models/export.py --train
>
> All batchnorm fusion ops have already occurred at the new model.train() point, so the only difference should be in the Detect layer.
>
> To receive this update you can:
>
> * `git pull` from within your `yolov5/` directory
> * `git clone https://github.com/ultralytics/yolov5` again
> * Force-reload [PyTorch Hub](https://pytorch.org/hub/ultralytics_yolov5/): `model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)`
> * View our updated notebooks: [Open In Colab](https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb) [Open In Kaggle](https://www.kaggle.com/models/ultralytics/yolov5)
>
> Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

Yeah!!! I ran python models/export.py --train --weights yolov5s.pt --img 640 --batch 1 and it works with no errors! Thanks a lot!

glemarivero commented 3 years ago

Thanks for adding the --train option. But we still can't use the CoreML model for inference, right? Or am I missing something?

glenn-jocher commented 3 years ago

@glemarivero yes the exported model can be used for any purpose.

glemarivero commented 3 years ago

I meant that we still need to do what I posted earlier. Aren't the outputs of the model still 714, 727 and 740? Thanks

meng1994412 commented 3 years ago

I agree with @pocketpixels and @glemarivero. The CoreML model currently exported (with the latest update) does not contain the detect module, so it cannot be used directly for inference.

meng1994412 commented 3 years ago

> @meng1994412 @haynec @JorgeCeja @samygarg @glemarivero good news 😃! Outstanding CoreML export issues may now be fixed ✅ in a second PR #3066. This adds a --train option suitable for CoreML model export, which exports the model in .train() mode rather than .eval() mode, avoiding the grid construction code that causes CoreML export to fail:
>
> python models/export.py --train
>
> All batchnorm fusion ops have already occurred at the new model.train() point, so the only difference should be in the Detect layer.
>
> To receive this update you can:
>
>   • git pull from within your yolov5/ directory
>   • git clone https://github.com/ultralytics/yolov5 again
>   • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
>   • View our updated notebooks: Open In Colab Open In Kaggle
>
> Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

Will the grid construction be included in CoreML export in the future update?

pocketpixels commented 3 years ago

It would definitely be desirable to have the Detect module included in the CoreML output. And if and when we get that to work, it might also be worthwhile to add a CoreML NMS layer to the generated CoreML model (as discussed by @glemarivero).

@glenn-jocher Do you happen to know which part of the Detect implementation the CoreML converter chokes on? Maybe it would be possible to find a workaround by reformulating one of the PyTorch operations involved?

pocketpixels commented 3 years ago

I looked into what is causing the export failure a bit. What I found so far is that it is related to self.stride and self.anchor_grid in the box calculations here: https://github.com/ultralytics/yolov5/blob/d2a17289c99ad45cb901ea81db5932fa0ca9b711/models/yolo.py#L55-L61

If we comment out or remove those from the calculations then the CoreML conversion runs to completion (accessing and using self.grid in those calculations seems to be fine).

I have not yet figured out why these cause problems, though. With anchor_grid I initially suspected the tensor rank might be higher than CoreML can handle; however, stride is just a vector of 3 floats. It gets set outside of the module's __init__, so maybe that is causing the issue somehow? I'll look into this more later, but thought I'd share what I've found so far in case someone else (with more experience with PyTorch & CoreML) has ideas and/or wants to investigate further.

glemarivero commented 3 years ago

Hi, I was able to put everything together. Take a look at this notebook: example_yolov5s_to_coreml.ipynb.zip Please let me know if you find any errors. It is only for the yolov5 small version, but it shouldn't be difficult to adapt it to the others. Hope is useful for the rest 🙂

pocketpixels commented 3 years ago

@glemarivero Fantastic work, thank you for sharing!

pocketpixels commented 3 years ago

Continuing my investigation into the cause for the error during the CoreML export of the Detect module: Just focusing on the --inplace branch in the code cited above, so these two lines: https://github.com/ultralytics/yolov5/blob/d2a17289c99ad45cb901ea81db5932fa0ca9b711/models/yolo.py#L56-L57

With these modifications the CoreML conversion completes without errors:

```python
s = self.stride[i].item()
ag = self.anchor_grid[i].numpy()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * s  # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * ag  # wh
```

That is, if we force PyTorch to treat stride and anchor_grid as constants and forget how they were computed (which I believe should be OK, because they are not input dependent), then the CoreML converter has no issues. (I have not tried running the resulting model on iOS yet.)

Clearly the above change is not the solution (as I believe it would impact inference performance), but maybe it is a good hint at what a better solution might be (for someone like @glenn-jocher who understands the code base and PyTorch better than I do)?

Update: While the conversion completes, looking at the resulting graph in Netron I don't think it actually includes the box coordinate computations.

Update 2: Converting without --inplace and making the equivalent changes to that branch of the code does result in a model that seems to include the box coordinate computations.

```python
s = self.stride[i].item()
ag = self.anchor_grid[i].view(1, self.na, 1, 1, 2).numpy()
xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * s  # xy
wh = (y[..., 2:4] * 2) ** 2 * ag  # wh
y = torch.cat((xy, wh, y[..., 4:]), -1)
```

[Netron screenshot: coreml_export]
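As a quick standalone sanity check (with illustrative names, independent of the repo), treating stride as a Python float and anchor_grid as a NumPy-backed constant is numerically identical to the original tensor math, so the workaround should not change the decoded boxes:

```python
import torch

torch.manual_seed(0)
y = torch.rand(1, 3, 20, 20, 85)        # stand-in for a sigmoid'd Detect output
grid = torch.zeros(1, 1, 20, 20, 2)     # stand-in for self.grid[i]
stride = torch.tensor(32.0)             # stand-in for self.stride[i]
anchor_grid = torch.rand(1, 3, 1, 1, 2) * 50

# Original tensor-based math (yolo.py style)
xy_t = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
wh_t = (y[..., 2:4] * 2.0) ** 2 * anchor_grid

# Constant-based variant from the workaround above
s = stride.item()                           # Python float constant
ag = torch.from_numpy(anchor_grid.numpy())  # constant view of the same values
xy_c = (y[..., 0:2] * 2.0 - 0.5 + grid) * s
wh_c = (y[..., 2:4] * 2.0) ** 2 * ag

print(torch.allclose(xy_t, xy_c), torch.allclose(wh_t, wh_c))  # True True
```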

glenn-jocher commented 3 years ago

@meng1994412 @glemarivero @pocketpixels to clarify, all modules including the Detect() layer are exported by export.py, no modules are missing. The --train flag simply places the model in model.train() mode, which allows the Detect() layer to sidestep the grid and concatenation ops. https://github.com/ultralytics/yolov5/blob/251aeafcb16ebc4c9d9a6641b3677aaac2f2d2cb/models/export.py#L57-L58

glemarivero commented 3 years ago

But if you do that, how do you continue? How do you get the final bounding boxes?

glemarivero commented 3 years ago

In case anyone is interested, I put together a script that outputs a CoreML .mlmodel that can be opened with Xcode (the previous model couldn't be) and used to preview inference results inside it. Again, I only did it for yolov5s.

export.py.zip

python models/export.py --train

[Xcode preview screenshot]

pocketpixels commented 3 years ago

Thanks for sharing @glemarivero. I also wrote a similar (but different) CoreML export script. It generates a CoreML model that can be previewed in Xcode, can easily be used with Apple's Vision framework, and yields a VNRecognizedObjectObservation for each detected object. I modified the code in the Detect module similarly to what I discussed above (there was still a missing step) so that it can be exported by the coremltools convert function. It should work for all the differently sized variants of the YOLOv5 model.

To try it, I recommend checking out the branch from my forked repo into a separate directory:

git clone -b better_coreml_export https://github.com/pocketpixels/yolov5.git yolov5_coreml_export

From within that directory run:

python models/coreml_export.py --weights [model weights file]

glemarivero commented 3 years ago

Nice work @pocketpixels! Thanks for sharing 🙂