ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Torch MPS (gpu) acceleration not working M1 Mac. #8102

Closed jerjer1223 closed 2 years ago

jerjer1223 commented 2 years ago

Search before asking

YOLOv5 Component

Detection

Bug

When I set the device to MPS with --device mps, I get "RuntimeError: don't know how to restore data location of torch.storage._UntypedStorage (tagged with mps)."

PyTorch 1.13 (nightly) supports GPU acceleration on M1 Macs through the MPS backend, as stated on the PyTorch website and in this article (https://towardsdatascience.com/gpu-acceleration-comes-to-pytorch-on-m1-macs-195c399efcc1).

Environment

YOLOv5 🚀 2022-6-3 Python-3.9.13 torch-1.13.0.dev20220604 MPS

Minimal Reproducible Example

python detect.py --device mps

Additional

Full log here

detect: weights=yolov5s.pt, source=data/images, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=mps, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 2022-6-3 Python-3.9.13 torch-1.13.0.dev20220604 MPS

Traceback (most recent call last):
  File "/Users/jerry/Documents/yolov5-master/detect.py", line 252, in <module>
    main(opt)
  File "/Users/jerry/Documents/yolov5-master/detect.py", line 247, in main
    run(**vars(opt))
  File "/opt/homebrew/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/jerry/Documents/yolov5-master/detect.py", line 92, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "/Users/jerry/Documents/yolov5-master/models/common.py", line 334, in __init__
    model = attempt_load(weights if isinstance(weights, list) else w, device=device)
  File "/Users/jerry/Documents/yolov5-master/models/experimental.py", line 80, in attempt_load
    ckpt = torch.load(attempt_download(w), map_location=device)
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 973, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 178, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage._UntypedStorage (tagged with mps)

Are you willing to submit a PR?

github-actions[bot] commented 2 years ago

👋 Hello @jerjer1223, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@jerjer1223 good news 😃! Your original issue may now be fixed ✅ in PR #8121. This PR removes MPS from the torch.device() map_location argument which appears to be the original source of the issue.

To receive this update:

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
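One way to read the fix described above is "do not pass an MPS device as map_location; deserialize on the CPU and move the tensors afterwards". The sketch below illustrates that idea only. It assumes the checkpoint is a plain state dict of tensors, and `resolve_map_location` and `load_state_dict_for_device` are hypothetical helper names, not YOLOv5 functions and not the code from PR #8121.

```python
def resolve_map_location(device: str) -> str:
    """Pick a map_location that torch.load can restore on any build."""
    # Deserialize on the CPU when the target is MPS; pass other devices through.
    return "cpu" if device.startswith("mps") else device


def load_state_dict_for_device(path: str, device: str) -> dict:
    """Load a state dict (assumed: name -> tensor) and move it to `device`."""
    import torch  # lazy import so resolve_map_location works without torch

    state = torch.load(path, map_location=resolve_map_location(device))
    # Move each tensor to the requested device only after loading succeeded.
    return {name: tensor.to(device) for name, tensor in state.items()}
```

With a hypothetical file name, `load_state_dict_for_device("weights.pt", "mps")` would read the file with `map_location="cpu"` and then call `.to("mps")` on each tensor.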

GerardWalsh commented 2 years ago

@glenn-jocher the above solved the same issue for me "RuntimeError: don't know how to restore data location of torch.storage._UntypedStorage (tagged with mps).", but I am now running into:

MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:782: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 25600 bytes

glenn-jocher commented 2 years ago

@GerardWalsh great, I'm glad we resolved the original issue. The buffer size issue is known to the pytorch team and I believe they are working on solutions for it. See https://github.com/pytorch/pytorch/issues/77886

jerjer1223 commented 2 years ago

PyTorch 1.13 also seems to have problems on the CPU. When I ran on the CPU, I got this error:

PyTorch version 1.13.0.dev20220607
Torchvision version 0.14.0a0+f9f721d

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

GerardWalsh commented 2 years ago

@jerjer1223 try torchvision 0.14.0.dev20220603 with the torch version you're using (1.13.0.dev20220607).
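The two version strings reported above are different kinds of build: torch is a dated nightly (`.dev20220607`), while torchvision carries a `+f9f721d` local version segment, which may indicate a source build. The sketch below is only a heuristic for telling the two apart from the version string; `nightly_date` and `describe_build` are hypothetical helper names, and the official compatibility matrix remains the reference.

```python
import re
from typing import Optional

_NIGHTLY = re.compile(r"\.dev(\d{8})")


def nightly_date(version: str) -> Optional[str]:
    """Return the YYYYMMDD stamp of a nightly version string, else None."""
    match = _NIGHTLY.search(version)
    return match.group(1) if match else None


def describe_build(version: str) -> str:
    """Label a version string as a nightly, a local version, or a release."""
    date = nightly_date(version)
    if date is not None:
        return f"nightly {date}"
    if "+" in version:
        # PEP 440 local version segment, e.g. a commit hash appended by a build.
        return "local version"
    return "release"


for name, version in [("torch", "1.13.0.dev20220607"),
                      ("torchvision", "0.14.0a0+f9f721d")]:
    print(f"{name} {version}: {describe_build(version)}")
```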

Symbadian commented 2 years ago

@jerjer1223 good news 😃! Your original issue may now be fixed ✅ in PR #8121. This PR removes MPS from the torch.device() map_location argument which appears to be the original source of the issue.

To receive this update:

* **[Git](https://github.com/ultralytics/yolov5)** – `git pull` from within your `yolov5/` directory or `git clone https://github.com/ultralytics/yolov5` again

* **[PyTorch Hub](https://pytorch.org/hub/ultralytics_yolov5/)** – Force-reload `model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)`

* **[Notebooks](https://github.com/ultralytics/yolov5/blob/master/tutorial.ipynb)** – View updated notebooks in [Colab](https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb) or [Kaggle](https://www.kaggle.com/ultralytics/yolov5)

* **[Docker](https://hub.docker.com/r/ultralytics/yolov5)** – `sudo docker pull ultralytics/yolov5:latest` to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

Hi @glenn-jocher, good day to you.

I am trying to use my M1 GPU for training via python3 train.py.

I have been trying to get this working for some time by googling, but it is more challenging than I expected.

I discovered the MPS backend, but based on my research it is only used for inference with detect.py. I may be wrong given my limited experience. Can you guide me on how to set up the M1 GPU on the new MacBook Pro for YOLOv5 training, please?

Training is extremely painful on my old Mac, so I bought a newer model to handle the processing, and I'm not sure how this works. Thanks loads for YOLOv5 and your efforts. Training works on my old Mac, but it has been running since last Friday at 3am; today is Tuesday, 16th Aug 2022, and it has only reached 8 epochs out of 30.

Please help me speed this up with the M1 GPU or MPS; my googling has not shown me how.

Thanks loads to anyone who responds, I am grateful just to learn.

glenn-jocher commented 2 years ago

@Symbadian MPS support is in place currently for YOLOv5, but PyTorch has not completed sufficient support for MPS training.

If you have an M1/M2 machine you'll already see faster inference and training vs Intel chips simply by installing Python with Universal2 installers for python>=3.9. The speedup is about 200ms Intel vs 70ms M1 with universal2. MPS support would theoretically be faster still when available from pytorch.
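A quick way to confirm that the interpreter is running natively on Apple silicon, rather than as an Intel build translated by Rosetta 2, is to look at what the `platform` module reports. This is a small sketch; `is_native_apple_silicon` is a hypothetical helper name, and its optional arguments exist only so the logic can be checked on other machines.

```python
import platform


def is_native_apple_silicon(system: str = "", machine: str = "") -> bool:
    """True when Python runs as a native arm64 process on macOS."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # An Intel build running under Rosetta 2 reports "x86_64" here.
    return system == "Darwin" and machine == "arm64"


print("native arm64 Python:", is_native_apple_silicon())
```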

Symbadian commented 2 years ago

OK @glenn-jocher, so this will not work with MPS just yet. Wow, disappointing.

  1. Would this be the reason why Python is hogging all of my memory and causing my terminal to freeze, making the entire operation a pain?
  2. It is pointless to try the n6, s6, m6, l6 and x6 models to increase the image size from 640 to 1280 for small objects in the scene; I constantly run out of memory and I am struggling to understand why.

It takes 3-5 days to train the yolov5m, l and x models at 30 epochs with batch size 32. I tried the --hyp low, med and high settings, both from scratch and with pre-trained weights, to see which performs best for my solution. Every time I try a model larger than (m), I get the prompt below.

Thanks loads @glenn-jocher for your work, I really appreciate it. I'm just trying to get this to work and understand what I am doing!

[Attached image(s) not preserved: IMG_2722]

glenn-jocher commented 2 years ago

@Symbadian you can track (and vote on) ongoing aten operator development in https://github.com/pytorch/pytorch/issues/77764 that's needed for full MPS training to work correctly.
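Until the missing operators land, PyTorch offers the `PYTORCH_ENABLE_MPS_FALLBACK` environment variable, which lets unsupported MPS operators run on the CPU instead of raising an error. The sketch below sets it before importing torch (done here as a precaution) and then picks a device; `pick_device` is a hypothetical helper name. The fallback is a workaround and is expected to be slower than native MPS kernels.

```python
import os

# Set before importing torch so the setting is visible when the backend loads.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

try:
    import torch
except ImportError:  # keeps the sketch runnable on machines without PyTorch
    torch = None


def pick_device() -> str:
    """Return "mps" when the MPS backend is usable, otherwise "cpu"."""
    mps = getattr(torch.backends, "mps", None) if torch is not None else None
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print("selected device:", pick_device())
```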

kulinseth commented 2 years ago

OK @glenn-jocher, so this will not work with MPS just yet. Wow, disappointing.

  1. Would this be the reason why Python is hogging all of my memory and causing my terminal to freeze, making the entire operation a pain?
  2. It is pointless to try the n6, s6, m6, l6 and x6 models to increase the image size from 640 to 1280 for small objects in the scene; I constantly run out of memory and I am struggling to understand why.
  • The reason for trying to use MPS or the GPU from my terminal on my new MBP (M1 Max 2021, 64GB, 32 cores, Monterey 12.5) is to speed up the training.

It takes 3-5 days to train the yolov5m, l and x models at 30 epochs with batch size 32. I tried the --hyp low, med and high settings, both from scratch and with pre-trained weights, to see which performs best for my solution. Every time I try a model larger than (m), I get the prompt below.

  • I just got the training results today for the (m) model, and my poor computer has been running since last Friday at 3am.
  • Can something be done here?
  • If yes, please guide me to an example.

Thanks loads @glenn-jocher for your work, I really appreciate it. I'm just trying to get this to work and understand what I am doing!

[Attached image(s) not preserved: IMG_2722]

Hi @Symbadian, can you please file an issue in PyTorch with the "MPS" label? We will take a look.

Symbadian commented 2 years ago

Hi @kulinseth, how do I do so? I've never filed an issue before and would like to have the most productive impact to help others as well.

I am still struggling with this challenge: no matter what I do, all of the resources are being drained, and Googling is not providing a solution. Please guide me.

DenisVieriu97 commented 2 years ago

Hi @Symbadian - to file a PyTorch issue, you can go to https://github.com/pytorch/pytorch/issues and click on the green New Issue button (near the search bar). From there select Bug Report and please add the necessary info to reproduce it (e.g. command line used, machine config info, PyTorch version). In the labels tab, please add module: mps - we'll take a look from there. Thanks!
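The details requested above (command line, machine configuration, PyTorch version) can be gathered with a few lines of Python. This is a minimal sketch with a hypothetical `collect_report` helper; PyTorch also ships its own, more complete collector, which can be run with `python -m torch.utils.collect_env`.

```python
import platform
import sys


def collect_report() -> dict:
    """Gather basic facts worth pasting into a bug report."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "command": " ".join(sys.argv),
    }
    try:
        import torch

        report["torch"] = torch.__version__
    except ImportError:
        report["torch"] = "not installed"
    return report


for key, value in collect_report().items():
    print(f"{key}: {value}")
```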

Symbadian commented 2 years ago

Hi Denis,

Thank you for acknowledging my digital presence, the report has been logged!
