KronbergE closed this issue 2 years ago.
@KronbergE I think this was resolved in master a few weeks ago. Can you verify that you still see the problem with the current master code?
The only place I find any remaining references to float64 in the repo is this line: https://github.com/ultralytics/yolov5/blob/628c05ca6ff1d7f79d1fc63c298008a1341ba99c/utils/dataloaders.py#L481
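If you want to repeat that kind of search on another checkout, here is a minimal sketch. The function name find_float64 is hypothetical. It does a plain substring match over .py files, so comments and strings count as hits. It will also miss values that are float64 only implicitly, for example arrays created with NumPy's default float dtype.

```python
from pathlib import Path


def find_float64(root: str = ".", needle: str = "float64") -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each .py line under `root` containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line:
                    hits.append((str(path), lineno, line.strip()))
    return hits


# Usage (from the directory that contains the yolov5/ checkout):
# for path, lineno, line in find_float64("yolov5"):
#     print(f"{path}:{lineno}: {line}")
```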
@KronbergE good news! Your original issue may now be fixed in PR #8865. To receive this update:
Git: git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Docker: sudo docker pull ultralytics/yolov5:latest to update your image
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5!
UPDATE
Thank you @glenn-jocher for your response. You must have a packed schedule, so it's much appreciated :D
I updated to the newest version of the git repository and created a new environment with the latest requirements. But now, when I try to train on MPS, I get this error message instead:
!python train.py --device mps --img 640 --cfg /Users/myname/Desktop/yolov5/models/modifiedYolov5s.yaml --hyp /Users/myname/Desktop/yolov5/data/hyps/hyp.scratch-low.yaml --batch 16 --epochs 10 --data /Users/myname/Desktop/yolov5/data/pavementDistressDetectionSwedishData2.yaml --weights /Users/myname/Desktop/yolov5/runs/train/modDistressDetectorImprovedV0SwedishFixed/weights/best.pt --workers 8 --name modDistressDetectorImprovedV2SwedishData2
train: weights=/Users/myname/Desktop/yolov5/runs/train/modDistressDetectorImprovedV0SwedishFixed/weights/best.pt, cfg=/Users/myname/Desktop/yolov5/models/modifiedYolov5s.yaml, data=/Users/myname/Desktop/yolov5/data/pavementDistressDetectionSwedishData2.yaml, hyp=/Users/myname/Desktop/yolov5/data/hyps/hyp.scratch-low.yaml, epochs=10, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=mps, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=modDistressDetectorImprovedV2SwedishData2, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5
YOLOv5 v6.1-362-g731a2f8 Python-3.10.4 torch-1.13.0.dev20220804 MPS
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected.
Overriding model.yaml nc=80 with nc=8
             from  n    params  module                                  arguments
  0            -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1            -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2            -1  1     18816  models.common.C3                        [64, 64, 1]
  3            -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4            -1  2    115712  models.common.C3                        [128, 128, 2]
  5            -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6            -1  3    625152  models.common.C3                        [256, 256, 3]
  7            -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8            -1  1   1182720  models.common.C3                        [512, 512, 1]
  9            -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10            -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11            -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12       [-1, 6]  1         0  models.common.Concat                    [1]
 13            -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14            -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15            -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16       [-1, 4]  1         0  models.common.Concat                    [1]
 17            -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18            -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19      [-1, 14]  1         0  models.common.Concat                    [1]
 20            -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21            -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22      [-1, 10]  1         0  models.common.Concat                    [1]
 23            -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24  [17, 20, 23]  1     35061  models.yolo.Detect                      [8, [[19, 10, 51, 12, 31, 29], [134, 13, 79, 30, 60, 64], [291, 28, 130, 81, 197, 140]], [128, 256, 512]]
modifiedYolov5s summary: 270 layers, 7041205 parameters, 7041205 gradients, 16.0 GFLOPs
Transferred 348/349 items from /Users/myname/Desktop/yolov5/runs/train/modDistressDetectorImprovedV0SwedishFixed/weights/best.pt
/Users/myname/Desktop/yolov5/utils/general.py:833: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
x = x[xc[xi]] # confidence
AMP: checks failed, disabling Automatic Mixed Precision. See https://github.com/ultralytics/yolov5/issues/7908
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight (no decay), 60 weight, 60 bias
train: Scanning '/Users/myname/Desktop/yolov5/swedishData2/labels/train.ca
val: Scanning '/Users/myname/Desktop/yolov5/swedishData2/labels/val.cache'
Plotting labels to runs/train/modDistressDetectorImprovedV2SwedishData2/labels.jpg...
AutoAnchor: 4.23 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/train/modDistressDetectorImprovedV2SwedishData2
Starting training for 10 epochs...
Epoch gpu_mem box obj cls labels img_size
0%| | 0/59 [00:00<?, ?it/s] wandb: Currently logged in as: erikyolo. Use `wandb login --relogin` to force relogin
0%| | 0/59 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/Users/myname/Desktop/yolov5/train.py", line 634, in <module>
main(opt)
File "/Users/myname/Desktop/yolov5/train.py", line 529, in main
train(opt.hyp, opt, device, callbacks)
File "/Users/myname/Desktop/yolov5/train.py", line 310, in train
loss, loss_items = compute_loss(pred, targets.to(device)) # loss scaled by batch_size
File "/Users/myname/Desktop/yolov5/utils/loss.py", line 125, in __call__
tcls, tbox, indices, anchors = self.build_targets(p, targets) # targets
File "/Users/myname/Desktop/yolov5/utils/loss.py", line 208, in build_targets
t = t[j] # filter
NotImplementedError: The operator 'aten::index.Tensor_out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
When I add PYTORCH_ENABLE_MPS_FALLBACK=1 to the command, I get this error message instead:
!PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py --device mps --img 640 --cfg /Users/myname/Desktop/yolov5/models/modifiedYolov5s.yaml --hyp /Users/myname/Desktop/yolov5/data/hyps/hyp.scratch-low.yaml --batch 16 --epochs 10 --data /Users/myname/Desktop/yolov5/data/pavementDistressDetectionSwedishData2.yaml --weights /Users/myname/Desktop/yolov5/runs/train/modDistressDetectorImprovedV0SwedishFixed/weights/best.pt --workers 8 --name modDistressDetectorImprovedV2SwedishData2
Traceback (most recent call last):
File "/Users/myname/Desktop/yolov5/train.py", line 634, in <module>
main(opt)
File "/Users/myname/Desktop/yolov5/train.py", line 529, in main
train(opt.hyp, opt, device, callbacks)
File "/Users/myname/Desktop/yolov5/train.py", line 310, in train
loss, loss_items = compute_loss(pred, targets.to(device)) # loss scaled by batch_size
File "/Users/myname/Desktop/yolov5/utils/loss.py", line 125, in __call__
tcls, tbox, indices, anchors = self.build_targets(p, targets) # targets
File "/Users/myname/Desktop/yolov5/utils/loss.py", line 208, in build_targets
t = t[j] # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
@KronbergE yes, this is expected. I would head over to the PyTorch issue mentioned in the error message and add your vote for the op we need prioritized for conversion: aten::index.Tensor_out.
The float64 reference is still there, in autoanchor:
AutoAnchor: 2.80 anchors/target, 0.947 Best Possible Recall (BPR). Anchors are a poor fit to dataset, attempting to improve...
AutoAnchor: WARNING: Extremely small objects found: 410 of 8227 labels are < 3 pixels in size
AutoAnchor: Running kmeans for 9 anchors on 8224 points...
AutoAnchor: Evolving anchors with Genetic Algorithm: fitness = 0.8163: 100%|██████████| 1000/1000 [00:00<00:00, 1252.70it/s]
AutoAnchor: thr=0.25: 0.9993 best possible recall, 7.51 anchors past thr
AutoAnchor: n=9, img_size=64, metric_all=0.477/0.816-mean/best, past_thr=0.536-mean: 4,3, 6,6, 10,7, 7,10, 12,11, 9,19, 15,17, 13,27, 17,25
Traceback (most recent call last):
File "/Users/sarthakbansal/Desktop/ObjectDetection/yolov5/train.py", line 633, in <module>
main(opt)
File "/Users/sarthakbansal/Desktop/ObjectDetection/yolov5/train.py", line 529, in main
train(opt.hyp, opt, device, callbacks)
File "/Users/sarthakbansal/Desktop/ObjectDetection/yolov5/train.py", line 225, in train
check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
File "/Users/sarthakbansal/Desktop/ObjectDetection/yolov5/utils/autoanchor.py", line 58, in check_anchors
anchors = torch.tensor(anchors, device=m.anchors.device).type_as(m.anchors)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
@maverick-ai is this reproducible in current master? We used to have some float64 variables in YOLOv5 but they've all since been removed completely from the repo.
Yes, I am using the current master branch.
@glenn-jocher The issue is still in yolov5/utils/autoanchor.py. I am pasting a screenshot of my terminal for your reference.
@maverick-ai can you debug and see exactly what variable is float64?
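One way to do that debugging is to pass the candidate values to a small helper that reports anything whose dtype is float64. This is a minimal sketch that assumes the candidates are NumPy arrays or PyTorch tensors. Both expose a .dtype attribute, and its string form contains "float64" in either case. The helper name report_float64 and the sample variables are hypothetical stand-ins, not the actual values inside check_anchors().

```python
import numpy as np


def report_float64(**named_values) -> list[str]:
    """Return the names of values whose dtype is float64.

    Duck-typed: NumPy arrays report dtype 'float64' and PyTorch tensors
    report 'torch.float64', so a substring check covers both. Values
    without a dtype attribute (plain Python numbers) are skipped.
    """
    return [
        name
        for name, value in named_values.items()
        if "float64" in str(getattr(value, "dtype", ""))
    ]


# Hypothetical values standing in for the variables around the failing line.
anchors = np.array([[10, 13], [16, 30]], dtype=np.float64)  # explicit float64
stride = np.array([8, 16, 32], dtype=np.float32)
ratios = np.array([0.5, 1.0, 2.0])  # NumPy's default float dtype is float64
print(report_float64(anchors=anchors, stride=stride, ratios=ratios))  # ['anchors', 'ratios']
```

The ratios case shows why a text search alone can miss the culprit: the array never mentions float64, but NumPy gives it that dtype by default.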
@KronbergE @maverick-ai good news! Your original issue may now be fixed in PR #9188. To receive this update:
Git: git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Docker: sudo docker pull ultralytics/yolov5:latest to update your image
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5!
Question
I'm trying to use my MacBook's GPU when training my YOLOv5 model. I have already installed everything needed to use the GPU, and calling platform.mac_ver() returns the expected "('12.5', ('', '', ''), 'arm64')".
But when I execute my train command, I get this error message:
!python train.py --device mps --img 640 --cfg /Users/myname/Desktop/yolov5/models/modifiedYolov5s.yaml --hyp /Users/myname/Desktop/yolov5/data/hyps/hyp.scratch.yaml --batch 32 --epochs 10 --data /Users/myname/Desktop/yolov5/data/pavementDistressDetectionSwedishData2.yaml --weights /Users/myname/Desktop/yolov5/runs/train/modDistressDetectorImprovedV0SwedishFixed/weights/best.pt --workers 8 --name modDistressDetectorImprovedV2SwedishData2
Traceback (most recent call last):
File "/Users/myname/Desktop/yolov5/train.py", line 666, in
main(opt)
File "/Users/myname/Desktop/yolov5/train.py", line 561, in main
train(opt.hyp, opt, device, callbacks)
File "/Users/myname/Desktop/yolov5/train.py", line 285, in train
model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) * nc # attach class weights
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
I've searched around but can't figure out how to make PyTorch use float32 instead of float64.
Does anyone have any tips on what I might try?
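A common way to keep float64 off the MPS device is to cast on the CPU before calling .to(device). NumPy arrays default to float64, and torch.from_numpy keeps that dtype, so an explicit .float() (float32) is needed before the move. The sketch below assumes the class weights are built from a NumPy array, roughly in the spirit of the labels_to_class_weights call in the traceback. The helper class_weights_on_device and the toy labels are hypothetical; this is not YOLOv5's own code.

```python
import numpy as np
import torch


def class_weights_on_device(classes: np.ndarray, nc: int, device: torch.device) -> torch.Tensor:
    """Inverse-frequency class weights as a float32 tensor on `device`.

    Hypothetical helper: the arithmetic runs in NumPy (float64 by default),
    and the result is cast to float32 on the CPU before it is moved,
    because MPS does not support float64.
    """
    counts = np.bincount(classes.astype(int), minlength=nc).astype(np.float64)
    counts[counts == 0] = 1  # avoid division by zero for classes with no labels
    weights = 1.0 / counts
    weights /= weights.sum()  # normalize so the weights sum to 1
    return torch.from_numpy(weights).float().to(device)  # float64 -> float32, then move


# Guard for PyTorch builds that predate the torch.backends.mps module.
use_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
device = torch.device("mps" if use_mps else "cpu")

labels = np.array([0, 1, 1, 2, 2, 2])  # toy class labels
weights = class_weights_on_device(labels, nc=8, device=device)
print(weights.dtype, weights.device)
```

The same idea should apply to the autoanchor line in the earlier traceback. Building the tensor from a float32 array (for example anchors.astype(np.float32)) on the CPU, and only then moving it to the device, avoids asking MPS for a float64 tensor.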