Albumentations - RuntimeError: stack expects each tensor to be equal size

1andDone commented 1 year ago

Search before asking

[X] I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

I made updates to the default albumentations transforms in yolov5/utils/augmentations.py using the code highlighted in red below:

When running the following in Google Colab:

!python train.py --img 640 --batch 32 --epochs 1 --data train_validation_test_split.yaml --weights yolov5m_Objects365.pt --nosave --cache

This is the error I receive:

train: weights=yolov5m_Objects365.pt, cfg=, data=train_validation_test_split.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=1, batch_size=32, imgsz=640, rect=False, resume=False, nosave=True, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-210-gdd10481 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=365 with nc=7

                 from  n    params  module                                  arguments                     
  0                -1  1      5280  models.common.Conv                      [3, 48, 6, 2, 2]              
  1                -1  1     41664  models.common.Conv                      [48, 96, 3, 2]                
  2                -1  2     65280  models.common.C3                        [96, 96, 2]                   
  3                -1  1    166272  models.common.Conv                      [96, 192, 3, 2]               
  4                -1  4    444672  models.common.C3                        [192, 192, 4]                 
  5                -1  1    664320  models.common.Conv                      [192, 384, 3, 2]              
  6                -1  6   2512896  models.common.C3                        [384, 384, 6]                 
  7                -1  1   2655744  models.common.Conv                      [384, 768, 3, 2]              
  8                -1  2   4134912  models.common.C3                        [768, 768, 2]                 
  9                -1  1   1476864  models.common.SPPF                      [768, 768, 5]                 
 10                -1  1    295680  models.common.Conv                      [768, 384, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  2   1182720  models.common.C3                        [768, 384, 2, False]          
 14                -1  1     74112  models.common.Conv                      [384, 192, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  2    296448  models.common.C3                        [384, 192, 2, False]          
 18                -1  1    332160  models.common.Conv                      [192, 192, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  2   1035264  models.common.C3                        [384, 384, 2, False]          
 21                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  2   4134912  models.common.C3                        [768, 768, 2, False]          
 24      [17, 20, 23]  1     48492  models.yolo.Detect                      [7, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Model summary: 291 layers, 20895564 parameters, 20895564 gradients, 48.3 GFLOPs

Transferred 475/481 items from yolov5m_Objects365.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: HorizontalFlip(p=0.1), Rotate(p=0.1, limit=(-90, 90), interpolation=1, border_mode=1, value=None, mask_value=None, rotate_method='largest_box', crop_border=False), OneOf([
  Perspective(p=0.2, scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1),
  RandomCropFromBorders(p=0.2, crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1),
  ShiftScaleRotate(p=0.2, shift_limit_x=(-0.1, 0.1), shift_limit_y=(-0.1, 0.1), scale_limit=(-0.19999999999999996, 0.19999999999999996), rotate_limit=(0, 0), interpolation=1, border_mode=1, value=None, mask_value=None, rotate_method='largest_box'),
], p=0.1), OneOf([
  MedianBlur(p=0.25, blur_limit=(3, 7)),
  RandomBrightnessContrast(p=0.4, brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2), brightness_by_max=True),
  RandomSunFlare(p=0.1, flare_roi=(0, 0, 1, 0.5), angle_lower=0, angle_upper=1, num_flare_circles_lower=1, num_flare_circles_upper=4, src_radius=400, src_color=(255, 255, 255)),
], p=0.1)
train: Scanning /content/train_validation_test_split/labels/train.cache... 233 images, 24 backgrounds, 0 corrupt: 100% 233/233 [00:00<?, ?it/s]
train: Caching images (0.2GB ram): 100% 233/233 [00:15<00:00, 14.61it/s]
val: Scanning /content/train_validation_test_split/labels/val.cache... 66 images, 5 backgrounds, 0 corrupt: 100% 66/66 [00:00<?, ?it/s]
val: Caching images (0.1GB ram): 100% 66/66 [00:05<00:00, 12.20it/s]

AutoAnchor: 3.02 anchors/target, 0.998 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train/exp6/labels.jpg... 
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/train/exp6
Starting training for 1 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
  0% 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/yolov5/train.py", line 647, in <module>
    main(opt)
  File "/content/yolov5/train.py", line 536, in main
    train(opt.hyp, opt, device, callbacks)
  File "/content/yolov5/train.py", line 291, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/content/yolov5/utils/dataloaders.py", line 172, in __iter__
    yield next(self.iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/content/yolov5/utils/dataloaders.py", line 891, in collate_fn
    return torch.stack(im, 0), torch.cat(label, 0), path, shapes
RuntimeError: stack expects each tensor to be equal size, but got [3, 640, 640] at entry 0 and [3, 550, 579] at entry 19

When I comment out the pixel-level transformations, everything runs smoothly.
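
For anyone debugging the same failure: the call that raises is `torch.stack(im, 0)` in `collate_fn`, and it requires every image in the batch to have the same shape. Below is a minimal, dependency-free sketch for finding which entries deviate from the first one. `find_shape_mismatches` is a hypothetical helper, not part of YOLOv5, and the batch is reconstructed from the shapes reported in the traceback above.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(shapes: Sequence[Shape]) -> List[Tuple[int, Shape]]:
    """Return (index, shape) for every entry whose shape differs from entry 0."""
    if not shapes:
        return []
    reference = tuple(shapes[0])
    return [(i, tuple(s)) for i, s in enumerate(shapes) if tuple(s) != reference]


# Assumed batch of 32: every entry keeps the 640x640 training size except
# entry 19, which a shape-changing transform cropped to 550x579.
batch_shapes = [(3, 640, 640)] * 32
batch_shapes[19] = (3, 550, 579)

for index, shape in find_shape_mismatches(batch_shapes):
    print(f"entry {index} has shape {shape}, expected {batch_shapes[0]}")
```

With real tensors, the input could presumably be built as `[tuple(t.shape) for t in im]` just before the stacking call.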

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

github-actions[bot] commented 1 year ago

👋 Hello @1andDone, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

1andDone commented 1 year ago

@SimoneGiovannini @florenttaralle @glenn-jocher

Hi, I found out that some cropping transformations, like BBoxSafeRandomCrop or RandomCropFromBorders, raise an error of this kind:

RuntimeError: stack expects each tensor to be equal size, but got [3, 640, 640] at entry 0 and [3, 390, 212] at entry 2

I think this is due to the fact that augmentations are applied after the image resizing done by the algorithm. So to solve the problem one should augment before resizing, and it looks like such a change would require some work.

Hi, you are right! But this is by design. I should have made this point explicit. The sizing and cropping are managed by some complex behaviors in the yolov5 code (like mosaic). The additional transforms you add MUST NOT modify the crop shape.

I saw your discussion in PR https://github.com/ultralytics/yolov5/pull/9628, which was related to the issue I described above. Are there currently ways to implement augmentations that resize the image in YOLOv5?
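
To illustrate the "must not modify the crop shape" constraint quoted above, here is a minimal sketch of a guard that discards the result of any transform that changes the image shape. This is an assumption about one possible workaround, not something YOLOv5 or Albumentations provides: `keep_shape` and the two stand-in transforms are hypothetical, and `SimpleNamespace` objects stand in for real images (any object exposing `.shape`, such as a NumPy array, would behave the same way).

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Tuple

Sample = Tuple[Any, List[list]]  # (image exposing .shape, bounding boxes)


def keep_shape(transform: Callable[[Any, List[list]], Sample]) -> Callable[[Any, List[list]], Sample]:
    """Wrap a transform so that shape-changing results are discarded."""

    def wrapped(image: Any, bboxes: List[list]) -> Sample:
        new_image, new_bboxes = transform(image, bboxes)
        if tuple(new_image.shape) != tuple(image.shape):
            # The transform changed the image size: return the untouched sample.
            return image, bboxes
        return new_image, new_bboxes

    return wrapped


def fake_crop(image: Any, bboxes: List[list]) -> Sample:
    """Stand-in for a crop: returns a smaller image."""
    return SimpleNamespace(shape=(550, 579, 3)), bboxes


def fake_brightness(image: Any, bboxes: List[list]) -> Sample:
    """Stand-in for a pixel-level transform: returns a new image of equal shape."""
    return SimpleNamespace(shape=image.shape), bboxes


image = SimpleNamespace(shape=(640, 640, 3))
boxes = [[0.5, 0.5, 0.2, 0.2]]

cropped, _ = keep_shape(fake_crop)(image, boxes)
brightened, _ = keep_shape(fake_brightness)(image, boxes)
print(cropped.shape)     # the crop was rejected, so the original shape survives
print(brightened.shape)  # the shape-preserving transform was accepted
```

The trade-off is that a rejected augmentation is silently skipped, so crops would never actually reach training; the guard only keeps the batch stackable.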

glenn-jocher commented 10 months ago

@1andDone hello! Yes, you are correct. As mentioned, the additional transforms should not modify the crop shape. Resizing the image before applying augmentations is necessary to avoid the issue you encountered. You can implement augmentations that resize the image by using libraries such as albumentations or OpenCV before feeding the image into YOLOv5. Let me know if you need further assistance with the implementation.
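
If a crop has to stay in the pipeline, its output needs to be brought back to the training size before batching. The arithmetic of an aspect-preserving resize followed by padding (letterboxing) can be sketched as follows, using the 550x579 crop from the traceback. `letterbox_params` is a hypothetical helper written for this illustration, not the YOLOv5 function, and the bounding boxes would need the same scale and offsets applied.

```python
from typing import Tuple


def letterbox_params(
    height: int, width: int, target: int = 640
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return ((new_h, new_w), (top, bottom, left, right)) to reach target x target."""
    # Scale so the longer side matches the target while keeping the aspect ratio.
    ratio = min(target / height, target / width)
    new_h, new_w = round(height * ratio), round(width * ratio)
    # Split the remaining pixels between the two sides of each axis.
    pad_h, pad_w = target - new_h, target - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


(new_h, new_w), (top, bottom, left, right) = letterbox_params(550, 579)
print(new_h, new_w)              # 608 640
print(top, bottom, left, right)  # 16 16 0 0
```

The resized image plus padding adds up to 640x640 again, which is the shape `collate_fn` expects for every entry in the batch.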
