Yolo-NAS getting RuntimeError: Trying to resize storage that is not resizable

pawani2v commented 1 year ago

Search before asking

[X] I have searched the Roboflow Notebooks issues and found no similar bug report.

Notebook name

train-yolo-nas-on-custom-dataset.ipynb

Bug

I'm trying to fine-tune Yolo-NAS on a custom dataset. After building the dataloaders, I get the following error when I try to fetch a batch from them:

    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/home/i2v/.virtualenvs/pytorch-1.13.1-cu118/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
    out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable

Dataloaders creation code:

from super_gradients.training import dataloaders
from super_gradients.training.datasets import YoloDarknetFormatDetectionDataset
from super_gradients.training.transforms.transforms import DetectionPaddedRescale, DetectionHorizontalFlip, DetectionRandomAffine
from torchvision.transforms import ToTensor, ToPILImage

train_transforms = [
    DetectionPaddedRescale(input_dim=(640,640)),
    #DetectionHorizontalFlip(0.5),
    #DetectionRandomAffine(target_size=(640,640))
]

val_transforms = [DetectionPaddedRescale(input_dim=(640,640))]

train_dataset = YoloDarknetFormatDetectionDataset(data_dir=DATA_DIR, images_dir=TRAIN_IMAGES_DIR, labels_dir=TRAIN_LABELS_DIR, classes=CLASSES, transforms=train_transforms)
val_dataset = YoloDarknetFormatDetectionDataset(data_dir=DATA_DIR, images_dir=VAL_IMAGES_DIR, labels_dir=VAL_LABELS_DIR, classes=CLASSES, transforms=val_transforms)

train_dataloader = dataloaders.get(
    dataset=train_dataset,
    dataloader_params={
        "batch_size": BATCH_SIZE,
        "shuffle": False,
        "pin_memory": False,
        "num_workers": 2,
        "drop_last": False,
        # "collate_fn": collate_wrapper,
    },
)

val_dataloader = dataloaders.get(
    dataset=val_dataset,
    dataloader_params={
        "batch_size": 16,
        # "collate_fn": collate_wrapper,
    },
)
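The commented-out `collate_wrapper` entry suggests a custom `collate_fn` was already being considered. Below is a minimal sketch of one, assuming each sample is an `(image, targets)` pair where every image has the same shape and `targets` is `[num_boxes, 5]`. The name `detection_collate` is hypothetical, not a SuperGradients API. It stacks the images but concatenates the boxes, prefixing each box row with its image's index in the batch, so a varying number of boxes per image no longer breaks batching.

```python
from typing import List, Sequence, Tuple

import torch


def detection_collate(
    batch: Sequence[Tuple[object, object]],
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical collate_fn for detection samples with a varying number of boxes.

    Assumes each sample is (image, targets), where image is array-like [C, H, W]
    with the same shape for every sample (e.g. after padded rescaling) and targets
    is array-like [num_boxes, 5]. num_boxes may differ from sample to sample.
    """
    images, targets = zip(*batch)

    # Every image has the same shape, so stacking into [B, C, H, W] is safe.
    image_batch = torch.stack([torch.as_tensor(img) for img in images], dim=0)

    rows: List[torch.Tensor] = []
    for index, target in enumerate(targets):
        boxes = torch.as_tensor(target, dtype=torch.float32).reshape(-1, 5)
        # Prefix every box with its image's position in the batch (an assumed layout).
        index_column = torch.full((boxes.shape[0], 1), float(index))
        rows.append(torch.cat([index_column, boxes], dim=1))

    # Concatenate along the box axis instead of stacking: [total_boxes, 6].
    return image_batch, torch.cat(rows, dim=0)


if __name__ == "__main__":
    # Toy samples shaped like the ones in the traceback: 2 boxes, then 1 box.
    samples = [
        (torch.zeros(3, 64, 64), torch.ones(2, 5)),
        (torch.zeros(3, 64, 64), torch.ones(1, 5)),
    ]
    images, targets = detection_collate(samples)
    print(images.shape, targets.shape)  # torch.Size([2, 3, 64, 64]) torch.Size([3, 6])
```

To wire it in, you would set `"collate_fn": detection_collate` in `dataloader_params`. This sketch assumes that `dataloaders.get` forwards that key to the underlying `torch.utils.data.DataLoader`. The `[image_index, ...]` row layout is also an assumption and must match the target format the YOLO-NAS loss expects. For actual training, SuperGradients' own detection collate function, if your version ships one, is the safer choice.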

Visualization code:

import matplotlib.pyplot as plt
import numpy as np
import torchvision.utils as vutils

%matplotlib inline
# Plot some training images
real_batch = next(iter(train_dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(DEVICE)[:64], padding=2, normalize=True).cpu(),(1,2,0)))

If I set num_workers to 0 in train_dataloader, I get a different error:

--> 163     return torch.stack(batch, 0, out=out)
    164 
    165 
RuntimeError: stack expects each tensor to be equal size, but got [2, 5] at entry 0 and [1, 5] at entry 1
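Both errors appear to have the same cause. Each image has its own number of boxes, so the per-sample label tensors differ in their first dimension ([2, 5] vs [1, 5] here), and PyTorch's default collate tries to `torch.stack` them. In the main process (`num_workers=0`), that fails with the stack error above.

In a worker process, default collate first allocates a shared-memory buffer sized for the elements actually present, then resizes it to `len(batch)` times the first sample's shape. When the first sample has more boxes than the batch average, that resize has to grow the buffer, which produces "Trying to resize storage that is not resizable".

The sketch below reproduces the failure with toy tensors whose shapes come from the traceback. `target_shapes` and `collates_cleanly` are hypothetical helpers, not SuperGradients APIs.

```python
from typing import Dict, Sequence, Tuple

import torch
from torch.utils.data import default_collate


def target_shapes(samples: Sequence[Tuple[object, object]]) -> Dict[int, Tuple[int, ...]]:
    """Hypothetical debugging helper: sample index -> shape of its target tensor."""
    return {i: tuple(torch.as_tensor(target).shape) for i, (_, target) in enumerate(samples)}


def collates_cleanly(samples: Sequence[Tuple[object, object]]) -> bool:
    """Return True if PyTorch's default_collate can batch these samples."""
    try:
        default_collate(list(samples))
    except RuntimeError:
        return False
    return True


if __name__ == "__main__":
    # Toy (image, targets) pairs matching the traceback: [2, 5] and [1, 5] targets.
    samples = [
        (torch.zeros(3, 64, 64), torch.ones(2, 5)),
        (torch.zeros(3, 64, 64), torch.ones(1, 5)),
    ]
    print(target_shapes(samples))  # {0: (2, 5), 1: (1, 5)}
    print(collates_cleanly(samples))  # False: the targets cannot be stacked
```

For example, `target_shapes([train_dataset[i] for i in range(BATCH_SIZE)])` would list the label shapes of the first batch, assuming `train_dataset[i]` returns an `(image, targets)` pair.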

Environment

Environment: local
OS: Ubuntu 20.04
super-gradients: 3.1.2
torch: 1.13.1
python: 3.8.10

Minimal Reproducible Example

import torch

DEVICE = 'cuda' if torch.cuda.is_available() else "cpu"

DATA_DIR = "training_data/Yolo"
CLASSES = ["PersonInCar"]

TRAIN_IMAGES_DIR = "training_data/Yolo/train/images"
TRAIN_LABELS_DIR = "training_data/Yolo/train/labels"

VAL_IMAGES_DIR = "training_data/Yolo/val/images"
VAL_LABELS_DIR = "training_data/Yolo/val/labels"

BATCH_SIZE = 4

from super_gradients.training import dataloaders
from super_gradients.training.datasets import YoloDarknetFormatDetectionDataset
from super_gradients.training.transforms.transforms import DetectionPaddedRescale, DetectionHorizontalFlip, DetectionRandomAffine
#from torchvision.transforms import ToTensor, ToPILImage

train_transforms = [
    DetectionPaddedRescale(input_dim=(640,640)),
    #DetectionHorizontalFlip(0.5),
    #DetectionRandomAffine(target_size=(640,640))
]

val_transforms = [DetectionPaddedRescale(input_dim=(640,640))]

train_dataset = YoloDarknetFormatDetectionDataset(data_dir=DATA_DIR, images_dir=TRAIN_IMAGES_DIR, labels_dir=TRAIN_LABELS_DIR, classes=CLASSES, transforms=train_transforms)
val_dataset = YoloDarknetFormatDetectionDataset(data_dir=DATA_DIR, images_dir=VAL_IMAGES_DIR, labels_dir=VAL_LABELS_DIR, classes=CLASSES, transforms=val_transforms)

train_dataloader = dataloaders.get(
    dataset=train_dataset,
    dataloader_params={
        "batch_size": BATCH_SIZE,
        "shuffle": False,
        "pin_memory": False,
        "num_workers": 0,
        "drop_last": False,
        # "collate_fn": collate_wrapper,
    },
)

val_dataloader = dataloaders.get(
    dataset=val_dataset,
    dataloader_params={
        "batch_size": 16,
        # "collate_fn": collate_wrapper,
    },
)

import matplotlib.pyplot as plt
import numpy as np
import torchvision.utils as vutils

#%matplotlib inline
# Plot some training images
real_batch = next(iter(train_dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(DEVICE)[:64], padding=2, normalize=True).cpu(),(1,2,0)))

Additional

Link to data

Are you willing to submit a PR?

[X] Yes I'd like to help by submitting a PR!

github-actions[bot] commented 1 year ago

👋 Hello @pawani2v, thank you for leaving an issue on Roboflow Notebooks.

🐞 Bug reports

If you are filing a bug report, please be as detailed as possible. This will help us more easily diagnose and resolve the problem you are facing. To learn more about contributing, check out our Contributing Guidelines.

If you require support with custom code that is not part of Roboflow Notebooks, please reach out on the Roboflow Forum or on the GitHub Discussions page associated with this repository.

💬 Get in touch

Do you have more questions about Roboflow that we haven't responded to yet? Feel free to ask them on the Roboflow Discuss forum. Our developer advocates and community team actively respond to questions there.

To ask questions about Notebooks, head over to the GitHub Discussions section of this repository.

SkalskiP commented 1 year ago

Hi @pawani2v 👋🏻! It looks like this error is happening inside the YOLO-NAS codebase. We won't be able to help you with that. Please create an issue in https://github.com/Deci-AI/super-gradients. You can refer to this issue for more visibility.
