vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

AttributeError in Data Augmentation (GaussianBlur) #243

Closed: ChintanTrivedi closed this issue 2 years ago

ChintanTrivedi commented 2 years ago

While training a SimCLR model on custom unlabeled data, I am using the default data augmentations. However, at random points, training stops with a data-fetch error caused by a failure in the data augmentation step:

x = x.filter(ImageFilter.GaussianBlur(radius=sigma))
AttributeError: 'Tensor' object has no attribute 'filter'

Note that the training loop runs fine for multiple epochs (this example crashed after 9 successful epochs). What could be the issue here? How can I narrow down whether the problem lies in pytorch_lightning, torchvision or solo-learn?

Epoch 10:  14%|██▍       | 218/1591 [01:13<07:40,  2.98it/s, loss=1.39, v_num=whyv]
Traceback (most recent call last):
  File "main_pretrain.py", line 188, in <module>
    main()
  File "main_pretrain.py", line 184, in main
    trainer.fit(model, train_loader, val_loader, ckpt_path=ckpt_path)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 156, in advance
    batch_idx, (batch, self.batch_progress.is_last_batch) = next(self._dataloader_iter)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 203, in __next__
    return self.fetching_function()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 270, in fetching_function
    self._fetch_next_batch()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 300, in _fetch_next_batch
    batch = next(self.dataloader_iter)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 550, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 562, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 95, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
    return self._process_data(data)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 46, in __getitem__
    data = super().__getitem__(index)
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 62, in __getitem__
    x = self.transform(x)
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 160, in __call__
    out.extend(transform(x))
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 138, in __call__
    return [self.transform(x) for _ in range(self.num_crops)]
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 138, in <listcomp>
    return [self.transform(x) for _ in range(self.num_crops)]
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 171, in __call__
    return self.transform(x)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chintan/PycharmProjects/solo-learn/venv/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 463, in forward
    img = t(img)
  File "/home/chintan/Downloads/SSL/solo-learn/solo/utils/pretrain_dataloader.py", line 96, in __call__
    x = x.filter(ImageFilter.GaussianBlur(radius=sigma))
AttributeError: 'Tensor' object has no attribute 'filter'

vturrisi commented 2 years ago

Hey, which script are you using? I just noticed that the type annotation for that class is wrong; it should be def __call__(self, x: Image) -> Image: but the wrong annotation wouldn't affect actually using it.
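For readers following along, here is a minimal sketch of what a blur transform with the corrected annotation could look like. Only the x.filter(ImageFilter.GaussianBlur(radius=sigma)) call comes from the traceback above; the class layout, the uniform sampling of sigma and the default range are assumptions made for illustration, not a copy of the solo-learn source.

```python
import random
from typing import Optional, Sequence

from PIL import ImageFilter
from PIL.Image import Image, new as new_image


class GaussianBlur:
    """Blur a PIL image with a randomly sampled radius (illustrative sketch)."""

    def __init__(self, sigma: Optional[Sequence[float]] = None):
        # Assumed default range; this value is not taken from the thread.
        self.sigma = (0.1, 2.0) if sigma is None else tuple(sigma)

    def __call__(self, x: Image) -> Image:
        # Corrected annotation: the transform takes and returns a PIL image.
        # A torch.Tensor has no .filter method, hence the AttributeError.
        sigma = random.uniform(self.sigma[0], self.sigma[1])
        return x.filter(ImageFilter.GaussianBlur(radius=sigma))


# Usage: blur a small solid-colour test image.
img = new_image("RGB", (32, 32), color=(255, 0, 0))
blurred = GaussianBlur()(img)
```

As noted in the comment above, annotations are not enforced at runtime, so this change documents the expected input type but does not prevent a tensor from reaching the transform.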

ChintanTrivedi commented 2 years ago

As you mentioned, changing the annotation made no difference; I still see the same error message, AttributeError: 'Tensor' object has no attribute 'filter'. I used this script:

python3 main_pretrain.py \
    --dataset custom \
    --backbone resnet18 \
    --data_dir <path> \
    --train_dir <images_directory> \
    --no_labels \
    --max_epochs 20 \
    --gpus 0 \
    --accelerator gpu \
    --precision 16 \
    --optimizer sgd \
    --lars \
    --grad_clip_lars \
    --eta_lars 0.02 \
    --exclude_bias_n_norm \
    --scheduler warmup_cosine \
    --lr 0.4 \
    --classifier_lr 0.1 \
    --weight_decay 1e-5 \
    --batch_size 32 \
    --num_workers 4 \
    --crop_size 224 \
    --brightness 0.8 \
    --contrast 0.8 \
    --saturation 0.8 \
    --hue 0.2 \
    --gaussian_prob 0.0 0.0 \
    --num_crops_per_aug 1 1 \
    --name <model_name>  \
    --project solo-learn \
    --entity <wandb_name> \
    --wandb \
    --save_checkpoint \
    --method simclr \
    --temperature 0.2 \
    --proj_hidden_dim 1024 \
    --proj_output_dim 128

Surprisingly, the dataloader doesn't throw this error when I train a bigger ResNet50 model (instead of ResNet18). I see no relation between the model size and the dataloader error, unless it has something to do with the training_step execution time and the dataloader queue?

vturrisi commented 2 years ago

Yes. It wasn't supposed to fix your error, just to make the annotations correct. The issue is that somehow a tensor is being fed to the augmentation pipeline instead of the actual image. Can you print what img is just before this line? https://github.com/vturrisi/solo-learn/blob/b4fbf788a59ede53631f301a7ba9285602948df9/solo/utils/pretrain_dataloader.py#L94 It would help me debug.
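One way to capture that information without editing the library file is to wrap the suspect transform so that a failure reports the type of the offending input. The sketch below is a hypothetical helper (DebugInput is not part of solo-learn or torchvision), and FakeTensor and fake_blur are stand-ins used only to demonstrate the behaviour without requiring torch.

```python
from typing import Any, Callable


class DebugInput:
    """Wrap a transform and report the input's type if the transform fails."""

    def __init__(self, transform: Callable[[Any], Any], name: str = "transform"):
        self.transform = transform
        self.name = name

    def __call__(self, x: Any) -> Any:
        try:
            return self.transform(x)
        except AttributeError as err:
            # Tensors expose .shape; PIL images do not, so this may be None.
            shape = getattr(x, "shape", None)
            raise TypeError(
                f"{self.name} received {type(x).__module__}.{type(x).__name__} "
                f"(shape={shape}); expected a PIL image"
            ) from err


class FakeTensor:
    """Stand-in for a tensor: has a shape but no .filter method."""

    shape = (3, 224, 224)


def fake_blur(x: Any) -> Any:
    """Stand-in for the blur transform: calls .filter like the real one."""
    return x.filter(None)


# Usage: the wrapped transform now names the unexpected input type.
try:
    DebugInput(fake_blur, name="GaussianBlur")(FakeTensor())
except TypeError as err:
    message = str(err)
```

Wrapping the real blur transform the same way would make the DataLoader worker's error state what kind of object reached the transform, which is the information requested above.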

vturrisi commented 2 years ago

Any update on this?

ChintanTrivedi commented 2 years ago

Sorry for the delay. I had to switch my experiments to ResNet50 and I have not encountered the issue with that setting. While the issue remains unsolved for ResNet18, I am not currently planning to work on that due to time constraints. I shall reopen the issue if I need to use it again. Thanks!