Closed: jlindsey15 closed this issue 4 years ago
Hi there,
Thanks for testing this out!
What I usually do is resize all of imagenet to 256x256 and then use 224x224 augmentations for training and 224x224 center crops for testing. The error you are seeing is due to an image having a dimension of 140, which is less than the expected input size. Your options are: (1) resize the imagenet images to 256x256 offline as described above, or (2) add a resize / random-resized-crop transform so every image is brought to a fixed size before batching (see the sketch below).
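For reference, a minimal sketch of what I mean with standard torchvision transforms (the exact augmentation set used in this repo may differ):
import torchvision.transforms as T

# training: random 224x224 crops handle variable-sized source images
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# testing: resize the short side to 256, then take a 224x224 center crop
test_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])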
Let me know if you have any further issues
Thanks for the quick response! What modification to the transforms would be needed? I'm using standard ImageNet images and haven't had this difficulty with similar models (SimCLR, MOCO, etc.).
The official pytorch Moco works for you? The transforms are very similar; I'd suspect that if you see it here, you would also see it in the Moco implementation.
BYOL on left; Moco on right
Yeah, I've used the official pytorch Moco without modification. I can pinpoint the error to the following:
running:
temp = MultiAugmentImageFolder(path="path_to_imagenet", batch_size=512)
gives me the same error as above ("Got X and Y in dimension 2" where X and Y vary from run to run)
But running:
temp = torchvision.datasets.ImageFolder("path_to_imagenet")
works fine.
Is it possible to replace the MultiAugmentImageFolder class in your code with the standard Pytorch ImageFolder?
MultiAugmentImageDataset uses ImageFolder and is about as barebones an implementation as you can get for doing multiple augmentations; it already inherits from torchvision.datasets.ImageFolder. MultiAugmentImageFolder simply builds the torchvision dataset, adds the transforms and wraps it in a torch data loader. If you complete your example, you get the same error with plain torch on non-resized imagenet:
pytorch = torch.utils.data.DataLoader(
    torchvision.datasets.ImageFolder("path_to_imagenet", transform=torchvision.transforms.ToTensor()),
    batch_size=32, num_workers=4)
pytorch.__iter__().__next__()
This results in the following on pytorch 1.5.1 with py37:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/jramapuram/.venv/envs/pytorch1.5-py37/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/jramapuram/.venv/envs/pytorch1.5-py37/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/jramapuram/.venv/envs/pytorch1.5-py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/jramapuram/.venv/envs/pytorch1.5-py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/jramapuram/.venv/envs/pytorch1.5-py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 250, 250] at entry 0 and [3, 150, 200] at entry 1
Yeah, that's true -- but including transforms.RandomResizedCrop(224, scale=(0.08, 1.)) in the dataset transforms fixes the issue when you use torchvision.datasets.ImageFolder, whereas it doesn't seem to help with MultiAugmentImageFolder.
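For concreteness, this is the kind of pipeline that works for me with the plain torchvision dataset (a rough sketch; the path and batch size are placeholders):
import torch
import torchvision
from torchvision import transforms

loader = torch.utils.data.DataLoader(
    torchvision.datasets.ImageFolder(
        "path_to_imagenet",
        transform=transforms.Compose([
            transforms.RandomResizedCrop(224, scale=(0.08, 1.)),  # every image becomes 3x224x224
            transforms.ToTensor(),
        ])),
    batch_size=32, num_workers=4)

next(iter(loader))  # no stacking error, since all tensors now share a shape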
I don't want to cause a hassle for you, I will do my best to figure it out!
No worries at all; glad to have someone test it out :)
Yup, the same transforms.RandomResizedCrop(224, scale=(0.08, 1.)) applies for MultiAugmentImageFolder:
In [17]: temp = MultiAugmentImageFolder(path="path_to_imagenet_root", batch_size=32, train_transform=[torchvision.transforms.RandomResizedCrop((224,224)), torchvision.transforms.ToTensor()])
dataset loader: {'num_workers': 2, 'pin_memory': True, 'worker_init_fn': None, 'timeout': 0, 'drop_last': True}
train = 1281167 | test = 50000 | valid = 0
derived image shape = [3, 224, 224]
derived output size = 1000
Didn't error out for me (this was non-resized imagenet).
Weird, when I run exactly the same code
temp = MultiAugmentImageFolder(path="path_to_imagenet_root", batch_size=32, train_transform=[torchvision.transforms.RandomResizedCrop((224,224)), torchvision.transforms.ToTensor()])
I get the error. Do you think the pytorch / torchvision versions could be relevant? I'm using PyTorch 1.1.0 and TorchVision 0.3.0
EDIT: I just replicated the error on PyTorch 1.5.0 and torchvision 0.6.0.
Interesting, might be worth a shot in a fresh conda env (I have tested with py37 on pytorch 1.5 and pytorch 1.5.1). But before you do that, can you verify that you followed the README.md and have a 'train' and 'test' folder in your imagenet directory? You can also just create a symlink from 'val' to 'test'. I doubt it's the latter issue because the error appears on a concatenation, but just want to be sure :)
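Something like this can sanity-check the layout (a quick sketch; the root path is a placeholder):
import os

root = "path_to_imagenet_root"   # placeholder: your imagenet directory
print(os.listdir(root))          # should contain 'train' and 'test'

# if you only have 'val', symlink it to 'test'
val_dir, test_dir = os.path.join(root, "val"), os.path.join(root, "test")
if os.path.isdir(val_dir) and not os.path.isdir(test_dir):
    os.symlink(val_dir, test_dir)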
Yeah I made a symlink with "train" and "test." Fresh conda env with py35 and pytorch 1.5.1 has the same issue, alas :(
(though the wording of the error message is a bit different: "RuntimeError: stack expects each tensor to be equal size, but got [3, 375, 500] at entry 0 and [3, 342, 500] at entry 2")
I think I've solved the issue! I have a follow-up question, if you don't mind answering. It arose from the __getitem__ method:
def __getitem__(self, index):
    """Label is the same for index, so just run augmentations again."""
    sample0, target = self.__getitem_non_transformed__(index)
    samples = [sample0] + [super(MultiAugmentImageDataset, self).__getitem__(index)[0]
                           for _ in range(self.num_augments)]
    return samples + [target]
self.__getitem_non_transformed__ was not reshaping images to 224x224, hence the stacking errors. By setting the non_augment_transform, I resolved the issue.
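Concretely, something along these lines is what I did (a sketch, with imports and path as in the earlier snippets; I'm assuming non_augment_transform accepts a transform list the same way train_transform does):
temp = MultiAugmentImageFolder(
    path="path_to_imagenet_root", batch_size=32,
    train_transform=[torchvision.transforms.RandomResizedCrop((224, 224)),
                     torchvision.transforms.ToTensor()],
    # assumed usage: give the un-augmented sample a fixed size too,
    # so default_collate can stack it alongside the augmented views
    non_augment_transform=[torchvision.transforms.Resize((224, 224)),
                           torchvision.transforms.ToTensor()])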
However, this produced another error downstream in the code. The block above returns a list like [unaugmented, augmentation1, augmentation2, label]. But the rest of the code is set up to receive a list of [augmentation1, augmentation2, label], leading to a "too many values to unpack (expected 3)" error when you start iterating through the train_loader. I can fix this issue by changing the line above:
samples = [sample0] + [super(MultiAugmentImageDataset, self).__getitem__(index)[0]
for _ in range(self.num_augments)]
to
samples = [super(MultiAugmentImageDataset, self).__getitem__(index)[0]
for _ in range(self.num_augments)]
Is this the correct thing to do, or am I missing something?
If you clone via git clone --recursive git+ssh://git@github.com/jramapuram/BYOL.git as per the README.md, you won't have this error. The entire point of git submodules is to tightly couple dependencies, so BYOL is coupled with commit 7c5d0d9 of datasets.
Could you add a note here as to what fixed your original bug? Would be useful for tracking. Feel free to open another issue if you have problems.
Thanks, this (downloading the correct versions of the dependency repos) fixed the issue!
Hi! Thanks for this code. I'm getting the following error when trying to run it. Any idea what might be happening? Thank you!
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/share/ctn/users/jwl2182/BYOL/main.py", line 744, in run
    loader, model, grapher = build_loader_model_grapher(args)  # build the model, loader and grapher
  File "/share/ctn/users/jwl2182/BYOL/main.py", line 419, in build_loader_model_grapher
    loader = get_loader(loader_dict)
  File "/share/ctn/users/jwl2182/BYOL/datasets/loader.py", line 175, in get_loader
    **kwargs)
  File "/share/ctn/users/jwl2182/BYOL/datasets/imagefolder.py", line 132, in __init__
    train_samples_and_labels = self.train_loader.__iter__().__next__()
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 68, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 68, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/jwl2182/.conda/envs/py36/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 43, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 479 and 140 in dimension 2 at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/TH/generic/THTensor.cpp:711