victoresque / pytorch-template

PyTorch deep learning projects made easy.
MIT License
4.75k stars 1.09k forks

I get an error when extending the dataset class. How do I extend it to other datasets like VOC? #46

Closed EzoBear closed 5 years ago

EzoBear commented 5 years ago

I changed one line in the data loader, in two different ways.

First case:
Before: self.dataset = datasets.MNIST(self.data_dir, train=training, download=True, transform=trsfm)
After: self.dataset = datasets.MNIST(self.data_dir, train=training, download=True, transform=trsfm, target_transform=trsfm)

Second case:
Before: the original code
After: self.dataset = datasets.VOCDetection(self.data_dir, year='2012', image_set='trainval', download=True, transform=trsfm, target_transform=trsfm)

But I get a similar error in both cases.

[image]

My guess is that the data loader can't be iterated with enumerate... [image]

Is my guess right, and how do I extend the MNIST data loader code to a VOC data loader?

SunQpark commented 5 years ago

The torchvision transforms are meant for images only, but you are also applying them to the targets (labels). So remove the ", target_transform=trsfm" part.

EzoBear commented 5 years ago

I have extended the data loader so that it can use a target transform. [image]

The return value of the dataset's __getitem__ is (image, target), where target is [cls_name, [xmin, ymin, xmax, ymax]]. But now I get a different error. [image]

I think MNIST images are grayscale but VOC images are RGB, so the first dimension is not the same: MNIST is [1, x, y] but VOC is [3, x, y]. Where should I look?

P.S. When I used the torchvision VOC dataset loader, I got a type error. Sorry for the inconvenience, and thank you.

SunQpark commented 5 years ago

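Since the dataset returns only a single image and target at a time, the data loader calls a collate function internally to stack them into a batch. For MNIST, each image from the dataset has size (1, 28, 28), and the default collate function stacks n images into an (n, 1, 28, 28) batch. VOC images, however, have varying shapes (3, ?, ?), so you should resize or crop them to the same size (use transforms.Resize((224, 224)), for example; note that transforms.Resize(224) with a single int only resizes the shorter edge and keeps the aspect ratio, so the sizes would still differ).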
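The stacking behaviour described above can be reproduced without downloading any dataset. This is only a sketch: the tensors are zero-filled stand-ins, the VOC-like sizes (375x500 and 333x500) are made up for illustration, and resize_to is a hypothetical helper that uses torch interpolation in place of a torchvision transform.

```python
import torch
import torch.nn.functional as F
from torch.utils.data.dataloader import default_collate

# MNIST-like samples: every image is (1, 28, 28), so stacking works.
mnist_like = [(torch.zeros(1, 28, 28), 0) for _ in range(4)]
images, labels = default_collate(mnist_like)
print(images.shape)  # (n, 1, 28, 28) with n = 4

# VOC-like samples: same channel count, different height and width.
voc_like = [(torch.zeros(3, 375, 500), 0), (torch.zeros(3, 333, 500), 1)]
try:
    default_collate(voc_like)
    stacked = True
except RuntimeError as err:
    stacked = False
    print("default collate failed:", err)


def resize_to(image, size=(224, 224)):
    """Hypothetical helper: resize one (C, H, W) tensor to a fixed size."""
    batch = image.unsqueeze(0)  # interpolate expects (N, C, H, W)
    resized = F.interpolate(batch, size=size, mode="bilinear", align_corners=False)
    return resized.squeeze(0)


# Once every image has the same size, the default collate can stack them.
resized = [(resize_to(img), label) for img, label in voc_like]
voc_images, voc_labels = default_collate(resized)
print(voc_images.shape)
```

Note that for detection, resizing the image also means the box coordinates in the target have to be scaled accordingly, which this sketch does not do.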

But sadly, this won't be the last error you encounter, since the targets also vary in size in a detection task. All I can say is that you will need to use the collate_fn argument to fix that. You'd better ask Google or Stack Overflow how to do that.
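For reference, one common way to handle variable-size targets is a custom collate_fn that stacks the images but leaves the targets as a plain list. The following is a sketch under stated assumptions, not the template's own code: ToyDetectionDataset and detection_collate are hypothetical names, the images are assumed to be already resized to 224x224, and each target is assumed to be an (num_boxes, 4) tensor of boxes.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDetectionDataset(Dataset):
    """Hypothetical stand-in for a VOC-style dataset.

    Images share one size, but each sample has a different number of boxes.
    """

    def __init__(self):
        self.box_counts = [1, 3, 2, 5]

    def __len__(self):
        return len(self.box_counts)

    def __getitem__(self, index):
        image = torch.zeros(3, 224, 224)
        boxes = torch.zeros(self.box_counts[index], 4)  # xmin, ymin, xmax, ymax
        return image, boxes


def detection_collate(batch):
    """Stack images into one tensor; keep targets as a list of tensors."""
    images, targets = zip(*batch)
    return torch.stack(images, 0), list(targets)


loader = DataLoader(ToyDetectionDataset(), batch_size=2, collate_fn=detection_collate)
batches = list(loader)

first_images, first_targets = batches[0]
print(first_images.shape)
print([t.shape for t in first_targets])
```

With this approach the model or loss function has to accept a list of per-image targets. Padding the targets to a fixed number of boxes is an alternative if a single target tensor is needed.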