The torchvision transforms are only for images, but you are trying to apply them to the targets (labels) as well. So, remove the `target_transform=trsfm` part.
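For illustration, a minimal sketch of the corrected MNIST construction with an image-only transform (the normalization constants are just the usual MNIST values, used here as an example):

```python
import torchvision.transforms as transforms
from torchvision import datasets

# Image-only transform pipeline; note there is no target_transform anywhere.
trsfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# MNIST targets are plain class indices, so only the images are transformed.
dataset = datasets.MNIST('data/', train=True, download=True, transform=trsfm)
```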
The data loader has been extended to be able to use a target transform. The return value of the data loader's `__getitem__` is (image, target), where target is [cls_name, [xmin, ymin, xmax, ymax]]. But now I face another error. I think MNIST data is grayscale while VOC data is RGB, so the first dimension is not the same: MNIST is [1, x, y] but VOC is [3, x, y]. Where should I look?
P.S. When I used the torchvision VOC dataset loader, I ran into a type error. Sorry for the inconvenience, and thank you.
Since the dataset only returns a single image and target, the data loader internally calls a collate function to stack them together into a batch. For MNIST, each image from the dataset has size (1, 28, 28) and the default collate function stacks n images to make an (n, 1, 28, 28) batch. But VOC images come in various shapes (3, ?, ?), so you should resize or crop the images to the same size (use `transforms.Resize(224)`, for example).
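As a sketch (variable names are just for illustration), a pipeline along those lines could look like the following; note that `Resize` with a single int keeps the aspect ratio, so the `CenterCrop` is what actually gives every image the same shape:

```python
import torchvision.transforms as transforms
from torchvision import datasets

# Resize the shorter side to 224, then crop to a fixed 224x224 so every image
# ends up with the same (3, 224, 224) shape and can be stacked into a batch.
voc_trsfm = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = datasets.VOCDetection('data/', year='2012', image_set='trainval',
                                download=True, transform=voc_trsfm)
```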
But sadly, this is not the last error you will encounter, since targets also have various sizes in a detection task. All I can say is that you will need to use the `collate_fn` argument to fix that. You'd better ask Google or Stack Overflow how to do that.
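For reference, one common pattern (a sketch, not part of this template) is a collate function that stacks the images but leaves the variable-length targets as a plain list:

```python
import torch
from torch.utils.data import DataLoader

def detection_collate(batch):
    # Stack equally-sized images into one tensor, but keep the variable-length
    # VOC annotation dicts as a Python list instead of trying to stack them.
    images, targets = zip(*batch)
    return torch.stack(images, dim=0), list(targets)

# Assumes `dataset` yields (image_tensor, target) pairs with equally-sized images.
loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=detection_collate)
```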
I changed one line, in two cases, in the data loader.

First case:
Before: `self.dataset = datasets.MNIST(self.data_dir, train=training, download=True, transform=trsfm)`
After: `self.dataset = datasets.MNIST(self.data_dir, train=training, download=True, transform=trsfm, target_transform=trsfm)`

Second case:
Before: the original code
After: `self.dataset = datasets.VOCDetection(self.data_dir, year='2012', image_set='trainval', download=True, transform=trsfm, target_transform=trsfm)`
But I have been facing a similar error in both cases. Maybe the problem is that the data loader can't be converted into something I can iterate over with enumerate()... I wonder whether my guess is right, and how to extend the MNIST data loader code to a VOC data loader.
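For what it's worth, here is a rough sketch of how a VOC data loader could mirror the MNIST pattern above. The class name, constructor arguments, and collate function are assumptions for illustration only, not part of this repository, and it subclasses `torch.utils.data.DataLoader` directly rather than the template's own base class:

```python
import torch
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader

def detection_collate(batch):
    # Stack equally-sized images, keep variable-length VOC annotations as a list.
    images, targets = zip(*batch)
    return torch.stack(images, dim=0), list(targets)

class VocDataLoader(DataLoader):
    """Hypothetical VOC counterpart of the MNIST data loader above."""
    def __init__(self, data_dir, batch_size, shuffle=True, num_workers=1, training=True):
        trsfm = transforms.Compose([
            transforms.Resize(224),
            transforms.CenterCrop(224),  # fixed 224x224 so images stack into a batch
            transforms.ToTensor(),
        ])
        self.data_dir = data_dir
        # No target_transform: VOC targets are annotation dicts, not images.
        self.dataset = datasets.VOCDetection(
            self.data_dir, year='2012',
            image_set='trainval' if training else 'val',
            download=True, transform=trsfm)
        super().__init__(self.dataset, batch_size=batch_size, shuffle=shuffle,
                         num_workers=num_workers, collate_fn=detection_collate)
```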