Among emnist dataset splits, there are cases where labels do not match or out of list occurs.

kh-mo commented 4 years ago

🐛 Bug

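Running the same code on each EMNIST split gives different results:

The 'byclass', 'bymerge', and 'balanced' splits print labels that do not match the displayed images.
The 'letters' split raises a list index out of range error.
The 'digits' and 'mnist' splits work correctly.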

To Reproduce

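import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
from torchvision import transforms

def imshow(img):
    # Convert the CHW tensor to an HWC array for matplotlib.
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

def show_img_with_gt(split):
    batch_size = 16
    dset_tr = torchvision.datasets.EMNIST(
        root="./data", split=split, download=True, train=True,
        transform=transforms.Compose([
            # EMNIST images are stored transposed; rotate and flip to display them upright.
            lambda img: torchvision.transforms.functional.rotate(img, -90),
            transforms.RandomHorizontalFlip(p=1),
            transforms.ToTensor()]))
    dset_loader = torch.utils.data.DataLoader(dset_tr, batch_size=batch_size)
    # Take the first batch only.
    image, label = next(iter(dset_loader))
    imshow(torchvision.utils.make_grid(image))
    # Look up each label in the dataset's class list; this is where the mismatch shows up.
    print('GroundTruth: ', ' '.join('%5s' % dset_tr.classes[label[j]] for j in range(batch_size)))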

show_img_with_gt("byclass")
show_img_with_gt("bymerge")
show_img_with_gt("balanced")
show_img_with_gt("letters")
show_img_with_gt("digits")
show_img_with_gt("mnist")
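
The 'letters' failure can also be narrowed down without plotting, by checking whether every label is a valid index into the class list. The following is a minimal sketch, not part of torchvision: `labels_outside_class_list` is a hypothetical helper, and the toy labels and classes are made-up stand-ins rather than real EMNIST data.

```python
def labels_outside_class_list(labels, classes):
    """Return the sorted distinct labels that are not valid indices into `classes`.

    Hypothetical diagnostic helper; `labels` is any iterable of integer-like values.
    """
    n = len(classes)
    return sorted({int(label) for label in labels if not 0 <= int(label) < n})


# Toy stand-ins (illustrative only, not taken from EMNIST):
toy_classes = list("abc")   # three classes, valid indices 0..2
toy_labels = [1, 2, 3, 1]   # label 3 has no entry in toy_classes

print(labels_outside_class_list(toy_labels, toy_classes))  # [3]
```

With a real split, one could pass `dset_tr.targets.tolist()` and `dset_tr.classes`, assuming the dataset exposes a `targets` tensor as torchvision's MNIST-style datasets do. A non-empty result would mean the class list does not cover the labels, which would explain an index error but not, by itself, the mismatched labels in the other splits.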

cc @pmeier

pmeier commented 4 years ago

Hi @kh-mo, thanks for the report. I'll look into it, but it might take a while.

pmeier commented 4 years ago

I'm pretty busy right now. Any help on this is appreciated.

pytorch / vision

Among emnist dataset splits, there are cases where labels do not match or out of list occurs. #2630