pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

For some EMNIST dataset splits, labels do not match the images or a list index out-of-range error occurs. #2630

Closed kh-mo closed 4 years ago

kh-mo commented 4 years ago

🐛 Bug

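Running the same code on different EMNIST splits, some splits print labels that do not match the images, and some raise a list index out-of-range error.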

To Reproduce

import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
from torchvision import transforms

def imshow(img):
    # Convert the CHW tensor to an HWC array, which is what matplotlib expects.
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

def show_img_with_gt(split):
    batch_size = 16
    # Rotate and flip each image for display, then convert it to a tensor.
    dset_tr = torchvision.datasets.EMNIST(
        root="./data", split=split, download=True, train=True,
        transform=transforms.Compose([
            lambda img: torchvision.transforms.functional.rotate(img, -90),
            transforms.RandomHorizontalFlip(p=1),
            transforms.ToTensor(),
        ]))
    dset_loader = torch.utils.data.DataLoader(dset_tr, batch_size=batch_size)
    # Take the first batch only.
    image, label = next(iter(dset_loader))
    imshow(torchvision.utils.make_grid(image))
    # Look up each label index in dset_tr.classes; this is where the mismatch shows up.
    print('GroundTruth: ', ' '.join('%5s' % dset_tr.classes[label[j]] for j in range(batch_size)))

show_img_with_gt("byclass")
show_img_with_gt("bymerge")
show_img_with_gt("balanced")
show_img_with_gt("letters")
show_img_with_gt("digits")
show_img_with_gt("mnist")

[Six screenshots attached; images not preserved.]
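
To narrow down which splits are affected without inspecting image grids, one option is to compare the label values against the length of the `classes` list. The sketch below is only an illustration: `find_out_of_range_labels` is a hypothetical helper, not part of torchvision, and the sample data is made up rather than taken from EMNIST.

```python
from typing import Iterable, List, Sequence


def find_out_of_range_labels(labels: Iterable[int], classes: Sequence[str]) -> List[int]:
    """Return the sorted distinct labels that are not valid indices into `classes`.

    Hypothetical helper for illustration. Negative labels are reported too:
    Python would accept them as indices, but they would silently pick the
    wrong class name.
    """
    n = len(classes)
    return sorted({int(label) for label in labels if not 0 <= int(label) < n})


# Made-up example: three class names, but labels that run from 1 to 3.
example_classes = ["a", "b", "c"]
example_labels = [1, 2, 3, 3]
print(find_out_of_range_labels(example_labels, example_classes))  # prints [3]
```

With a dataset object such as `dset_tr` from the reproduction code, the call would presumably look like `find_out_of_range_labels(dset_tr.targets.tolist(), dset_tr.classes)`, assuming `targets` holds the integer labels as a tensor. An empty result only rules out out-of-range indices. It does not show that the class names are in the right order.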

cc @pmeier

pmeier commented 4 years ago

Hi @kh-mo, thanks for the report. I'll look into it, but it might take a while.

pmeier commented 4 years ago

I'm pretty busy right now. Any help on this is appreciated.