microsoft / otdd

Optimal Transport Dataset Distance
MIT License

Possible to use OTDD with a COCO dataset? #28

Closed: just-eoghan closed this issue 1 year ago

just-eoghan commented 1 year ago

Hello,

I am using a COCO-style dataset:

    from torchvision.datasets.vision import VisionDataset

    class MyCocoDataset(VisionDataset):
        ...

OTDD fails in this block of code:

    if hasattr(dataset, 'targets'): # most torchvision datasets
        targets = dataset.targets
    elif hasattr(dataset, '_data'): # some torchtext datasets
        targets = torch.LongTensor([e[0] for e in dataset._data])
    elif hasattr(dataset, 'tensors') and len(dataset.tensors) == 2: # TensorDatasets
        targets = dataset.tensors[1]
    elif hasattr(dataset, 'tensors') and len(dataset.tensors) == 1:
        logger.warning('Dataset seems to be unlabeled - this modality is in beta mode!')
        targets = None
    else:
        raise ValueError("Could not find targets in dataset.")

It raises a ValueError because my dataset doesn't have any of the checked attributes.
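To check which branch a given dataset would take without running OTDD, the same attribute checks can be mirrored in a small standalone function. This is a minimal sketch: `find_targets` is a hypothetical helper, not part of OTDD, and it returns the name of the matching branch instead of extracting the labels. The `SimpleNamespace` objects stand in for real datasets; the COCO-like one assumes the annotations live under attributes such as `coco` and `ids` rather than `targets`.

```python
from types import SimpleNamespace


def find_targets(dataset):
    """Hypothetical helper mirroring the attribute checks quoted above.

    Returns the name of the branch that would handle `dataset`,
    or raises ValueError exactly where the original code does.
    """
    if hasattr(dataset, 'targets'):  # most torchvision datasets
        return 'targets'
    elif hasattr(dataset, '_data'):  # some torchtext datasets
        return '_data'
    elif hasattr(dataset, 'tensors') and len(dataset.tensors) == 2:
        return 'tensors[1]'  # TensorDataset of (inputs, labels)
    elif hasattr(dataset, 'tensors') and len(dataset.tensors) == 1:
        return 'unlabeled'  # TensorDataset of inputs only
    else:
        raise ValueError("Could not find targets in dataset.")


# A COCO-style dataset keeps its labels inside annotation objects,
# so none of the checked attributes exist and the last branch fires.
coco_like = SimpleNamespace(coco=None, ids=[1, 2, 3])
try:
    find_targets(coco_like)
except ValueError as err:
    print(err)
```

Running it prints `Could not find targets in dataset.`, which matches the failure described above: the checks only look for specific attribute names and never inspect annotations.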

My dataset doesn't have targets, tensors, etc. Is it possible to use OTDD?

It looks like the perfect tool for what I want to achieve!

dmelis commented 1 year ago

Hi Eoghan. If your dataset doesn't have labels / targets / classes, the right tool to use is (vanilla) optimal transport. You can find implementations here and here.
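Vanilla OT compares the two datasets' feature distributions directly, with no label term. The sketch below is a toy illustration only and does not replace the implementations linked above. It assumes both datasets have the same number of points, uniform weights, and a squared Euclidean cost. Under those assumptions an optimal transport plan can be taken to be a one-to-one matching (Birkhoff–von Neumann), so the exact squared 2-Wasserstein distance can be found by brute force over permutations. `exact_ot_uniform` and `sq_euclidean` are hypothetical names.

```python
import itertools


def sq_euclidean(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def exact_ot_uniform(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size point clouds
    with uniform weights, found by trying every one-to-one matching.

    Cost grows factorially with len(xs): use it on a handful of points only.
    """
    if len(xs) != len(ys):
        raise ValueError("this toy version needs equally sized datasets")
    n = len(xs)
    best = min(
        sum(sq_euclidean(xs[i], ys[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n  # each point carries mass 1/n


a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
b = [(x + 2.0, y) for x, y in a]  # the same cloud shifted by (2, 0)
print(exact_ot_uniform(a, b))  # 4.0, the squared length of the shift
```

In practice, features would first be extracted from the images (e.g. with a pretrained network) and a scalable solver, such as entropic OT with Sinkhorn iterations, would replace the brute-force search.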