microsoft / otdd

Optimal Transport Dataset Distance
MIT License

Why is the OTDD between two copies of the same dataset not zero? #14

Closed by peterdarkdarkgogo 2 years ago

peterdarkdarkgogo commented 2 years ago

I tried to compute the OTDD between a dataset and itself, but the result is not zero.

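from otdd.pytorch.datasets import load_torchvision_data

# Load MNIST twice; maxsize=2000 caps each loader at 2000 samples
loaders_src = load_torchvision_data('MNIST', valid_size=0, resize=28, maxsize=2000)[0]
loaders_tgt = load_torchvision_data('MNIST', valid_size=0, resize=28, maxsize=2000)[0]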

The result is 346.16

dmelis commented 2 years ago

Two possible reasons: (1) The maxsize flag yields a data loader that randomly subsamples the dataset, so these two loaders return datasets of size 2000 that most likely do not contain the same samples, hence the non-zero distance. (2) If you are not using the debiased version (debiased=True flag), the OT distance includes an entropy regularization term that biases the result away from zero, even when both inputs are identical.
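Both effects can be reproduced on a toy problem without otdd. The following is a minimal sketch in plain Python, not the library's implementation: the helper names `entropic_ot_cost` and `debiased_cost`, the seeds, the dataset size of 60000, and the five 1-D points are all illustrative assumptions. It draws two independent random subsets of 2000 indices, then compares the entropy-regularized transport cost of a point set to itself with a debiased variant that subtracts the two self-terms (in the style of a Sinkhorn divergence).

```python
import math
import random


def entropic_ot_cost(xs, ys, eps=1.0, n_iters=200):
    """Transport cost <P, C> of the entropy-regularized plan between two
    uniformly weighted 1-D point sets, using plain Sinkhorn iterations.
    (Hypothetical helper for illustration; not part of otdd.)"""
    n, m = len(xs), len(ys)
    a, b = [1.0 / n] * n, [1.0 / m] * m
    cost = [[(x - y) ** 2 for y in ys] for x in xs]           # squared distance
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is P[i][j] = u[i] * kernel[i][j] * v[j]; return sum of P * C.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def debiased_cost(xs, ys, eps=1.0):
    """Debiased variant: subtract half of each self-term.
    (Hypothetical helper for illustration; not part of otdd.)"""
    return (entropic_ot_cost(xs, ys, eps)
            - 0.5 * (entropic_ot_cost(xs, xs, eps) + entropic_ot_cost(ys, ys, eps)))


# Reason 1: two independent random subsamples of 2000 out of 60000 indices.
idx_src = set(random.Random(0).sample(range(60000), 2000))
idx_tgt = set(random.Random(1).sample(range(60000), 2000))
overlap = len(idx_src & idx_tgt)
print(f"samples shared by the two subsets: {overlap} of 2000")

# Reason 2: regularized cost of a point set against itself, with and without debiasing.
points = [0.0, 1.0, 2.0, 3.0, 4.0]
biased_self = entropic_ot_cost(points, points)
debiased_self = debiased_cost(points, points)
print(f"entropic self-cost: {biased_self:.4f}")
print(f"debiased self-cost: {debiased_self:.4f}")
```

The regularized plan spreads mass onto off-diagonal pairs, so the self-cost is strictly positive, while the debiased value cancels to zero for identical inputs. To get a distance near zero in the original experiment, both loaders would need to contain the same samples and the debiased loss would need to be enabled.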

peterdarkdarkgogo commented 2 years ago

Thank you David!