Issued from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage Medians of Means (MoM) estimators to robustify the estimation of Wasserstein distance. Exploiting the dual Kantorovitch formulation of Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. These MoM estimators enable to make Wasserstein Generative Adversarial Network (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks CIFAR10 and Fashion MNIST. Eventually, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
URL: http://arxiv.org/abs/2006.10325v1