Slightly biased results of Sinkhorn divergence

Hi, I'm using this code for density estimation and generative modelling. I think Sinkhorn divergence is perfect for such tasks because of it unbiased nature. However, in practice, I find it is slight biased.

Basically, I have several datapoints and a Gaussian distribution. I want to match them so the datapoints can be regarded as samples from the Gaussian distribution. This is a basic setting in generative modelling. Ideally, this will work well because Sinkhorn divergence gets rid of entropic bias of regularized W-distance by adding two self correlation terms. However, the results are still biased.

I present my simple test code here.


import os
import time

import numpy as np
import torch

from geomloss import SamplesLoss

eps = 0.1
sample = 2000
lr = 0.001
epoch = 400
p = 1

# Synthesis data points.
device = 'cuda'
tdtype = torch.float

rng = np.random.RandomState(0)
Train_all = rng.randn(1000, 2)
train_tensor = torch.tensor(Train_all, device=device, dtype=tdtype).requires_grad_(True)

d=Train_all.shape[1]

# Gaussian model
class Gaussian():
    def __init__(self, Sample_n, u, sigma, eps=0.5, p=1, tdtype=torch.float, device='cuda'):

        self.u = torch.tensor(u, requires_grad=True, device=device, dtype=tdtype)
        self.sigma = torch.tensor(sigma, requires_grad=True, device=device, dtype=tdtype)

        self.L_ab = SamplesLoss('energy', potentials=False, backend='tensorized', scaling=0.9)
        # self.L_ab = SamplesLoss('sinkhorn', p=p, blur=eps, debias=True, potentials=False, backend='tensorized', scaling=0.9)
        # self.L_ab = SamplesLoss('sinkhorn', p=p, blur=eps, debias=False, potentials=False, backend='tensorized', scaling=0.9)

        self.dim = self.u.shape[-1]

        self.n1 = Sample_n 
        self.eps = eps
        self.tdtype = tdtype
        self.device = device

    def logp_x(self, x):
        out = - self.dim / 2 * np.log(2 * np.pi) - torch.sum(torch.log(self.sigma), -1) - 0.5 * torch.sum((torch.unsqueeze(x, 0) - self.u)** 2 / (self.sigma ** 2), -1)
        return out       

    def Sample(self):
        # with torch.no_grad():
        Sample = torch.randn((1, self.n1, self.dim), device=self.device, dtype=self.tdtype)
        tmp = Sample * self.sigma + self.u
        out = torch.reshape(tmp, [-1, self.dim])
        return out

    def loss(self, D):

        x = self.Sample()
        out = self.L_ab(x, D)
        return out

# Init the Gaussian
alpha = np.ones(1) 
u_ = np.zeros((1, 1, d))
S_ = np.ones((1, 1, d))
model = Gaussian(sample, u_, S_, eps=eps, p=1)

a_opt = torch.optim.RMSprop([
                            {'params': model.u},
                            {'params': model.sigma}
                            ] , lr=lr, alpha=0.9)

p_opt = torch.optim.RMSprop([
                            {'params': train_tensor},
                            ] , lr=lr, alpha=0.9)

for epoch in range(epoch):

    lp = model.loss(train_tensor)

    # Uncomment to optimize datapoint
    # p_opt.zero_grad()
    # lp.backward()
    # p_opt.step()
    # print('Data mean:\t', train_tensor.detach().mean(0).cpu().numpy(), '\t Data std:\t', train_tensor.detach().std(0).cpu().numpy())
    # print('Gaussian mean:\t', model.u[:,0,:].detach().cpu().numpy(), '\t \t \t Gaussian std:\t', model.sigma.detach()[:,0,:].cpu().numpy())

    # Uncomment to optimize Gaussian dsitribution
    a_opt.zero_grad()
    lp.backward()
    a_opt.step()
    print('Data mean:\t', train_tensor.detach().mean(0).cpu().numpy(), '\t Data std:\t', train_tensor.detach().std(0).cpu().numpy())
    print('Gaussian mean:\t', model.u[:,0,:].detach().cpu().numpy(), '\t Gaussian std:\t', model.sigma.detach()[:,0,:].cpu().numpy())

I present several experimental facts of this code:

Sinkhorn divergence suffers from bias, and this bias decrease as \epsilon increase, in the limit, MMD works perfectly. When I fix the datapoint and learn the Gaussian distribution. The std of data is 0.97487277. With eps=0.01, 0.1, 0.5, 1, the learned stds of Gaussian are 0.96189475, 0.9575431, 0.96259254, 0.97405905, which are all smaller than the data std. With MMD (uncomment line 38), the learned std are near data std, it could be larger or smaller than than data std in different runs.
This is also true if I learn the data while fixed the Gaussian. Uncomment line 92-96. The std of Gaussian distribution is 1. With eps=0.01, 0.1, 0.5, 1, the data stds are 0.98536015, 0.9891638, 0.9949019, 0.9966444. With MMD, the learned data std is 1.0002198.

jeanfeydy / geomloss

Slightly biased results of Sinkhorn divergence #32