jeanfeydy / geomloss

Geometric loss functions between point clouds, images and volumes
MIT License

autograd support? #40

Closed Yura52 closed 3 years ago

Yura52 commented 3 years ago

Hi! On the website it is said: "full support of PyTorch’s autograd engine".

However, in the function sinkhorn_loop there is a line at the beginning:

torch.autograd.set_grad_enabled(False)

And the gradients are only turned back on at the end.

So, as I see it, I cannot differentiate through the solution of the Optimal Transport problem (i.e. through the "flows between suppliers and demanders"). Is that correct?

The reason I am asking is the library qpth, which enables differentiation through the solution; see this example. My goal is to pass (alpha, x, beta, y) as input, where all four tensors require gradients (they would be "Variables" in older versions of PyTorch). So I am trying to understand whether I should just copy your implementation and comment out the line above, or maybe I am missing something...

NightWinkle commented 3 years ago

It can differentiate through the solution of the Optimal Transport problem.

As you can read in the literature, for instance in Interpolating between Optimal Transport and MMD using Sinkhorn Divergences, the gradient of the Sinkhorn loss at convergence is equal to the gradient of a single Sinkhorn iteration.
For this reason, it is more efficient to compute the gradient through only the last of these Sinkhorn iterations.

The line you mention simply disables the construction of the autodifferentiation graph for the steps that are not needed to compute the gradient, and that would otherwise make the backward pass quite slow.

As you can see, though, gradients are reactivated before the last iteration, which allows the gradient to be computed.
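To make this concrete, here is a minimal, self-contained sketch of the same trick. This is not GeomLoss's actual sinkhorn_loop; the squared-Euclidean cost, the eps value and the iteration count are arbitrary illustrative choices. The fixed-point iterations run under torch.no_grad(), and only one final update is recorded by autograd:

import torch

def sinkhorn_loss_last_step(a, x, b, y, eps=0.05, n_iters=100):
    # Illustrative sketch only, not GeomLoss's implementation:
    # entropic OT with log-domain Sinkhorn updates, where only the
    # final update is recorded by autograd.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (N, M) cost matrix
    log_a, log_b = a.log(), b.log()

    f = torch.zeros_like(a)  # dual potential on the x side
    g = torch.zeros_like(b)  # dual potential on the y side

    # Fixed-point iterations, *without* building the autograd graph:
    with torch.no_grad():
        for _ in range(n_iters):
            f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - C) / eps, dim=1)
            g = -eps * torch.logsumexp(log_a[:, None] + (f[:, None] - C) / eps, dim=0)

    # One last, differentiable update at the (approximate) fixed point:
    f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - C) / eps, dim=1)
    g = -eps * torch.logsumexp(log_a[:, None] + (f[:, None] - C) / eps, dim=0)

    # Dual objective; at convergence this approximates the regularized OT cost.
    return (a * f).sum() + (b * g).sum()

Calling .backward() on the returned scalar only traverses that final update, which mirrors the behaviour described above.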

Yura52 commented 3 years ago

Oh, I see, thank you for the answer!

jeanfeydy commented 3 years ago

Hi @Yura52 , @NightWinkle ,

Thanks a lot for your interest in the library and relevant questions!

As described by @NightWinkle and in Eqs. (3.226-3.227) of my PhD thesis, the current implementation allows you to backpropagate efficiently through the computation of the Sinkhorn loss: if you're interested in optimizing a (smooth) Wasserstein distance with respect to the weights alpha, beta and the sample locations x, y, you're good to go. This seems to be what you have in mind, which is good news.
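For reference, the "good to go" case looks something like the sketch below, assuming the weighted-call signature loss(alpha, x, beta, y) of SamplesLoss from the GeomLoss documentation; the shapes, blur value and uniform weights are arbitrary illustrative choices:

import torch
from geomloss import SamplesLoss

N, M, D = 100, 120, 3
x = torch.randn(N, D, requires_grad=True)               # source sample locations
y = torch.randn(M, D, requires_grad=True)               # target sample locations
alpha = torch.full((N,), 1.0 / N, requires_grad=True)   # source weights (sum to 1)
beta  = torch.full((M,), 1.0 / M, requires_grad=True)   # target weights (sum to 1)

loss = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
L = loss(alpha, x, beta, y)   # scalar Sinkhorn divergence
L.backward()                  # gradients flow to alpha, x, beta and y

print(x.grad.shape, alpha.grad.shape)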

Note, however, that you may encounter problems if you plan to do something a bit more exotic. For instance, some authors use the Sinkhorn algorithm to compute an optimal transport plan (as in this tutorial, for example), and then backprop through a loss that is not a regularized Wasserstein distance. In this situation, the simplifications that I hardcoded into GeomLoss do not hold anymore: to retrieve the correct gradients, you should indeed backprop through the iterations of the Sinkhorn loop. In other words, comment out the torch.autograd.set_grad_enabled(False) line or add some extra iterations at the end of the loop, in the final "extrapolation" step.
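For that more exotic use case, a simple (but memory-hungry) baseline is to keep the whole loop inside the autograd graph and return the transport plan itself. The sketch below is not GeomLoss code, just a hand-rolled illustration of what "backprop through the iterations" means:

import torch

def sinkhorn_plan_unrolled(a, x, b, y, eps=0.05, n_iters=100):
    # Hand-rolled illustration, not GeomLoss code: every iteration stays in
    # the autograd graph, so any downstream loss on the plan can be
    # backpropagated, at the price of a much heavier backward pass.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # (N, M) cost matrix
    f = torch.zeros_like(a)
    g = torch.zeros_like(b)
    for _ in range(n_iters):  # note: no torch.no_grad() here
        f = -eps * torch.logsumexp(b.log()[None, :] + (g[None, :] - C) / eps, dim=1)
        g = -eps * torch.logsumexp(a.log()[:, None] + (f[:, None] - C) / eps, dim=0)
    # Primal plan: pi_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps)
    return a[:, None] * b[None, :] * torch.exp((f[:, None] + g[None, :] - C) / eps)

Any loss built on this plan (for instance (pi * some_cost).sum(), with some_cost a hypothetical downstream cost matrix) can then be backpropagated to a, x, b and y, which is the behaviour you would recover by commenting out the set_grad_enabled(False) line.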

All these points are discussed in recent works by Pierre Ablin, such as this paper: going forward, I will certainly add a "switch" for this behaviour as an optional argument. Right now, I am mostly working on improving the low-level KeOps routines of GeomLoss and finalizing theoretical papers, but I will really push for a stable v1.0 release over the next few months.

I hope that this answers your question: feel free to re-open the issue if needed :-) Best regards, Jean