mrquincle / emd-suite

Earth Mover's Distance - Variants

Find generalized EMD metrics #1

Open mrquincle opened 4 years ago

mrquincle commented 4 years ago

Earth mover's distance is limited to two distributions where there is no concern for internal structure. For example, if we were to move a car from A to B with EMD, it would physically deconstruct the entire car and transfer it atom by atom. This is in contrast with the real world, in which we would load the car on a truck, move it in its entirety to its destination, and unload it there.

This limitation of EMD is very visible when, for example, transferring the points of a point cloud that makes up a circle "object" towards another circle "object". The points will be transferred according to a global distance metric; the "local neighborhood distance" between points is not preserved.

We need to find generalized EMD metrics that take into account local structure.
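To make the baseline concrete, here is a minimal sketch of plain EMD between two equal-size point clouds, using `scipy.optimize.linear_sum_assignment` as an exact matcher (function and variable names are mine). The matching is driven purely by global distances; nothing in the objective rewards keeping neighbors together.

```python
# Minimal sketch: plain EMD / optimal assignment between two point clouds.
# The cost is purely global Euclidean distance; local neighborhood
# structure within an "object" plays no role in the matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_assignment(source, target):
    """Optimal one-to-one assignment minimizing total Euclidean cost."""
    cost = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return cols, cost[rows, cols].sum()

# Two circles of 8 points each, the target shifted 5 units to the right.
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
src = np.stack([np.cos(angles), np.sin(angles)], axis=1)
tgt = src + np.array([5.0, 0.0])

match, total_cost = emd_assignment(src, tgt)
# total_cost is 8 * 5 = 40: every point travels the full shift, and the
# objective would be exactly as happy with any matching of equal cost,
# however badly it scrambles the circle's internal structure.
```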

mrquincle commented 4 years ago

In A Consistent Metric for Performance Evaluation of Multi-Object Filters, the authors introduce OSPA (optimal subpattern assignment) and argue that it is an improvement on OMAT (optimal mass transfer). Both methods are based on the Wasserstein distance and address multi-object (tracking) problems. OMAT is just the p-Wasserstein distance between two point clouds. OSPA is also a p-Wasserstein distance, but it has a cut-off parameter, c: it assigns a maximum value of c to points that are left unassigned. This additional parameter gives the authors a way to trade off localization errors against cardinality errors. By increasing c, cardinality errors receive a larger weight. By increasing the order parameter p, outliers are penalized more (large distances become even more pronounced).
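A sketch of the OSPA definition from that paper, for finite point sets (this is my own transcription of the formula, with `linear_sum_assignment` as the subpattern matcher):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=2.0, p=1.0):
    """OSPA distance between two finite point sets (sketch).
    Per-point distances are cut off at c, and every unassigned point
    (cardinality mismatch) pays the maximum cost c."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m > n:                       # convention: X is the smaller set
        X, Y, m, n = Y, X, n, m
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    d = np.minimum(d, c)            # cut-off: localization error <= c
    rows, cols = linear_sum_assignment(d ** p)
    loc = (d[rows, cols] ** p).sum()
    card = (c ** p) * (n - m)       # unassigned points pay c each
    return ((loc + card) / n) ** (1.0 / p)
```

With p=1 and c=2, a perfectly matched point plus one spurious far-away point gives (0 + 2) / 2 = 1: pure cardinality error, as the paper intends.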

This adjusted metric is not what we need for establishing locality sensitivity. It establishes a way to cope with outlier points. It does not describe how to match a point cloud of 100 points with two point clouds of 50 points, where all of the point clouds are organized in spatially separated circles or squares. It is hopeful, though, that the authors deem additional properties beyond plain Wasserstein important.

mrquincle commented 4 years ago

In Convolutional Wasserstein Distances the authors introduce an entropy-regularized form of the Wasserstein distance. Figure 2 clarifies this: you can see how the "crisp" transport becomes "fuzzier" from left to right.

In our case, if we establish a transportation matrix between source and target point cloud with this method, each point gets matched a little bit to every other point. This is definitely not what we want! Normal Wasserstein decomposes the car into parts, moves those parts, and reorganizes them at the target side. With entropy-regularized Wasserstein we would grind the car to shreds and reassemble it from scratch on the target side. This is nevertheless an interesting paper, because it shows exactly the "opposite" of what we want to achieve! We do not want smooth transportation plans, we want sparse transportation plans. We should look into the literature on sparse optimization to find relevant work.
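The "everything matched a little bit" effect is easy to reproduce with plain Sinkhorn iterations, the standard solver for entropy-regularized OT (a textbook sketch, not the convolutional variant from the paper):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, iters=500):
    """Entropy-regularized OT plan via Sinkhorn iterations (sketch).
    Small eps approaches the sparse, unregularized plan; large eps
    makes every source point couple a little bit with every target."""
    K = np.exp(-C / eps)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)           # match column marginals
        u = a / (K @ v)             # match row marginals
    return u[:, None] * K * v[None, :]

a = np.array([0.5, 0.5])            # source weights
b = np.array([0.5, 0.5])            # target weights
C = np.array([[0.0, 1.0],           # ground cost
              [1.0, 0.0]])
P = sinkhorn(a, b, C)
# Every entry of P is strictly positive: a dense, "fuzzy" plan, even
# though the unregularized optimum would be the diagonal.
```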

mrquincle commented 4 years ago

In Smooth and Sparse Optimal Transport we find optimal transport plans that are sparse and even group-sparse. See also this note on the group lasso and the sparse group lasso which also refers to this dissertation on multi-dimensional shrinkage thresholding operators (e.g. a group lasso can zero entire groups, but a group itself isn't necessarily sparse).
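The group-zeroing behavior mentioned above comes from the group soft-thresholding (multidimensional shrinkage) operator, the proximal operator of the group-lasso penalty. A minimal sketch:

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Group soft-thresholding: prox of lam * ||v||_2 for one group.
    Shrinks the whole group toward zero and zeroes it entirely when
    its norm falls below lam -- which is why a group lasso can kill
    entire groups while leaving surviving groups dense inside."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

# A group of norm 5 survives lam=2.5 (uniformly shrunk) but is
# annihilated by lam=10.
```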

However, this is also not directly relevant to us. A group lasso uses predefined groups; in our case group labels are not available. Moreover, there are no locality-preserving features. So, on the one hand it uses too much information, on the other hand it uses too little. We need to look closer into the locality aspect than into the sparsity aspect.

mrquincle commented 4 years ago

In Locality-constrained Group Sparse Representation for Robust Face Recognition locality is introduced through a distance between a vector x and all elements in the codebook A. By incorporating this distance, the optimization program makes sure that if x and y are "close", then their representations are also close in "representation space". The authors' contribution is to add the group lasso to this formulation.

This actually defines locality, but its formulation is quite "global": the distance metric involves all other elements in the codebook (in our case, the other point cloud). In contrast, we want to preserve local distances, but we don't care about global distances. Or, more precisely, we want to preserve local distances to other points in our group when we map it to the target group. We don't care about other source groups (or other target groups). In this paper, too, groups are predefined.
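For reference, the locality weighting idea can be sketched with a ridge-style closed form: penalize coefficients on codebook atoms that lie far from x. This is my own simplification in the spirit of locality-constrained coding, not the paper's group-sparse program:

```python
import numpy as np

def locality_weighted_code(x, A, lam=0.1, sigma=1.0):
    """Sketch of a locality-constrained code: solve
        min_z ||x - A z||^2 + lam * ||diag(d) z||^2,
    with d_i = exp(||x - a_i|| / sigma), so far-away atoms are
    penalized heavily. A has atoms as columns; closed-form ridge."""
    d = np.exp(np.linalg.norm(A - x[:, None], axis=0) / sigma)
    G = A.T @ A + lam * np.diag(d ** 2)
    return np.linalg.solve(G, A.T @ x)

# With an identity codebook, x = e_1 loads almost entirely on the
# nearby atom e_1 and puts nothing on the distant atom e_2.
```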

mrquincle commented 4 years ago

In Multi-marginal Optimal Partial Transport Problem a partial barycenter problem is formulated. The barycenter formulation corresponds to finding a target distribution that minimizes a given cost function. The "partial" part is that not all mass needs to be transported towards the target distribution; moreover, this can differ per input measure. Read the paper itself for the analogy with mines and factories.

In Discrete Wasserstein Barycenters: Optimal Transport for Discrete Data the example is very neat. It is about the distribution of goods from a set of distribution centers where the demand is different each month (e.g. think of bottled water, whose demand depends on latitude and season). The authors describe a discrete distribution over nine cities over N months.
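To fix intuition about what a barycenter is, the one-dimensional case has a simple closed form: sort each sample and average the order statistics (quantile averaging). This is only the easy 1-D special case, not the multi-marginal or discrete-support settings those papers tackle:

```python
import numpy as np

def wasserstein_barycenter_1d(samples, weights=None):
    """1-D Wasserstein barycenter of equally sized samples: sort each
    sample and average the order statistics (quantile averaging)."""
    S = np.sort(np.asarray(samples, dtype=float), axis=1)
    if weights is None:
        weights = np.full(len(S), 1.0 / len(S))
    return np.average(S, axis=0, weights=weights)

# The barycenter of {0,1,2} and {2,3,4} is {1,2,3}: it interpolates
# the supports rather than mixing the two distributions.
```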

Although interesting reading material, the barycenter literature does not bring us much further w.r.t. locality and/or sparsity. The group structure at the "source" is predefined (even if the barycenter itself isn't).

mrquincle commented 4 years ago

I still can't find a paper where they try to make a trade-off between large geometric displacements and local ones as shown by the EMD mean-shift algorithm I propose. See figure: source distribution is in the middle, target distribution at the corners.

EMD mean-shift

There should be literature out there! It shouldn't be far-fetched to search for a transportation plan in which we incorporate translations of subsets of the distribution and penalize these differently from the local restructuring of those subsets.

This must also be common to problems where there are repacking or reassembling processes, but where these can be done at different locations.

mrquincle commented 4 years ago

This is actually a search problem in a latent space of translations where each point can be translated, but where we search for a matrix with many identical values. In other words, we regularize by penalizing diversity in the transportation map.
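As a strawman for that regularizer: given a matching, look at the translation vector each point undergoes and count how many distinct translations occur. A "good" plan in this sense moves whole subsets rigidly, so the count stays small. This helper is hypothetical, my own construction, not from any paper:

```python
import numpy as np

def translation_diversity(source, target, match, tol=1e-6):
    """Count distinct translation vectors induced by a matching --
    a crude stand-in for the 'diversity' penalty suggested above.
    A rigid shift of the whole cloud scores 1; a plan that scrambles
    points scores up to len(source)."""
    t = target[match] - source            # per-point translation
    t = np.round(t / tol) * tol           # quantize before dedup
    return len(np.unique(t, axis=0))

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tgt = src + np.array([5.0, 0.0])
rigid = translation_diversity(src, tgt, np.arange(3))      # 1
scrambled = translation_diversity(src, tgt, np.array([1, 0, 2]))  # 3
```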

mrquincle commented 4 years ago

Bingo! In Statistical Optimal Transport via Factored Couplings the quantity optimized for is the so-called transport rank. This quantity is similar to the nonnegative rank of a matrix: the transport rank is the smallest k such that the coupling is a sum of k product distributions. Such a coupling is called a factored coupling by the authors. These couplings induce partitions of the source and target distributions. The partitions are soft (points can belong to multiple partitions). To solve this problem a k-Wasserstein barycenter is used. Compare their results in Figure 1, especially at the right where the coupling is rounded to a map. I'll have to look much deeper into this paper!