wmayner / pyemd

Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric
MIT License

What is the distance matrix? #19

Closed: fsfeng closed this issue 7 years ago

fsfeng commented 7 years ago

Hi, this is not necessarily an issue, but what precisely is the required distance matrix? Thanks.

wmayner commented 7 years ago

The distance_matrix parameter encodes the metric that underlies the Earth Mover's Distance. The EMD between two distributions is the minimal cost of transforming one distribution of “earth” into the other, where cost means (amount of “earth”) * (distance it needs to be transported). The second term in that product is given by the underlying metric.
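To make the cost definition concrete, here is a minimal sketch in plain Python. The bin positions, the flow, and the helper names `build_distance_matrix` and `transport_cost` are illustrative assumptions, not part of pyemd. It only evaluates the cost of one candidate transport plan; the EMD is the minimum of this cost over all valid plans.

```python
def build_distance_matrix(positions):
    """Pairwise absolute distances between 1-D bin positions (hypothetical helper)."""
    return [[abs(a - b) for b in positions] for a in positions]


def transport_cost(flow, distance_matrix):
    """Sum of (amount moved from bin i to bin j) * (distance between i and j)."""
    return sum(
        flow[i][j] * distance_matrix[i][j]
        for i in range(len(flow))
        for j in range(len(flow[i]))
    )


# Three bins located at 0, 1 and 2 on a line (made-up positions).
distance_matrix = build_distance_matrix([0.0, 1.0, 2.0])

# One candidate plan turning [1, 0, 0] into [0, 0.5, 0.5]:
# move 0.5 units from bin 0 to bin 1 and 0.5 units from bin 0 to bin 2.
flow = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

cost = transport_cost(flow, distance_matrix)  # 0.5 * 1 + 0.5 * 2
```

Here entry `distance_matrix[i][j]` plays the role of the "distance it needs to be transported" between bin `i` and bin `j`.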

So, the distance matrix can be whatever you want as long as it encodes a metric (it must be symmetric, have zeros on the diagonal, and satisfy the triangle inequality). Note that the software will not warn you if these conditions are violated.
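Since nothing checks these conditions for you, a small validation step before computing the EMD may be worthwhile. The following is a sketch only: `is_metric` and its `tol` parameter are hypothetical names, not part of pyemd, and it tests the standard metric axioms (square shape, zero diagonal, non-negativity, symmetry, triangle inequality) by brute force, which is cubic in the number of bins.

```python
def is_metric(distance_matrix, tol=1e-9):
    """Return True if a square matrix satisfies the metric axioms up to `tol`.

    Hypothetical helper. It does not require off-diagonal entries to be
    strictly positive, so it also accepts pseudometrics.
    """
    n = len(distance_matrix)
    if any(len(row) != n for row in distance_matrix):
        return False  # not square
    for i in range(n):
        if abs(distance_matrix[i][i]) > tol:
            return False  # distance from a bin to itself must be zero
        for j in range(n):
            d_ij = distance_matrix[i][j]
            if d_ij < -tol:
                return False  # negative distance
            if abs(d_ij - distance_matrix[j][i]) > tol:
                return False  # not symmetric
            for k in range(n):
                # Going i -> k -> j must never be shorter than i -> j.
                if d_ij > distance_matrix[i][k] + distance_matrix[k][j] + tol:
                    return False
    return True


good = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 5 > 1 + 1 breaks the triangle inequality

good_ok = is_metric(good)
bad_ok = is_metric(bad)
```

With these inputs `good_ok` is `True` and `bad_ok` is `False`. A matrix that passes the check would then be converted to a NumPy array before being passed as the `distance_matrix` argument.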