src-d / seriate

seriate

Optimal ordering of elements in a set given their distance matrix.

Overview

This is a Python implementation of the Seriation algorithm. Seriation orders the elements of a set so that the sum of the distances between sequential pairs is minimal. We state this task as a Travelling Salesman Problem (TSP) and leverage Google's powerful or-tools to do the heavy lifting. Since TSP is NP-hard, computing the exact solution is not feasible for a large number of elements. However, the or-tools heuristics work very well in practice, and they are used in, e.g., Google Maps.
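To make the objective concrete, here is a minimal sketch of the quantity being minimized. It assumes an open path (no wrap-around term from the last element back to the first), which this README does not spell out. The helper name `sequential_cost` and the toy distance matrix are hypothetical and are not part of the seriate API. The factorial loop only illustrates why brute-force enumeration stops being practical quickly.

```python
import math


def sequential_cost(dist, order):
    """Sum of distances between neighbours in `order` (assumed open path)."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


# Hypothetical toy data: four elements on a line at positions 0, 1, 2, 3.
positions = [0, 1, 2, 3]
dist = [[abs(p - q) for q in positions] for p in positions]

print(sequential_cost(dist, [0, 1, 2, 3]))  # 3: neighbours are always adjacent
print(sequential_cost(dist, [0, 2, 1, 3]))  # 5: two jumps of length 2

# The number of candidate orderings of n elements is n!, which grows fast.
for n in (5, 10, 20):
    print(n, math.factorial(n))
```

This sketch says nothing about how or-tools searches the space. It only evaluates candidate orderings.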

Any rotation of the result (as produced by numpy.roll) is equivalent.

How To Use

import numpy
from scipy.spatial.distance import pdist
from seriate import seriate

elements = numpy.array([
    [3, 3, 3],
    [5, 5, 5],
    [4, 4, 4],
    [2, 2, 2],
    [1, 1, 1]
])

print(seriate(pdist(elements)))

# Output: [4, 3, 0, 2, 1]

The example above orders five elements: [3, 3, 3], [5, 5, 5], [4, 4, 4], [2, 2, 2] and [1, 1, 1]. The result is as expected:

  1. [1, 1, 1]
  2. [2, 2, 2]
  3. [3, 3, 3]
  4. [4, 4, 4]
  5. [5, 5, 5]
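The returned list can be read as indices into the input: position k of the result holds the index of the element placed k-th. The sketch below applies the index order reported above to the same elements using plain Python lists. It copies the order from the example output rather than calling seriate again.

```python
# Elements and the index order reported in the example above.
elements = [
    [3, 3, 3],
    [5, 5, 5],
    [4, 4, 4],
    [2, 2, 2],
    [1, 1, 1],
]
order = [4, 3, 0, 2, 1]  # copied from the example output, not recomputed here

# order[k] is the index of the element placed at position k.
ordered = [elements[i] for i in order]
for position, element in enumerate(ordered, start=1):
    print(position, element)
```

With a NumPy array, the same reordering can be written as `elements[order]`.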

pdist from scipy.spatial.distance uses the Euclidean (L2) distance metric by default, so the distance between [x, x, x] and [x + 1, x + 1, x + 1] is constant: √3. Every other pairwise distance is larger, so the optimal ordering lists the elements in order of increasing norm.
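The reasoning above can be checked by brute force for this tiny example. The sketch below is an independent check, not how seriate works: it enumerates all 120 permutations and keeps the one with the smallest sum of sequential distances, again assuming an open path. The helper names `path_cost` and `brute_force_order` are hypothetical.

```python
import itertools
import math


def path_cost(points, order):
    """Sum of Euclidean distances between consecutive elements in `order`."""
    return sum(
        math.dist(points[a], points[b])
        for a, b in zip(order, order[1:])
    )


def brute_force_order(points):
    """Try every permutation; feasible only for a handful of elements."""
    return min(
        itertools.permutations(range(len(points))),
        key=lambda order: path_cost(points, order),
    )


points = [(3, 3, 3), (5, 5, 5), (4, 4, 4), (2, 2, 2), (1, 1, 1)]

# Neighbouring elements are sqrt(3) apart, as stated above.
print(math.dist((1, 1, 1), (2, 2, 2)), math.sqrt(3))

best = brute_force_order(points)
print(best, path_cost(points, best))
```

The minimal cost is four steps of √3 each. The increasing and decreasing orderings both reach it, so the brute-force search may return either one.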

Contributions

Contributions are very welcome and desired! Please follow the code of conduct and read the contribution guidelines.

License

Apache-2.0, see LICENSE.md.