The input is an n × d matrix of observations: each row is one observation and each column is one dimension of the data.
The data are assumed to have already been transformed into the appropriate coordinates by an earlier processing step.
The output is the n × n matrix of all pairwise distances between the observations.
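A minimal NumPy sketch of this input/output contract for the L1 and L2 cases follows. It assumes the observations arrive as a NumPy array of shape (n, d). The function name `pairwise_minkowski` and its `p` parameter are hypothetical: p=1 gives the L1 distance and p=2 gives the L2 distance.

```python
import numpy as np


def pairwise_minkowski(X: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Return the (n, n) matrix of Minkowski-p distances between the rows of X.

    Hypothetical helper: p=1 is the L1 (Manhattan) distance and p=2 is the
    L2 (Euclidean) distance. Broadcasting makes memory use O(n^2 * d).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be 2-D with shape (n_observations, n_dims)")
    # diff[i, j, :] = X[i, :] - X[j, :]
    diff = X[:, None, :] - X[None, :, :]
    if p == 1:
        return np.abs(diff).sum(axis=-1)
    if p == 2:
        return np.sqrt((diff ** 2).sum(axis=-1))
    return (np.abs(diff) ** p).sum(axis=-1) ** (1.0 / p)


# Three observations in two dimensions.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
D1 = pairwise_minkowski(X, p=1)  # L1 distance matrix
D2 = pairwise_minkowski(X, p=2)  # L2 distance matrix
```

The broadcast intermediate has n² · d entries. For large n, a blocked loop or one of the accelerated libraries proposed in this document would be the better choice.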
The distance-matrix computation should support the following metrics:
L1 (Manhattan) distance
L2 (Euclidean) distance
Wasserstein distance and/or a metric based on KL divergence (e.g., treating the samples from a sound profile curve as discrete distributions and comparing them on that basis).
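A possible sketch for the distribution-based options follows. It makes three assumptions:

1. Each row holds non-negative samples of a sound profile curve.
2. All rows are sampled on the same uniformly spaced grid with spacing `dx`.
3. Each row is normalized to sum to 1.

In one dimension, the Wasserstein-1 distance equals the L1 distance between the two cumulative distribution functions, integrated over the grid. KL divergence on its own is not a metric, because it is not symmetric and does not satisfy the triangle inequality. This sketch therefore substitutes the Jensen-Shannon distance (the square root of the Jensen-Shannon divergence, natural log), which is a metric built from KL divergence. That substitution is an assumption. All function names are hypothetical.

```python
import numpy as np


def _normalize_rows(S: np.ndarray) -> np.ndarray:
    # Treat each row as a discrete distribution: non-negative, summing to 1.
    S = np.asarray(S, dtype=float)
    if np.any(S < 0):
        raise ValueError("samples must be non-negative to be treated as distributions")
    totals = S.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("each row needs positive total mass")
    return S / totals


def pairwise_wasserstein_1d(S: np.ndarray, dx: float = 1.0) -> np.ndarray:
    """W1 distance between rows of S, assuming a shared uniform grid with spacing dx."""
    F = np.cumsum(_normalize_rows(S), axis=1)  # one discrete CDF per row
    # In 1-D, W1 = integral of |F_i - F_j|, approximated as a sum times dx.
    return dx * np.abs(F[:, None, :] - F[None, :, :]).sum(axis=-1)


def pairwise_jensen_shannon(S: np.ndarray) -> np.ndarray:
    """Jensen-Shannon distance (sqrt of JS divergence, natural log) between rows of S."""
    P = _normalize_rows(S)
    A, B = P[:, None, :], P[None, :, :]
    M = 0.5 * (A + B)  # mixture; M > 0 wherever A > 0 or B > 0

    def kl(p, m):
        # Terms with p == 0 contribute 0 by convention.
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(p > 0, p * np.log(p / m), 0.0)
        return terms.sum(axis=-1)

    js = 0.5 * kl(A, M) + 0.5 * kl(B, M)
    return np.sqrt(np.clip(js, 0.0, None))  # clip tiny negative rounding errors


# Rows 0 and 2 are identical; row 1 puts all its mass two grid steps away.
S = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [2.0, 0.0, 0.0]])
W = pairwise_wasserstein_1d(S, dx=1.0)
J = pairwise_jensen_shannon(S)
```

For a single pair of curves, SciPy's `scipy.stats.wasserstein_distance` and `scipy.spatial.distance.jensenshannon` compute the same quantities and could be used to cross-check a faster implementation.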
These functions must be fast. Most of our compute time will go into computing these distances and into the ball collision detection built on them when constructing the Vietoris-Rips or Čech complex for the data. I therefore propose using a fast library such as ArrayFire (which has Python bindings) or Numba for the distance-matrix computation.
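A rough sketch of the Numba option follows, assuming an L2 distance on a float64 array. It uses `numba.njit(parallel=True)` with `numba.prange` to parallelize over rows, and computes only the upper triangle before mirroring it. If Numba is not installed, the fallback stubs make the same code run as plain, slow Python. The function name `euclidean_distance_matrix` is hypothetical.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without Numba (no JIT, no parallelism).
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f

    prange = range


@njit(parallel=True)
def euclidean_distance_matrix(X):
    n, d = X.shape
    D = np.zeros((n, n))
    # Each iteration i writes only D[i, j] and D[j, i] for j > i, so no two
    # parallel iterations write the same element.
    for i in prange(n):
        for j in range(i + 1, n):
            s = 0.0
            for k in range(d):
                t = X[i, k] - X[j, k]
                s += t * t
            D[i, j] = np.sqrt(s)
            D[j, i] = D[i, j]
    return D


X = np.asarray([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]], dtype=np.float64)
D = euclidean_distance_matrix(X)  # the first call triggers JIT compilation when Numba is present
```

The triangular loop gives early rows more work than late ones, so parallel load balance is uneven. An ArrayFire version would more naturally follow the broadcasting style of the NumPy sketch.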
In addition, there should be a function that takes a distance matrix and a radius r and returns the Boolean matrix of pairwise collisions among all balls of radius r centered at the observations.
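A minimal sketch of the collision function follows. In Euclidean space (or any normed space), two closed balls of radius r intersect exactly when the distance between their centers is at most 2r. For the other metrics, the 2r threshold is a convention rather than a geometric fact, and some Vietoris-Rips definitions use a threshold of r instead. The name `collision_matrix` is hypothetical, and setting the diagonal to False is an assumption made so the result can serve directly as an adjacency matrix.

```python
import numpy as np


def collision_matrix(D: np.ndarray, r: float) -> np.ndarray:
    """Boolean (n, n) matrix: True where balls of radius r around i and j collide.

    Collision is taken to mean D[i, j] <= 2r. The diagonal is set to False.
    """
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("D must be a square distance matrix")
    if r < 0:
        raise ValueError("radius must be non-negative")
    C = D <= 2.0 * r
    np.fill_diagonal(C, False)
    return C


D = np.array([[0.0, 5.0, 1.5],
              [5.0, 0.0, 4.0],
              [1.5, 4.0, 0.0]])
C = collision_matrix(D, r=1.0)
edges = np.argwhere(np.triu(C, k=1))  # colliding pairs (i, j) with i < j
```

The pairwise matrix fully determines the Vietoris-Rips complex, which is the clique complex of this graph. A Čech simplex, however, needs a common intersection of all its balls, which pairwise collisions alone do not guarantee.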