The distance module used to sit at the top level and served mainly as a quick alias for the scipy spatial distance functions we use most often after generating our embeddings.
Re-implementing the documentation for those functions added no value. The valuable part was how we used them, yet that usage was never captured in general-purpose functions. This refactor changes that by adding two main functions:
vector_distance - takes two vectors and either a string or a distance function, and runs the distance calculation. The primary use case is the string form, so that the distance function can be switched through a configuration value rather than a code change. It also gives us better support for mahalanobis. The comparison itself still happens vector to vector, but mahalanobis needs some initial setup: an inverse covariance matrix computed from the full set of vertex vectors or a representative sample of it. A curried mahalanobis function takes that inverse covariance matrix on the first call and returns a Callable that accepts only two vectors and passes the stored matrix on to scipy, which satisfies the vector-to-vector requirement of vector_distance. Finally, magic strings are error prone, so you can pass the function itself instead. An IDE, mypy, or another linter can then catch a misspelling of euclidean (for instance). That doesn't help in configuration, but it is useful for people who AREN'T doing configuration-based distance calculations.
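The dispatch and currying described for vector_distance can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `_DISTANCES` table, the set of supported names, and the exact signatures are hypothetical, and only the scipy calls are real.

```python
from typing import Callable, Union

import numpy as np
from scipy.spatial import distance

DistanceFn = Callable[[np.ndarray, np.ndarray], float]

# Hypothetical name-to-function table; the real module may support other names.
_DISTANCES = {
    "cityblock": distance.cityblock,
    "cosine": distance.cosine,
    "euclidean": distance.euclidean,
}


def mahalanobis(inverse_covariance: np.ndarray) -> DistanceFn:
    """Curried form: bind the inverse covariance matrix once, compare vectors later."""

    def _distance(first: np.ndarray, second: np.ndarray) -> float:
        # scipy's mahalanobis takes the inverse covariance as its third argument.
        return float(distance.mahalanobis(first, second, inverse_covariance))

    return _distance


def vector_distance(
    first: np.ndarray,
    second: np.ndarray,
    method: Union[str, DistanceFn],
) -> float:
    """Resolve a string to a distance function if needed, then apply it."""
    if isinstance(method, str):
        if method not in _DISTANCES:
            raise ValueError(f"Unknown distance method: {method!r}")
        method = _DISTANCES[method]
    return float(method(first, second))


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

# String form, e.g. read from configuration.
from_config = vector_distance(a, b, "euclidean")

# Function form, so a linter can catch a misspelled name.
from_function = vector_distance(a, b, distance.euclidean)

# Mahalanobis: build the inverse covariance once from a sample of vectors.
sample = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
inverse_covariance = np.linalg.inv(np.cov(sample, rowvar=False))
from_mahalanobis = vector_distance(a, b, mahalanobis(inverse_covariance))
```

Because the curried function has the same two-vector shape as the scipy functions, vector_distance needs no special case for it.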
embedding_distances_for - takes the vector we're comparing and the embedding whose vectors it is compared against. The embedding can be either an EmbeddingContainer or an np.ndarray, and the function returns a corresponding 1d np.ndarray of distances, one per vector in the embedding. The distance method parameter behaves the same way as described for vector_distance. In most circumstances the result also includes the vector's distance to itself.
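A minimal sketch of the one-against-all behavior, assuming the embedding is already a 2d np.ndarray with one vector per row. The signature is a guess, the EmbeddingContainer case is left out (it would first need its array unwrapped), and scipy's `cdist` is used here only as one convenient way to do it, not necessarily what this PR does.

```python
from typing import Callable, Union

import numpy as np
from scipy.spatial import distance


def embedding_distances_for(
    vector: np.ndarray,
    embedding: np.ndarray,
    method: Union[str, Callable[[np.ndarray, np.ndarray], float]],
) -> np.ndarray:
    """Return a 1d array of distances from `vector` to every row of `embedding`."""
    # cdist accepts either a metric name or a two-argument callable.
    pairwise = distance.cdist(vector.reshape(1, -1), embedding, metric=method)
    return pairwise.ravel()


embedding = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# The query vector is a row of the embedding, so its own distance (0.0) is included.
by_name = embedding_distances_for(embedding[0], embedding, "euclidean")
by_function = embedding_distances_for(embedding[0], embedding, distance.euclidean)
```

The result has one entry per row, in row order, so it can be paired back with the embedding's labels or indices.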
Closes #5