hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Similarity function that works for any supported hypervector type #72

Closed by mikeheddes 2 years ago

mikeheddes commented 2 years ago

Currently we have 3 different similarity functions:

  1. hamming_similarity
  2. cosine_similarity
  3. dot_similarity

With the future introduction of complex hypervectors we will likely add a fourth one if we follow the current design. I think, however, that we should provide only one similarity function that changes its behavior based on the dtype of the input tensors. It would also be nice if it handled batched operations, i.e., with input shapes (*, d) and (n, d) the output shape should be (*, n), holding the similarity score of each input sample against each of the n other hypervectors.
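To make the proposal concrete, here is a minimal sketch of what such a function could look like. It is an illustration only, not the library's implementation: the name `similarity`, the parameter names, and the choice to represent binary hypervectors as `torch.bool` tensors are all assumptions. Zero vectors are not handled and would produce NaN in the cosine branch.

```python
import torch


def similarity(input: torch.Tensor, others: torch.Tensor) -> torch.Tensor:
    """Hypothetical unified similarity: (*, d) against (n, d) gives (*, n)."""
    d = input.size(-1)

    if input.dtype == torch.bool:
        # Assumed binary representation: count matching positions, then map
        # the match fraction from [0, 1] onto [-1, +1].
        matches = (input.unsqueeze(-2) == others).sum(dim=-1)
        return 2.0 * matches.float() / d - 1.0

    # Real and complex hypervectors: cosine similarity. conj() is a no-op for
    # real tensors; for complex ones it yields the Hermitian inner product,
    # of which only the real part is kept.
    dot = torch.matmul(input, others.conj().transpose(-2, -1))
    norms = torch.linalg.norm(input, dim=-1, keepdim=True) * torch.linalg.norm(others, dim=-1)
    out = dot / norms
    return out.real if out.is_complex() else out


batch = torch.randn(3, 5, 1000)    # input of shape (*, d) with * = (3, 5)
memory = torch.randn(7, 1000)      # others of shape (n, d) with n = 7
scores = similarity(batch, memory)  # shape (3, 5, 7)
```

Dispatching on dtype keeps a single entry point, at the cost of tying each hypervector type to a distinct tensor dtype.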

In order to unify the output domain we can stick to the [-1, +1] range that the cosine similarity and the complex variant of cosine similarity produce, where 0 means orthogonal, +1 the same, and -1 the exact opposite. We can simply rescale the hamming_similarity to fall in this domain.
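As a quick check of the rescaling idea: if m of the d positions match, then 2m/d - 1 lies in [-1, +1], and it equals the cosine similarity of the same vectors after mapping bits to {-1, +1}, since each match contributes +1 and each mismatch -1 to the dot product. The sketch below assumes binary hypervectors stored as bool tensors; the variable names are illustrative.

```python
import torch

torch.manual_seed(0)
d = 10_000
a = torch.rand(d) > 0.5  # random binary hypervectors (bool)
b = torch.rand(d) > 0.5

# Hamming similarity as the fraction of matching positions, in [0, 1].
match_fraction = (a == b).float().mean()

# Linear rescaling: all match -> +1, half match -> 0, none match -> -1.
scaled_hamming = 2 * match_fraction - 1

# The same vectors in bipolar form {-1, +1}.
a_bipolar = a.float() * 2 - 1
b_bipolar = b.float() * 2 - 1
cosine = torch.nn.functional.cosine_similarity(a_bipolar, b_bipolar, dim=0)

# The two scores agree up to floating-point error.
print(scaled_hamming.item(), cosine.item())
```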

The dot_similarity function will then be removed from the library, but the dot product is still available as part of PyTorch and can therefore still be used in specific instances.

API design


import torchhd

x = torchhd.random_hv(10, 10000)  # 10 random hypervectors with d = 10000
torchhd.functional.similarity(x, x)  # aliased as torchhd.similarity(x, x)

rgayler commented 2 years ago

I recommend having two similarity functions, similarity.cos and similarity.dot, that work on all hypervector types.

Also, although both values are in the range [-1, +1] for normalised input vectors, you should allow for the possibility of non-normalised input vectors, in which case the dot product can have arbitrarily large magnitude.
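A small illustration of the difference, using plain PyTorch calls rather than any torchhd function (the vectors and the scale factor of 3 are arbitrary choices for the example):

```python
import torch

torch.manual_seed(0)
x = torch.randn(10_000)
y = 3.0 * x  # same direction as x, three times the magnitude

cos = torch.nn.functional.cosine_similarity(x, y, dim=0)
dot = torch.dot(x, y)

# Cosine ignores magnitude and stays at +1 for parallel vectors.
# The dot product scales with both norms and lands far outside [-1, +1].
print(f"cosine = {cos.item():.4f}, dot = {dot.item():.1f}")
```

The two measures therefore agree only when the inputs are normalised, which is one reason to keep them as separate functions.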

See #80