hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Design for supporting different hypervector types #25

Closed mikeheddes closed 2 years ago

mikeheddes commented 2 years ago

We can use the same design PyTorch uses: extend the dtypes and extend the different tensor types, e.g. FloatTensor. Then in the HDC operations we can check the instance type of the hypervector to change the behavior.

import torch
import torchhd

hv = torchhd.functional.random_hv(10, 1000)  # bipolar by default (torch.float)
hv = torchhd.functional.random_hv(10, 1000, dtype=torch.bool)
hv = torchhd.functional.random_hv(10, 1000, dtype=torch.complex64)

torchhd.functional.bind(hv[0], hv[1])  # works for any datatype
mikeheddes commented 2 years ago

Useful functions to change the behavior of the hypervector operations:

To find out if a torch.dtype is a floating point data type, use the is_floating_point property, which returns True for floating point types.

To find out if a torch.dtype is a complex data type, use the is_complex property, which returns True for complex types.
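As a sketch of how dtype-based dispatch might look (the `bind` function and its per-dtype choices here are illustrative, not torchhd's actual implementation):

```python
import torch

def bind(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical dispatch on dtype; torchhd's real implementation may differ.
    if a.dtype == torch.bool:
        return torch.logical_xor(a, b)  # binary hypervectors: XOR binding
    if a.dtype.is_complex:
        return a * b  # phasor hypervectors: complex multiplication
    if a.dtype.is_floating_point:
        return a * b  # bipolar hypervectors: element-wise multiplication
    raise NotImplementedError(f"unsupported dtype: {a.dtype}")

a = torch.randint(0, 2, (1000,), dtype=torch.bool)
b = torch.randint(0, 2, (1000,), dtype=torch.bool)
bound = bind(a, b)  # XOR binding is its own inverse: bind(bound, b) == a
```

Keeping the dispatch inside the functional operations, as above, means user code never has to branch on the representation itself.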

rishikanthc commented 2 years ago

It would be good to support the following:

mikeheddes commented 2 years ago
rgayler commented 2 years ago

"I am not very familiar yet with the HRR and FHRR representations"

The representations are just real and complex, respectively. The unique points are that HRR binding/unbinding operators are circular convolution/correlation respectively and you transform between HRR and FHRR with a Fourier transform. Tony Plate's PhD thesis (1994) is your friend for this. IMO these are historically important, and people tend to use them for that reason, but I think people should use straight real or complex VSAs for practicality.
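A minimal sketch of those two points in plain torch (FFT-based circular convolution/correlation; the function names are illustrative):

```python
import torch

def hrr_bind(a, b):
    # Circular convolution: element-wise product in the Fourier domain.
    return torch.fft.ifft(torch.fft.fft(a) * torch.fft.fft(b)).real

def hrr_unbind(c, b):
    # Circular correlation: multiply by the complex conjugate in the
    # Fourier domain; an approximate inverse of binding.
    return torch.fft.ifft(torch.fft.fft(c) * torch.fft.fft(b).conj()).real

d = 10000
a = torch.randn(d) / d ** 0.5  # HRR convention: i.i.d. N(0, 1/d) elements
b = torch.randn(d) / d ** 0.5
c = hrr_bind(a, b)
a_hat = hrr_unbind(c, b)  # a noisy but recognisable copy of a
```

Taking `torch.fft.fft` of an HRR vector yields the corresponding FHRR representation, in which binding reduces to element-wise complex multiplication.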

"Real valued hypervectors should be the default representation"

FWIW I have recently come to the view that complex-valued VSA (phasor representation) is the fundamental VSA as most of the other VSA types (HRR, BSC, MAP, ...) can be interpreted as special cases (most varying in terms of quantisation of phase angle and vector magnitude of the individual complex elements).

On the topic of complex-valued VSAs: If you are looking to implement a wide variety of different VSA types you should include unconstrained complex values. However, practical applications of complex-valued VSA typically constrain the element magnitude to be 1, so that only the phase angle varies. Recent work introduces a threshold on the element magnitude: if the bundling result magnitude is above the threshold parameter (per element) the magnitude is normalised to 1, otherwise it is set to 0 (see Eqn 3, http://www.pnas.org/lookup/doi/10.1073/pnas.1902653116).
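A sketch of that thresholded bundling for phasor hypervectors (the shapes and the `threshold` default are illustrative assumptions):

```python
import math
import torch

def bundle_phasors(hvs: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # hvs: (n, d) complex tensor with unit-magnitude elements.
    # Sum element-wise, then per element: if the magnitude of the sum
    # exceeds the threshold, normalise it back to 1, otherwise set it to 0.
    s = hvs.sum(dim=0)
    mag = s.abs()
    normalised = s / mag.clamp_min(1e-12)
    return torch.where(mag > threshold, normalised, torch.zeros_like(s))

d = 1000
phases = torch.rand(3, d) * 2 * math.pi  # three random phasor hypervectors
hvs = torch.polar(torch.ones(3, d), phases)
bundled = bundle_phasors(hvs)  # every element has magnitude 0 or 1
```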

On the topic of bit widths: The same thinking can be applied to the phase and magnitude of complex-valued VSAs. However, I think you need to decide whether the purpose of allowing a choice of bit-width/resolution is to explore the impact of quantisation or to maximise computational efficiency. If the former, it's probably easier to perform the calculations in full complex values and then quantise the results to the desired resolution. If the latter, I suspect you would have to implement multiple special cases.
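For the exploration case, quantising full-precision phasor results down to a chosen phase resolution could look like this (an illustrative sketch, not a torchhd API):

```python
import math
import torch

def quantize_phase(hv: torch.Tensor, levels: int) -> torch.Tensor:
    # Snap each element's phase to the nearest of `levels` evenly spaced
    # angles, keeping unit magnitude. levels=2 gives bipolar {-1, +1},
    # levels=4 gives quaternary, and large levels approach full phasors.
    step = 2 * math.pi / levels
    snapped = torch.round(hv.angle() / step) * step
    return torch.polar(torch.ones_like(snapped), snapped)

d = 1000
hv = torch.polar(torch.ones(d), torch.rand(d) * 2 * math.pi)
bipolar = quantize_phase(hv, 2)  # real parts are +1 or -1
```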

On the topic of random initialisation of hypervectors: I think the point here is sharpest when considering complex-valued elements. The distribution of phase values controls the shape of the similarity kernel, so it is a major component of representation design (see Section 5.2 in http://arxiv.org/abs/2109.03429). Making it easy to specify an arbitrary distribution of values would be good, and maybe build in some common choices of distribution.
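One way to expose that choice is a caller-supplied phase sampler (the API below is hypothetical, not torchhd's):

```python
import math
import torch

def random_phasor_hv(n: int, d: int, phase_dist=None) -> torch.Tensor:
    # phase_dist: callable (n, d) -> tensor of phase angles; the phase
    # distribution shapes the similarity kernel. Defaults to i.i.d.
    # uniform on [0, 2*pi), which makes distinct vectors quasi-orthogonal.
    if phase_dist is None:
        phases = torch.rand(n, d) * 2 * math.pi
    else:
        phases = phase_dist(n, d)
    return torch.polar(torch.ones(n, d), phases)

hvs = random_phasor_hv(10, 10000)
# similarity = real part of the mean element-wise phasor product
sim = (hvs[0] * hvs[1].conj()).mean().real  # near 0 for uniform phases
```

Swapping in, say, a concentrated von Mises sampler for `phase_dist` would instead produce hypervectors with a broad, graded similarity kernel.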

mikeheddes commented 2 years ago

With #81 I believe the most important hypervector types are now provided, so I will close this issue. If there is a need to support types that are currently missing, feel free to open a new issue specific to that representation so that we can have a more focused conversation in each one.