hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Random Projection Encoding #87

Closed · thomas9t closed this issue 1 year ago

thomas9t commented 1 year ago

This is a cool project, and thanks for referencing our paper for random-projection encoding methods! Just FYI: the random-projection encoding you have implemented differs a bit from the ones discussed in the referenced paper. Let $x$ be an $n$-dimensional input point to encode. I think the particular embedding you are referring to is the following:

$$z = \operatorname{sign}(Mx)$$

where $M$ is a $d \times n$ matrix whose rows are sampled from the uniform distribution over the unit sphere. A simple way to generate such samples is to draw from the $n$-dimensional standard normal distribution and normalize (see here for more info). Note that I'm not sure that sampling from the uniform distribution over $[-1,1]^n$ and normalizing yields a uniform distribution over the sphere; in particular, I think that approach puts too little mass near the equator and the poles. The following code should do the trick:

```python
# Generate the embedding matrix:
import numpy as np

d = 10000  # encoding (hypervector) dimension
n = 100    # input dimension
M = np.random.normal(size=(d, n))
M /= np.linalg.norm(M, axis=1).reshape(-1, 1)  # normalize each row onto the unit sphere

# Encode a point:
x = np.random.rand(n)
z = np.sign(M.dot(x))
```
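
As a quick sanity check of the sampling claim above, here is a small sketch of my own (in 3D for simplicity). For a uniform distribution on the unit sphere, the $z$-coordinate is uniform on $[-1, 1]$ (Archimedes' hat-box theorem), so the fraction of samples with $|z| > 0.9$ should be about 0.1; normalized cube samples fail this test:

```python
# Compare normalized Gaussian samples vs. normalized cube samples in 3D.
import numpy as np

rng = np.random.default_rng(0)
num = 1_000_000

gauss = rng.normal(size=(num, 3))
gauss /= np.linalg.norm(gauss, axis=1, keepdims=True)

cube = rng.uniform(-1, 1, size=(num, 3))
cube /= np.linalg.norm(cube, axis=1, keepdims=True)

# For a uniform distribution on the sphere, P(|z| > 0.9) = 0.1 exactly.
print((np.abs(gauss[:, 2]) > 0.9).mean())  # ~0.100
print((np.abs(cube[:, 2]) > 0.9).mean())   # well below 0.1: too little mass near the poles
```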

The sign function is important because, without it, the sense in which the encoding preserves distances between points is different (and I'm not sure it is what one would want). You may not want to use the sign function because it messes with gradients (its derivative is zero everywhere except at zero, where it does not exist). If you want to omit the sign function and use a linear projection, I would recommend looking into the "Johnson-Lindenstrauss Transform" (see here).
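
If it helps, here is a minimal sketch of the classic Gaussian JL construction; the $1/\sqrt{d}$ scaling is an assumption on my part (it is the standard choice that preserves squared Euclidean distances in expectation):

```python
# JL-style linear projection: i.i.d. N(0, 1) entries scaled by 1/sqrt(d),
# so pairwise Euclidean distances are preserved up to (1 +/- eps) w.h.p.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10_000
M = rng.normal(size=(d, n)) / np.sqrt(d)

x = rng.random(n)
y = rng.random(n)

print(np.linalg.norm(x - y))          # original distance
print(np.linalg.norm(M @ x - M @ y))  # approximately the same
```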

mikeheddes commented 1 year ago

Hi Thomas, thank you for your comment! Great catch. I think the main bug is that we sample from a uniform instead of a normal distribution. I initially decided not to include the sign function for the reason you mentioned, but also because I thought it might allow the projection embedder to be used to build the encoder described in this paper. I now think the better approach is to provide both embedding classes.
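
For concreteness, a hypothetical sketch of what the two classes could look like side by side. The names, signatures, and module structure here are illustrative assumptions, not torchhd's actual API:

```python
import torch
import torch.nn as nn


class Projection(nn.Module):
    """Linear random projection z = Mx, rows of M uniform on the unit sphere."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        weight = torch.randn(out_features, in_features)
        weight /= weight.norm(dim=1, keepdim=True)  # normalize rows onto the unit sphere
        self.register_buffer("weight", weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight)


class SignProjection(Projection):
    """Bipolar projection z = sign(Mx); non-differentiable, as noted above."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sign(super().forward(x))
```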