hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Random Projection Encoding #87

Closed · thomas9t closed this issue 1 year ago

thomas9t commented 1 year ago

This is a cool project, and thanks for referencing our paper for random-projection encoding methods! Just FYI: the random-projection encoding you have implemented differs a bit from the ones discussed in the referenced paper. Let $x$ be an $n$-dimensional input point to encode. I think the particular embedding you are referring to is the following:

$$z = \operatorname{sign}(Mx)$$

where $M$ is a $d \times n$ matrix whose rows are sampled from the uniform distribution over the unit sphere. A simple way to generate such samples is to draw from the $n$-dimensional standard normal distribution and normalize (see here for more info). Note that I'm not sure that sampling from the uniform distribution over $[-1,1]^n$ and normalizing yields a uniform distribution over the sphere; in particular, I think that approach puts too little mass near the equator and the poles. The following code should do the trick:

```python
# Generate the embedding matrix:
import numpy as np

d = 10000  # encoding (hypervector) dimension
n = 100    # input dimension
M = np.random.normal(size=(d, n))
M /= np.linalg.norm(M, axis=1).reshape(-1, 1)  # normalize each row onto the unit sphere

# Encode a point:
x = np.random.rand(n)
z = np.sign(M.dot(x))
```
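
As a quick sanity check of the sampling claim above, here is a small sketch of my own (in 3D for simplicity). For a uniform distribution on the unit sphere, the $z$-coordinate is uniform on $[-1, 1]$ (Archimedes' hat-box theorem), so the fraction of samples with $|z| > 0.9$ should be about 0.1; normalized cube samples fail this test:

```python
# Compare normalized Gaussian samples vs. normalized cube samples in 3D.
import numpy as np

rng = np.random.default_rng(0)
num = 1_000_000

gauss = rng.normal(size=(num, 3))
gauss /= np.linalg.norm(gauss, axis=1, keepdims=True)

cube = rng.uniform(-1, 1, size=(num, 3))
cube /= np.linalg.norm(cube, axis=1, keepdims=True)

# For a uniform distribution on the sphere, P(|z| > 0.9) = 0.1 exactly.
print((np.abs(gauss[:, 2]) > 0.9).mean())  # ~0.100
print((np.abs(cube[:, 2]) > 0.9).mean())   # well below 0.1: too little mass near the poles
```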

The sign function is important because, without it, the sense in which the encoding preserves distances between points is different (and I'm not sure it is what one would want). You may not want to use the sign function because it messes with gradients (its derivative is zero everywhere except at zero, where it does not exist). If you want to omit the sign function and use a linear projection, I would recommend looking into the "Johnson-Lindenstrauss Transform" (see here).
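
If it helps, here is a minimal sketch of the classic Gaussian JL construction; the $1/\sqrt{d}$ scaling is an assumption on my part (it is the standard choice that preserves squared Euclidean distances in expectation):

```python
# JL-style linear projection: i.i.d. N(0, 1) entries scaled by 1/sqrt(d),
# so pairwise Euclidean distances are preserved up to (1 +/- eps) w.h.p.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10_000
M = rng.normal(size=(d, n)) / np.sqrt(d)

x = rng.random(n)
y = rng.random(n)

print(np.linalg.norm(x - y))          # original distance
print(np.linalg.norm(M @ x - M @ y))  # approximately the same
```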

mikeheddes commented 1 year ago

Hi Thomas, thank you for your comment! Great catch. I think the main bug is that we sample from a uniform instead of a normal distribution. I initially decided not to include the sign function for the reason you mentioned, but also because I thought it might allow the projection embedder to be used to build the encoder described in this paper. I now think the better approach is to provide both embedding classes.
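
For concreteness, a hypothetical sketch of what the two classes could look like side by side. The names, signatures, and module structure here are illustrative assumptions, not torchhd's actual API:

```python
import torch
import torch.nn as nn


class Projection(nn.Module):
    """Linear random projection z = Mx, rows of M uniform on the unit sphere."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        weight = torch.randn(out_features, in_features)
        weight /= weight.norm(dim=1, keepdim=True)  # normalize rows onto the unit sphere
        self.register_buffer("weight", weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight)


class SignProjection(Projection):
    """Bipolar projection z = sign(Mx); non-differentiable, as noted above."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sign(super().forward(x))
```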