hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Implement Sparse Block Codes VSA model #146

Closed · mikeheddes closed this 12 months ago

mikeheddes commented 1 year ago

Implementation of a VSA model based on High-dimensional computing with sparse vectors.
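For reference, a minimal sketch (not the PR's actual code; all names and sizes are illustrative) of a one-hot sparse block code operating purely at the level of active components, with binding as modulo addition of the active indices:

```python
import torch

blocks, block_size = 1024, 64

def random_sbc(n: int) -> torch.Tensor:
    # n hypervectors, each stored as one active index per block.
    return torch.randint(0, block_size, (n, blocks))

def bind(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Binding one-hot blocks reduces to modulo addition of their indices.
    return (a + b) % block_size

def inverse(a: torch.Tensor) -> torch.Tensor:
    # Unbinding uses the additive inverse modulo the block size.
    return (-a) % block_size

def similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Fraction of blocks whose active components coincide.
    return (a == b).float().mean(dim=-1)

x, y = random_sbc(1), random_sbc(1)
# Binding with the inverse recovers the other operand exactly.
assert torch.equal(bind(inverse(x), bind(x, y)), y)
```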


mikeheddes commented 1 year ago

@denkle I assigned you as a reviewer for this PR; if you have time, could you skim through the code to see if it makes sense? I have not used this VSA model before, so I am not as confident with it as with the current models.

We can also think about the name of the model. I went with Binary Sparse Vector, but I have also seen other names.

Update: I have changed the name to Sparse Block Codes as per your offline suggestion.

denkle commented 1 year ago

@mikeheddes, thanks for taking care of the implementation! I have looked at the code (though I have not tested it). At this stage, I have a conceptual suggestion. While I like that everything is done at the level of active components, I am not sure how general that is for the case of SBC (elaborated below).

Nevertheless, I think this code, with some modifications, could be an excellent starting point for including "Modular Composite Representation" (Cognitive Computation, 2014), because that model is literally meant to operate on integers, with caveats (which would probably be invisible to the end user) about how bundling and similarity calculations are done.

As for SBC, the main caveat is the bundling operation, because I am not sure it always has to be the case that the activity within a block is still one-hot after bundling. In this more generic case, the whole hypervector would need to be represented explicitly with dimensions * block_size components, and the binding operation would then be block-wise circular convolution (the modulo sum is a special case when both blocks to be bound are one-hot). "Variable Binding for Sparse Distributed Representations: Theory and Applications" touches on this case. In any case, having to operate with complete dimensions * block_size hypervectors will not be efficient, so maybe a compromise would be to have a one-hot-only model with a more specific name, as was done in "A Comparison of Vector Symbolic Architectures" for MAP: MAP-I, MAP-B, etc.
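To make that special case concrete, here is a small illustrative sketch (not from the PR) of binding a single explicitly represented block by circular convolution, checking that for one-hot blocks it reduces to the modulo sum of the active indices:

```python
import torch

block_size = 8

def circ_conv(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Circular convolution of a block via the FFT.
    return torch.fft.ifft(torch.fft.fft(a) * torch.fft.fft(b)).real

def one_hot(idx: int) -> torch.Tensor:
    v = torch.zeros(block_size)
    v[idx] = 1.0
    return v

i, j = 3, 6
bound = circ_conv(one_hot(i), one_hot(j))
# For one-hot blocks, the result is one-hot at (i + j) mod block_size.
assert bound.argmax().item() == (i + j) % block_size
```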

mikeheddes commented 1 year ago

@denkle those are some good points. I agree that it would be better if we also provided the unconstrained version of this model. I think it could still be pretty efficient if we use Torch's sparse vectors instead; it would only be inefficient if you first bundled many vectors and then bound the result. At some point the dense vector is going to be more efficient at performing the circular convolution with the FFT. This does add an extra layer of complexity, but it might work well.
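As a rough sketch of that idea (the threshold and helper below are illustrative assumptions, not part of torchhd), a per-block binding could stay in Torch's sparse COO format while the blocks have few active components and fall back to a dense FFT-based circular convolution once they fill up:

```python
import torch

block_size = 64

def bind_block(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Bind two blocks, staying sparse while the blocks are sparse enough."""
    if a.is_sparse and b.is_sparse and a._nnz() * b._nnz() < block_size:
        a, b = a.coalesce(), b.coalesce()
        # Few active components: form the circular convolution directly
        # from the index pairs (modulo sum of indices, product of values).
        idx = (a.indices()[0][:, None] + b.indices()[0][None, :]).flatten() % block_size
        val = (a.values()[:, None] * b.values()[None, :]).flatten()
        return torch.sparse_coo_tensor(idx.unsqueeze(0), val, (block_size,)).coalesce()
    # Once the blocks fill up, a dense FFT-based convolution wins.
    a = a.to_dense() if a.is_sparse else a
    b = b.to_dense() if b.is_sparse else b
    return torch.fft.ifft(torch.fft.fft(a) * torch.fft.fft(b)).real

# One-hot blocks stay sparse and still bind to (3 + 6) mod 64 = 9.
x = torch.sparse_coo_tensor([[3]], [1.0], (block_size,))
y = torch.sparse_coo_tensor([[6]], [1.0], (block_size,))
print(bind_block(x, y).to_dense().argmax())  # tensor(9)
```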

We could name the current version Binary Sparse Block Codes (BSBC)? This would be similar to how BSC is the binary version of MAP.

I created a new issue for the implementation of Modular Composite Representation.

denkle commented 1 year ago

> I created a new issue for the implementation of Modular Composite Representation.

Sounds good!

denkle commented 1 year ago

> @denkle those are some good points. I agree that it would be better if we also provided the unconstrained version of this model. I think it could still be pretty efficient if we use Torch's sparse vectors instead; it would only be inefficient if you first bundled many vectors and then bound the result. At some point the dense vector is going to be more efficient at performing the circular convolution with the FFT. This does add an extra layer of complexity, but it might work well.
>
> We could name the current version Binary Sparse Block Codes (BSBC)? This would be similar to how BSC is the binary version of MAP.

I agree that it would add an extra layer of complexity if the implementation needed to dynamically decide whether to use the sparse or the dense variant.

Naming-wise, BSBC should be okay. There would be some discrepancy with Sparse Binary Distributed Representations (which would suggest SBBC), but it seems worth keeping the SBC acronym intact, so B-SBC or SBC-B.