hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Design to allow multiple versions of primitive VSA operators #73

Closed rgayler closed 1 year ago

rgayler commented 2 years ago

Allow for multiple, different versions of the primitive VSA operators (bind, bundle, permute, and maybe others).

This is needed, for example, to introduce HRR binding and randsel bundling (#75).

This is related to #72 and #25, which appear to imply dispatching on hypervector type. However, people will want to introduce and compare different operators applied to the same hypervectors, so this issue is orthogonal to hypervector type.

mikeheddes commented 2 years ago

How about an API such as this (not an exhaustive list but just an illustration):

import torchhd

# binding operations
torchhd.multiply(a, b)  # works for all data types (implements XOR for boolean tensors)
torchhd.convolution(a, b)
...

# bundling operations
torchhd.sum(a, b) 
torchhd.mean(a, b) # normalizes the result of sum
torchhd.randsel(a, b)
...

With this design our current torchhd.bind and torchhd.bundle will simply be replaced with torchhd.multiply and torchhd.sum.

We should add sections in the documentation to group the different binding and bundling implementations so it's clear for people what they are meant to do.

rgayler commented 2 years ago

Sorry - I am so totally not a pythonista that I can't tell what that does.

If it's any help, my motivation for the request is:

  1. The space of VSA primitive operations is broadly indexed by hypervector type and operator algorithms (which are roughly orthogonal).
  2. Some VSA researchers will want to compare the same abstract VSA circuit across differing definitions of VSA primitives.
  3. Researchers doing that would generally want consistent VSA primitive definitions throughout the abstract VSA circuit (i.e. not using multiple different implementations in the one VSA circuit).
  4. Ideally, they would define the abstract VSA circuit once and then be able to specify the different VSA primitive definitions without rewriting the circuit code (thus avoiding a possible source of error).

This is also the motivation for #79. If the encoding functions are defined in terms of abstract bind and bundle then the impact of different instantiations of bind and bundle can be investigated.
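For illustration, the "define the circuit once, swap the primitives" idea could be sketched in plain Python, with lists standing in for hypervector tensors (the function names here are hypothetical, not Torchhd API):

```python
# Hypothetical sketch: an abstract VSA circuit written against generic
# bind/bundle parameters, so different primitive definitions can be
# plugged in without rewriting the circuit code.

def circuit(a, b, c, bind, bundle):
    # Abstract circuit: bundle the bindings of (a, b) and (a, c).
    return bundle(bind(a, b), bind(a, c))

# One possible instantiation of the primitives over bipolar (+1/-1) vectors.
def bind_mul(x, y):
    return [xi * yi for xi, yi in zip(x, y)]

def bundle_sum(x, y):
    return [xi + yi for xi, yi in zip(x, y)]

a, b, c = [1, -1, 1], [-1, -1, 1], [1, 1, -1]
out = circuit(a, b, c, bind=bind_mul, bundle=bundle_sum)  # -> [0, 0, 0]
```

Swapping in a different bind or bundle definition only changes the arguments passed to `circuit`, not the circuit itself.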

mikeheddes commented 1 year ago

In my comment above I tried to present a sketch of the functions that Torchhd could expose/provide to allow users to switch between multiple methods for the primitive operations. So besides the current default (sum for bundling and product for binding) we would also expose circular convolution for binding and randsel for bundling, among many other possible methods.

I think to make this design very ergonomic for the user we should still expose generic bind, bundle and permute functions but allow users to override them. For this API design I was thinking we can take inspiration from multiprocessing.set_start_method.

My idea is a design as follows:

import torchhd

A, B = torchhd.random_hv(2, 10000)

torchhd.bundle(A, B)  # addition by default
torchhd.bind(A, B)  # product by default

# here comes the new part
torchhd.set_bundle_method(torchhd.randsel)
# torchhd.set_bind_method(...)
# torchhd.set_permute_method(...)
# from now on any call to torchhd.bundle will execute randsel
torchhd.bundle(A, B)
# you could equivalently do
torchhd.randsel(A, B)

A benefit of this design is that our data structures, such as hash tables and graphs, internally use the bundle and bind functions, which means that their behavior will automatically update when torchhd.set_bundle_method is called. Another benefit is that users could define their own bundling function, which takes two hypervector tensors and produces a third, and set Torchhd to use it throughout their code, like so:

import torchhd

def my_custom_bundle(a, b):
    return a + b

torchhd.set_bundle_method(my_custom_bundle)

A, B = torchhd.random_hv(2, 10000)
torchhd.bundle(A, B)  # uses my_custom_bundle

I think this could be very useful for research purposes.

By default the code as users write it with the current version stays the same but more flexibility becomes available because the primitives can be changed.

@rgayler could you perhaps help me compile or point to an overview of all the various versions of bind, bundle and permute that we should implement? I am looking for general descriptions of the methods such that we can implement them for the broadest range of hypervector types. For example, multiply binding is equivalent to XOR for binary hypervectors so I would like to keep these as one function whose execution depends on the hypervector type, instead of providing both torchhd.multiply and torchhd.xor.
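The kind of type-based dispatch described here could be sketched as follows, using plain Python lists as stand-ins for hypervector tensors (the Torchhd implementation would dispatch on tensor dtype instead):

```python
# Hypothetical sketch: one multiply function whose behavior depends on the
# hypervector type, instead of separate multiply and xor functions.

def multiply(a, b):
    # Boolean hypervectors: XOR (inequality) is the multiplicative binding.
    if all(isinstance(v, bool) for v in a):
        return [x != y for x, y in zip(a, b)]
    # Numeric (e.g. bipolar) hypervectors: elementwise product.
    return [x * y for x, y in zip(a, b)]

multiply([True, False, True], [True, True, False])  # XOR -> [False, True, True]
multiply([1, -1, 1], [-1, -1, 1])                   # product -> [-1, 1, 1]
```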

mikeheddes commented 1 year ago

One potential problem I thought of is that we currently also provide multibundle and multibind, which are more efficient implementations when bundling or binding many hypervectors. Although they are mathematically equivalent to a for loop over bundle/bind, the speed-up of performing this in a single call can be pretty significant.
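The stated equivalence, that multibundle over n hypervectors matches a fold of pairwise bundle, can be illustrated with plain-Python stand-ins (not the Torchhd implementations):

```python
from functools import reduce

def bundle(a, b):
    # Pairwise bundling stand-in: elementwise sum.
    return [x + y for x, y in zip(a, b)]

def multibundle(hvs):
    # "Multi" version: sum each dimension across all hypervectors in one pass.
    return [sum(col) for col in zip(*hvs)]

hvs = [[1, -1, 1], [-1, -1, 1], [1, 1, 1]]
assert multibundle(hvs) == reduce(bundle, hvs)  # both give [1, -1, 3]
```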

One way to combat this is to allow the user to override both the single and multi version of each primitive. When a user only provides the implementation for the single version, we can fall back to a for loop in the multi-method's implementation. And for the built-in methods, the user could specify them as a string so that we can set the single and multi versions correctly behind the scenes. Here are some examples that will hopefully make it more clear:

import torchhd, torch

# assuming we have randsel implemented in the library 
torchhd.set_bundle_method("randsel")

hypervectors = torchhd.random_hv(2, 10000)
A, B = hypervectors

torchhd.bundle(A, B)  # randsel 
torchhd.functional.multibundle(hypervectors)  # efficient multi-randsel

def my_custom_bundle(a, b):
    return torch.add(a, b)

# user only specifies custom single bundle method
torchhd.set_bundle_method(my_custom_bundle)
torchhd.bundle(A, B)  # uses torch.add
torchhd.functional.multibundle(hypervectors)  # will use a for-loop with torch.add

def my_custom_multibundle(a):
    return torch.sum(a, dim=-2)

# user specifies both a custom single and multi bundle method
torchhd.set_bundle_method(my_custom_bundle, my_custom_multibundle)
torchhd.bundle(A, B)  # uses torch.add
torchhd.functional.multibundle(hypervectors)  # uses torch.sum

mikeheddes commented 1 year ago

I'm working in the feat-primitive-override branch and have coded the basic functionality to override the bind, multibind, bundle, multibundle, and permute methods. I still have to make it more robust, figure out how inverse bind fits into this design, and update the documentation and tests.

rgayler commented 1 year ago

could you perhaps help me compile or point to an overview of all the various versions of bind, bundle and permute that we should implement? I am looking for general descriptions of the methods such that we can implement them for the broadest range of hypervector types. For example, multiply binding is equivalent to XOR for binary hypervectors so I would like to keep these as one function whose execution depends on the hypervector type

@mikeheddes there are a few survey papers around (I'll get pointers to the ones I know) that list multiple instantiations of the generic operators. From memory, these don't make a point of highlighting which instantiations are special cases of others. FWIW, binding with XOR and bipolar multiplicative binding can both be seen as equivalent to complex multiplicative binding where the phase angle has been quantised to two levels.
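That special-case relationship can be checked numerically with Python's built-in complex numbers (a sketch, not Torchhd code): mapping bipolar values to unit phasors with phase quantised to {0, pi}, complex multiplicative binding reduces to plain bipolar multiplication.

```python
import cmath

def to_phasor(v):
    # Bipolar value -> unit-phase complex number: +1 -> phase 0, -1 -> phase pi.
    return cmath.exp(1j * (0.0 if v == 1 else cmath.pi))

def from_phasor(z):
    # Quantise back to bipolar by the sign of the real part.
    return 1 if z.real > 0 else -1

a, b = [1, -1, 1, -1], [-1, -1, 1, 1]

# Complex multiplicative binding (phase addition) ...
complex_bind = [from_phasor(to_phasor(x) * to_phasor(y)) for x, y in zip(a, b)]

# ... agrees elementwise with plain bipolar multiplication.
assert complex_bind == [x * y for x, y in zip(a, b)]
```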

I understand why it's attractive to minimise the number of function names and dispatch on argument type, but if that turns out to be too hard I don't think it would be a tragedy to have a bunch of related instantiations of the primitive operators and just note the relationships in the documentation.

rgayler commented 1 year ago

These are the obvious recent surveys:

https://doi.org/10.1145/3538531
https://doi.org/10.1007/s10462-021-10110-3

But you already know about them because you've cited them in the Torchhd paper.

mikeheddes commented 1 year ago

@rgayler thank you for the papers, I wanted to make sure there are no important ones that we were not aware of yet. I will try to make a table where I group special cases of methods. Then we can refine that before we implement them.

mikeheddes commented 1 year ago

I can foresee that once we have a broader range of VSA/HDC models implemented we could add a method like set_model_primitives("MAP") which would internally call set_bundle_method, set_bind_method etc.