hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Implement randsel bundling #75

Closed rgayler closed 1 year ago

rgayler commented 2 years ago

Random selection (randsel) bundling is an alternate implementation of the bundling operator that has the advantages of being conceptually simple and working on every possible hypervector type (see https://rgayler.github.io/VSA_altitude_hold/vsa_basic_operators.html#6_Add_vectors).

It works by elementwise random selection from the operand hypervectors. That is, given operand hypervectors A and B with elements A_i and B_i, each element C_i of the result hypervector C is a copy of A_i or B_i, selected at random with equal probability. The result is a mixture of the operands that preserves the index information.
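
A rough sketch of the two-operand case in plain torch (illustrative only, not the torchhd implementation; randsel is a hypothetical helper name here):

import torch

def randsel(A, B):
    # Draw an independent fair coin per element:
    # True selects B_i, False selects A_i.
    take_b = torch.rand(A.shape) < 0.5
    # torch.where copies values without operating on them, so this works
    # for boolean, integer, real, and complex hypervectors alike.
    return torch.where(take_b, B, A)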

Viewing bundling as an abstract data type, we define it in terms of the similarity relationships of the result to the operands: we want the result of bundling to be similar to each operand. All the usual similarity functions over pairs of hypervectors are effectively averages, over the element indices, of elementwise similarity. So for the result to be similar to each of the operands it should include some elements from each of them. Because randsel just selects elements at random and does not operate on their values, it does not matter what type the operands have (boolean, integer, real, complex, etc.). The operands and the result all have the same type and dimension.
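
For instance (a sketch in plain torch, following the randsel helper above), the result of bundling two random hypervectors is roughly equally similar to each operand, since about half of its elements come from each:

import torch
import torch.nn.functional as F

A = torch.randn(10000)
B = torch.randn(10000)
C = torch.where(torch.rand(10000) < 0.5, B, A)

# Both similarities are ~0.5 in expectation: half of C's elements are
# copies of the corresponding operand, the other half are independent.
print(F.cosine_similarity(C, A, dim=0))
print(F.cosine_similarity(C, B, dim=0))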

For two operands, Binary Spatter Code (boolean) hypervectors with majority-rule bundling are equivalent to randsel bundling. If both operands have the same value (TRUE or FALSE) the result is that value, which is identical to randomly sampling one of the two (equal) operand values. If the operands have different values (one TRUE, one FALSE), BSC breaks the tie at random, which is again identical to randomly sampling one of the two operand values.
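
A quick way to see the equivalence (a sketch in plain torch):

import torch

A = torch.rand(10000) < 0.5
B = torch.rand(10000) < 0.5

# BSC bundling of two operands: keep the shared value where A and B agree,
# break ties with a fair coin where they disagree.
tie_break = torch.rand(A.shape) < 0.5
majority = torch.where(A == B, A, tie_break)

# randsel: where A and B agree, the coin flip returns the shared value anyway;
# where they disagree, it returns either value with probability 1/2.
# Per element, majority and randsel therefore have identical distributions.
randsel_result = torch.where(torch.rand(A.shape) < 0.5, B, A)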

randsel bundling can be generalised to more than two operands (like BSC bundling) and to scalar weighting of the operands. Given k operands, just select between the operands at random with equal probability. For scalar weighted bundling (i.e., multiset encoding) there is a finite positive scalar weight associated with each operand, and the elements of each operand are selected with probability proportional to its weight. (Note that the selection probabilities sum to one across the operands, so if you are thinking of this as a multiset you need to be aware that the element weights are effectively normalised.)
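
A sketch of the weighted k-operand generalisation (illustrative; multirandsel is a hypothetical name, not the library API):

import torch

def multirandsel(hvs, weights=None):
    # hvs: (k, d) stack of operand hypervectors;
    # weights: k-many finite positive scalars, one per operand.
    k, d = hvs.shape
    if weights is None:
        weights = torch.ones(k)
    probs = torch.as_tensor(weights, dtype=torch.float)
    probs = probs / probs.sum()  # normalise weights to selection probabilities
    # For each of the d elements, draw the index of the operand to copy from.
    choice = torch.multinomial(probs, d, replacement=True)
    return hvs[choice, torch.arange(d)]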

Note that for more than two operands randsel bundling is not equivalent to BSC bundling: in BSC (in the absence of ties) each result element takes the majority value across the operand elements, whereas in randsel bundling (with equal weights) each operand element value appears as the result element value with probability proportional to its frequency across the operands.

Important: the random selection should be re-randomised per application of the operator, not fixed at implementation time. Many VSA circuits are effectively feed-forward; for these, per-operation randomisation is equivalent to at-initialisation randomisation on a single pass through the circuit. However, at-initialisation randomisation does induce correlations between trials, and so makes a (slight) difference to statistics calculated over trials. With recurrent VSA circuits the difference between per-operation and at-initialisation randomisation is important, because values pass repeatedly through the same operations. Per-operation randomisation mixes the operands without imposing extra structure, whereas at-initialisation randomisation is more like dividing the circuit into multiple parallel fixed circuits without bundling.
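
To make the distinction concrete (a sketch; both helper names are illustrative):

import torch

d = 10000

# At-initialisation randomisation: the selection mask is drawn once,
# so every application reuses the same mask and repeated passes are correlated.
fixed_mask = torch.rand(d) < 0.5

def randsel_at_init(A, B):
    return torch.where(fixed_mask, B, A)

# Per-operation randomisation: a fresh mask is drawn on every call,
# as recommended above.
def randsel_per_op(A, B):
    return torch.where(torch.rand(A.shape) < 0.5, B, A)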

(Per-operation randomisation is probably quite expensive in a conventional digital computer and I don't know whether it upsets auto-differentiation in torch, but it's a useful theoretical baseline and might be relatively cheap in unconventional hardware.)

mikeheddes commented 1 year ago

I started the implementation of this; see code here.

With this implementation you would be able to:

import torch
import torchhd

A, B = torchhd.random_hv(2, 10000)
torchhd.primitive.randsel(A, B, p=0.5)  # p specifies the selection probability, 0.5 by default

# or when operating on batches
hvs = torchhd.random_hv(3, 10000)
torchhd.primitive.brandsel(hvs, p=torch.tensor([0.1, 0.2, 0.7]))  # p is uniform by default

And together with the work in #73, you will be able to do:

import torchhd

torchhd.set_bundle_method("randsel")

A, B = torchhd.random_hv(2, 10000)
torchhd.bundle(A, B)  # performs randsel