hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Is it possible to "OR" two vectors? #171

Closed uler3161 closed 1 month ago

uler3161 commented 3 months ago

I'm fairly new to HDC and trying to wrap my head around how it works. There's a really good example out there that shows binding with multiplication (SHAPE is CIRCLE, COLOR is RED) and bundling with addition (SHAPE is CIRCLE and COLOR is RED).

What I'm missing is how to represent something like "SHAPE is CIRCLE or COLOR is RED" or even something like "SHAPE is CIRCLE or SHAPE is SQUARE". And for that matter, I'm wondering about "SHAPE is not CIRCLE".

Are these kinds of things possible with HDC? If so, are there torchhd examples of them?

mikeheddes commented 3 months ago

Hi, thank you for posting this interesting question. I am not aware of a specific "or" operator in HD/VSA but perhaps the following will satisfy. When we say SHAPE is CIRCLE, what we often mean is MEMORY(OBJECT * SHAPE) -> CIRCLE, that is, given an OBJECT we can infer its SHAPE (assuming this information is part of the OBJECT) by binding with SHAPE and then retrieving the most similar item from an auto-associative memory, like a Hopfield network (assuming MEMORY contains all the relevant concepts).

For a SHAPE to be either CIRCLE or SQUARE, we could interpret this as the memory returning either CIRCLE or SQUARE at random. This could be achieved by bundling CIRCLE and SQUARE as the SHAPE of the OBJECT. The answer of the memory lookup can then be "nudged" to either CIRCLE or SQUARE by adding some noise. To summarize, what I have in mind is the following: OBJECT = SHAPE * (CIRCLE + SQUARE) and then MEMORY(OBJECT * SHAPE + NOISE) -> CIRCLE or SQUARE. I haven't tested this so I'm not sure how well this would work but I hope this is helpful.
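
To make that concrete, here is a rough sketch in plain PyTorch (bipolar {-1, +1} MAP-style vectors, binding as elementwise multiplication, bundling as addition; the noise scale is an arbitrary choice and this is not tied to any particular torchhd API):

```python
import torch

torch.manual_seed(0)
d = 10_000  # hypervector dimensionality

def rand_hv(n):
    # Random bipolar hypervectors with elements in {-1, +1}.
    return torch.randint(0, 2, (n, d)).float() * 2 - 1

SHAPE, CIRCLE, SQUARE, COLOR, RED = rand_hv(5)

# OBJECT encodes that its SHAPE is "CIRCLE or SQUARE".
OBJECT = SHAPE * (CIRCLE + SQUARE)  # bind = elementwise multiply, bundle = add

# Binding with SHAPE again unbinds it, since SHAPE * SHAPE = 1 elementwise.
noise = rand_hv(1).squeeze(0) * 0.5
query = OBJECT * SHAPE + noise

# Item memory lookup: dot-product similarity against all known concepts.
memory = torch.stack([CIRCLE, SQUARE, COLOR, RED])
scores = memory @ query
print(scores)  # CIRCLE and SQUARE score far higher than the rest;
               # the noise (plus cross-talk) decides which of the two wins
```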

There might also be others in the HD/VSA community that have a better idea on how to answer this (and I encourage them to respond here as well).

rgayler commented 3 months ago

I'm fairly new to HDC and trying to wrap my head around how it works

In my opinion that's the hard part. The underlying mechanics are pretty simple; working out how to use them, not so much.

VSA/HDC can be used to (approximately) represent pretty much anything. But then you have to set up a process that transforms those representations in a way that corresponds to some computation you want to carry out (which may be trivial, hard, or impossible). So any specific choice about how to represent something in VSA will have been made with some specific way of processing that representation in mind. However, the community (including me) doesn't generally make that explicit; we tend to concentrate on the representation and assume the entailed processing is self-evident.

how to represent something like "SHAPE is CIRCLE or COLOR is RED"

I think the source of your problem here might be that you are interpreting this as a statement in logic (from which, for example, you could deduce that if COLOR is BLUE then SHAPE must be CIRCLE). If you wanted to use that interpretation you would have to represent those statements with VSA representations of trees (to capture the syntax of the statements) and figure out some truth-preserving mechanism to process those representations (e.g. something like LT-Frame). Doing that thoroughly probably takes you away from the spirit of VSA.

One alternative interpretation might be to say:

  1. With VSA we can represent (possibly composite) things as vectors.
  2. Every vector has a magnitude.
  3. When we're given an arbitrary vector we can project it onto a known vector and measure the magnitude of that projection.
  4. If the arbitrary vector represents what the system "believes", we can project it onto a number of known vectors.
  5. The magnitudes of the projections of the arbitrary vector onto the known vectors show the extent to which the system "believes" that each of those known vectors is present in the state of the system.
  6. You can interpret those magnitudes as something like a graded truth value.

So then you could represent SHAPE is CIRCLE or COLOR is RED as SHAPE*CIRCLE + COLOR*RED in the state vector. This would represent that both those facts are simultaneously true (i.e. not taking "or" as the logical connective). Then you could probe the state vector with SHAPE*CIRCLE or COLOR*RED (i.e. calculate the dot product) and both those probes would evaluate to true (which also means that probe1 OR probe2 would evaluate to true in the strictly logical sense).
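
To make the probing concrete, here is a small plain-PyTorch sketch under the same bipolar MAP assumptions (the vectors and names are just for illustration):

```python
import torch

torch.manual_seed(0)
d = 10_000

def rand_hv(n):
    return torch.randint(0, 2, (n, d)).float() * 2 - 1

SHAPE, COLOR, CIRCLE, SQUARE, RED, BLUE = rand_hv(6)

# State vector asserting both facts at once.
state = SHAPE * CIRCLE + COLOR * RED

# Probe by projecting the state onto known composite vectors.
probes = {
    "SHAPE*CIRCLE": SHAPE * CIRCLE,
    "SHAPE*SQUARE": SHAPE * SQUARE,
    "COLOR*RED": COLOR * RED,
    "COLOR*BLUE": COLOR * BLUE,
}
for name, probe in probes.items():
    belief = (state @ probe).item() / d  # normalised projection, roughly in [-1, 1]
    print(f"{name}: {belief:.3f}")
# SHAPE*CIRCLE and COLOR*RED come out near 1, the others near 0,
# i.e. both probes "evaluate to true" in the graded sense described above.
```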

Of course, that suggestion may not be what you want for your particular problem. So, again, the hard part is working out exactly what behaviour you want and then determining the representation and processing that jointly implement the desired behaviour.

Getting back to your original question "Is it possible to "OR" two vectors?" that depends on precisely what you mean by OR. The answer is almost certainly Yes, but what you have to do to make it so will vary greatly depending on your specification of OR.

uler3161 commented 3 months ago

Thank you both for all the info.

The specific problem I have in mind is trying to encode a truth table for a boolean SAT problem and then somehow query to find which rows of the table evaluate to true.

For instance, A XOR B. One encoding I've thought of using is A*True + B*False and A*False + B*True, OR-ed together. And then for the memory I'd use those as well as A*True + B*True and A*False + B*False.

Another encoding I thought of using is to encode the output column of the truth table using permutation. For example:

| A | B | Output |
| --- | --- | --- |
| False | False | False |
| False | True | True |
| True | False | True |
| True | True | False |

Which might be encoded as permute(permute(permute(False) + True) + True) + False. Memory would then be:

- permute(permute(permute(False) + False) + False) + True
- permute(permute(permute(False) + False) + True) + False
- permute(permute(permute(False) + True) + False) + False
- permute(permute(permute(True) + False) + False) + False
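
Roughly what I have in mind, as a plain-PyTorch sketch (cyclic shift as the permutation, random bipolar vectors for True and False; dimension is an arbitrary choice):

```python
import torch

torch.manual_seed(0)
d = 10_000
TRUE, FALSE = (torch.randint(0, 2, (2, d)).float() * 2 - 1)

def permute(x):
    return torch.roll(x, 1)  # cyclic shift as the permutation

def encode(column):
    # permute(...permute(column[0]) + column[1]...) + column[-1]
    hv = column[0]
    for value in column[1:]:
        hv = permute(hv) + value
    return hv

# Output column of A XOR B, rows ordered FF, FT, TF, TT.
xor_table = encode([FALSE, TRUE, TRUE, FALSE])

# Memory: one encoding per row, with True only in that row.
rows = ["FF", "FT", "TF", "TT"]
memory = torch.stack([
    encode([TRUE, FALSE, FALSE, FALSE]),
    encode([FALSE, TRUE, FALSE, FALSE]),
    encode([FALSE, FALSE, TRUE, FALSE]),
    encode([FALSE, FALSE, FALSE, TRUE]),
])

scores = memory @ xor_table / d
for row, score in zip(rows, scores):
    print(row, round(score.item(), 2))
# The FT and TF rows score noticeably higher than FF and TT.
```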

The bigger picture is encoding a larger tree. Leaf nodes represent variables or their negations, so they are generating an encoding that represents either the rows where the variable is true or the rows where the variable is false. All non-leaf nodes will have two children. The children each return a vector encoding their truth table, and the parent needs to generate a new truth table that is either an OR-ing of the child truth tables or an AND-ing of the child truth tables, depending upon what kind of node it is.

I've tried playing with both encodings but it doesn't seem super reliable. Better than flipping a coin though.

uler3161 commented 3 months ago

I went in a little different direction than in my last post. Had some luck with vector magnitude. Thanks @rgayler for that suggestion.

Found something I think is very relevant to what I'm trying to accomplish at https://redwood.berkeley.edu/wp-content/uploads/2022/05/kanerva2022hdmss.pdf. Specifically the section on sets and multisets. Addition seems to be a big part of the solution. However, as mentioned in the paper, normalization seems to be of issue.

Specifically for me, I'm doing a massive amount of additions. I'm currently testing with torch.double dtype and converting my MAP seed tensors to be {-torch.finfo(torch.double).tiny, torch.finfo(torch.double).tiny} rather than {-1, 1}. Still end up with -inf/inf values at some point. Unfortunately it doesn't look like they are going to add float128 to pytorch anytime soon.

I'm contemplating one of the following:

  1. Moving over to using numpy and the long double, though I'm worried about performance.
  2. Some kind of normalization that won't mess things up too badly, though I'm not sure what I can do there.
  3. Finding a way to smartly choose seed vector values, or more specifically their signs, so that addition doesn't grow towards inf or -inf as fast.

mikeheddes commented 3 months ago

I am curious to know more specifics about what you are doing such that you are facing issues using doubles.

Note that if you try to sum many small values (like torch.finfo(torch.double).tiny = 2.2250738585072014e-308) you will get numerical problems, as you seem to be describing. It is generally better to sum less extreme values. If, for example, the vector magnitudes are all integers, the summation will not have any numerical issues until the values get very large. In this case you could also use torch.long representations for the vectors.

After adding/bundling all the vectors you can use hard_quantize to normalize the resulting vector. Alternatively, you could consider using the clipping function periodically during the additions to ensure the values stay within a reasonable range.
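
For example, something along these lines in plain torch (the batch sizes and the clipping bound kappa are arbitrary; the final torch.sign plays roughly the role of hard_quantize for MAP vectors):

```python
import torch

torch.manual_seed(0)
d = 10_000
kappa = 128  # hypothetical clipping bound

# Accumulate a large number of random bipolar vectors with exact integer
# arithmetic; with values in {-1, +1} the running sum cannot lose precision
# or overflow until the counts get astronomically large.
acc = torch.zeros(d, dtype=torch.long)
for _ in range(100):
    batch = torch.randint(0, 2, (1000, d), dtype=torch.long) * 2 - 1
    acc += batch.sum(dim=0)
    # Optional: clip periodically to keep the values in a bounded range.
    acc = acc.clamp(-kappa, kappa)

# Normalise back to a bipolar vector at the end.
bundled = torch.sign(acc)
```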

uler3161 commented 3 months ago

I'm doing a massive amount of additions for the particular problem I was trying to solve. I've tried some smaller "toy" problems and things seem to work ok. However, I did just get a numpy implementation done with longdouble and I'm not getting great results :-/. So maybe that massive amount of additions is just turning everything to noise.

rgayler commented 3 months ago

Disclaimers first:

Had some luck with vector magnitude. ... I'm doing a massive amount of additions. ... Still end up with -inf/inf values at some point.

OK, I did say "[y]ou can interpret those magnitudes as something like a graded truth value" but I think the fact that you're getting infinite values can be taken as reasonably conclusive evidence that using magnitudes in the way you're doing the calculations is not a good idea.

I have come to believe that unitary phasors are the canonical form for VSAs. In a unitary phasor hypervector each element is a complex value with modulus 1. This is even more constrained than normalisation of the magnitude of the hypervector, and the information content of the hypervector is conveyed purely by its direction (encoded in the phase of all the elements). This is very strict about normalisation because, rather than occasionally throwing in a normalisation as a free-standing operator, the output of every operator is required to be normalised. Note that even if the hypervector is strictly normalised, if you decompose it as a sum of terms the term vectors will (mathematically) have smaller magnitudes, so it's possible for vector magnitudes to drive the dynamics of your system even if you never explicitly represent them.
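
As an illustration, a unitary phasor representation in plain torch might look something like this (an FHRR-style sketch; the dimension and the epsilon in the normalisation are arbitrary choices):

```python
import torch

torch.manual_seed(0)
d = 10_000

def rand_phasor():
    # Each element is a unit-modulus complex number exp(i * theta).
    theta = torch.rand(d) * 2 * torch.pi
    return torch.polar(torch.ones(d), theta)

def normalize(x):
    # Push every element back onto the unit circle after bundling.
    return x / x.abs().clamp_min(1e-12)

A, B = rand_phasor(), rand_phasor()

bound = A * B                  # binding: elementwise complex multiplication
bundled = normalize(A + B)     # bundling followed by strict normalisation

# Similarity of A to the bundle: real part of the mean conjugate product.
sim = (bundled.conj() * A).real.mean()
print(sim.item())  # well above 0, so A is still detectable in the bundle
```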

I'm doing a massive amount of additions for the particular problem I was trying to solve

That's probably going to cause problems. Because randomly generated hypervectors are generally quasi-orthogonal rather than strictly orthogonal, if you add enough of them together you get mush from which you can't reliably detect the components. You can fight this by increasing the dimensionality of the hypervectors, but you should take this as a big hint to look at whether there is an alternative approach to your problem that doesn't require summing so many terms. Given that Boolean SAT is NP-complete, you might find that exorbitant scaling of the hypervector dimension is the VSA price you have to pay.
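
You can see the effect directly with a quick plain-torch experiment (the dimension and the numbers of added terms are arbitrary):

```python
import torch

torch.manual_seed(0)
d = 10_000

def rand_hv(n):
    return torch.randint(0, 2, (n, d)).float() * 2 - 1

target = rand_hv(1).squeeze(0)
for k in [1, 10, 100, 1000]:
    bundle = target + rand_hv(k).sum(dim=0)  # target buried under k other terms
    cos = torch.nn.functional.cosine_similarity(bundle, target, dim=0)
    print(k, round(cos.item(), 3))
# The similarity to the target decays roughly like 1/sqrt(k), so with enough
# additions the target component disappears into the noise floor.
```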

In the very early posts in this thread we were talking about representing logical propositions as trees and you had True and False as separate literals in your propositions. To me this carries connotations of a traditional symbolic logic approach. We might be able to do better by taking advantage of what comes for free with VSA:

Having True and False as separate randomly generated hypervectors in VSA means that their values are quasi-orthogonal and effectively independent. In classical logic we want True and False (as values of some proposition) to be mutually exclusive. This suggests we could represent False as -True so that they're mutually exclusive values. So we would generate some random hypervector True and represent False as -True.

If we're representing variables having truth values we could do that by binding a representation of the identity of the variable (say, A) with its truth value (say, True): A*True, and A being false as A*(-True), which is equivalent to -(A*True). Note that A*True is just another random hypervector, so if we are only ever interested in the truth value of variables and are not going to unbind A*True to retrieve A or True, then we can just throw away the True and use A and -A to represent the variable A having the truth values True and False respectively.
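
In plain torch that readout could look something like this (a sketch with an arbitrary dimension; a positive projection is read as True, a negative one as False):

```python
import torch

torch.manual_seed(0)
d = 10_000
A, B, C = (torch.randint(0, 2, (3, d)).float() * 2 - 1)

# Accumulated evidence: A believed true, B believed false (encoded as -B),
# nothing known about C.
evidence = A + (-B)

for name, var in [("A", A), ("B", B), ("C", C)]:
    score = (evidence @ var).item() / d  # > 0 leans True, < 0 leans False, ~0 unknown
    print(name, round(score, 3))
```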

If you set up some recurrent system that accumulated evidence for the desired truth values of each of the variables, you could add evidence for A (and -A) from different sources of evidence and that would sum to indicate the supported value of A in the solution. Simon Levy and I did something like this in https://www.researchgate.net/publication/215990020_A_distributed_basis_for_analogical_mapping where we set up a recurrent VSA circuit with a hypervector containing the weighted sum of terms, each representing a possible mapping of a vertex in one graph to a vertex in another graph. This potential solution vector was used to generate updates to itself from many items of evidence (each of which encoded something like: to the extent that you currently believe that P maps to X, you should increase your belief that Q maps to Y). That worked because there was some maths behind it justifying the approach, but the solution was essentially heuristic; the maths was good at coming up with a reasonably good answer quickly, but wasn't guaranteed to come up with the best answer. To apply that approach to Boolean SAT it would help to have some pre-existing maths that mapped the problem to a simple dynamic system. You would also need to be happy that a heuristic answer was OK for your use case. If you need an exact answer you're probably not going to be able to use VSA.

encode a truth table for a boolean SAT problem and then somehow query to find which rows of the table evaluate to true.

You were looking at representing these as full tree structures, but that may not be necessary to solve your problem. For example, if you want to know whether some row is true you don't need to represent the structure of the proposition; you can just rely on the fact that a binding can be treated as a representation of a specific point-to-point (domain to co-domain) mapping of a function. If you have a binding A*B you can treat it as representing the point mappings A -> B and B -> A. So you can query A*B by unbinding with A to get the value B. For your case you need some way to represent the proposition as bound variables, like proposition*True; then you would query it (unbind) with proposition to get True. If you wanted to get the ID of the true propositions you would represent it as something like proposition*ID. But for your purposes you may not actually need the ID. It might be better to return the variable values that make the proposition true, e.g. proposition*(A*True + B*False + X*True).
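
A sketch of that last suggestion in plain torch (bipolar vectors, so binding is its own inverse; the proposition and variable names are just for illustration):

```python
import torch

torch.manual_seed(0)
d = 10_000
PROP, A, B, X, TRUE, FALSE = (torch.randint(0, 2, (6, d)).float() * 2 - 1)

# Bind the proposition's identity to the assignment that satisfies it.
record = PROP * (A * TRUE + B * FALSE + X * TRUE)

# Unbind with PROP (multiplication is its own inverse for bipolar vectors)
# to recover the noisy assignment, then read off each variable's value.
assignment = record * PROP
for name, var in [("A", A), ("B", B), ("X", X)]:
    t = (assignment @ (var * TRUE)).item() / d
    f = (assignment @ (var * FALSE)).item() / d
    print(name, "True" if t > f else "False")
```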

The representation of the proposition I am suggesting above is incompatible with the negating truth representation I suggested earlier because of negation being associative. That is, when you bind terms you lose track of which term the negation is associated with: A*(-B) = (-A)*B = -(A*B). One possibility would be to represent truth values as separate hypervectors and do variable instantiation with permutation (permutations are not restricted to cyclic shifts; you can use any random permutation), using pA() to represent binding a truth value to variable A. So A=True AND B=False would be represented as pA(True)*pB(False). You would need some separate mechanism to encode that the True and False values are mutually exclusive.
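
A minimal plain-torch sketch of that permutation idea (random permutations stand in for the variable identities; True and False here are independent random vectors, so their mutual exclusivity would indeed need a separate mechanism):

```python
import torch

torch.manual_seed(0)
d = 10_000
TRUE, FALSE = (torch.randint(0, 2, (2, d)).float() * 2 - 1)

# One random permutation per variable plays the role of the variable's identity.
perm_A, perm_B = torch.randperm(d), torch.randperm(d)

def instantiate(perm, value):
    return value[perm]  # pA(x): apply variable A's permutation to the value

# "A = True AND B = False"
state = instantiate(perm_A, TRUE) * instantiate(perm_B, FALSE)

# The reversed assignment gives an (almost) orthogonal vector, so the
# permutations keep track of which value belongs to which variable.
other = instantiate(perm_A, FALSE) * instantiate(perm_B, TRUE)
print((state @ other).item() / d)  # close to 0
```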

Also, you will probably need a cleanup operation somewhere to suppress all the extra noise terms generated by binding and unbinding.

I hope that helps.

uler3161 commented 3 months ago

Wow, lots of information to digest. Thank you. I am a software engineer, so the programming side isn't too tough, but I'm much weaker on the theory side of things. Hopefully I'll be able to understand everything.

rgayler commented 3 months ago

@uler3161 I suspect what you're trying to do could be turned into a reasonable PhD project.

Having said that, I think VSA would be simple enough that it could be taught at high school (if we can find the right way to teach it).

I think the problem is that VSA requires a different way (from standard programming) of approaching tasks so as to engage the useful properties that VSA offers. My standard metaphor is to think of asking a Fortran programmer to write a program in Prolog. It's not that it's necessarily more complex, rather that you have to completely change your way of thinking about programming.

In case you don't already know, there are plenty of VSA talk videos at https://sites.google.com/view/hdvsaonline/home and links to other resources at https://www.hd-computing.com/

mikeheddes commented 1 month ago

I am closing this now due to inactivity. Feel free to open this issue again if needed.