htm-community / htm.core

Actively developed Hierarchical Temporal Memory (HTM) community fork (continuation) of NuPIC. Implementation for C++ and Python
http://numenta.org
GNU Affero General Public License v3.0

A Classifier that uses bit overlap #917

Open dkeeney opened 3 years ago

dkeeney commented 3 years ago

I have never really liked the SDRClassifier algorithm because it requires a lot of training before it can reliably classify (it uses a classical NN). I have noticed that some users did not provide enough training of the SDRClassifier and as a result assumed it did not work.

I have been playing with the idea of using bit overlap to match an SDR pattern against patterns that the classifier has previously learned. For the learn( ) function, an SDR pattern to be learned would be added to a map along with its category if it was not already present. If the category was already present, we OR the bits into the pattern already learned for that category.

For the infer( ) function, the SDR pattern would be checked against the patterns already in the classifier map. The match probability would correspond to the number of matching bits. The brute force approach would be to call SDR's getOverlap( ) method for each SDR that has been learned by the classifier. That could be computationally expensive.

I noticed that matching bits converge quite quickly. For an SDR containing about 40 ON bits out of 2000, there was a decent probability of a match with as few as 3-4 bits. If we only checked those that had at least one bit matching, it would reduce the search considerably. So, what if during learning we stored our SDR in a multimap indexed by each bit, mapping to its category? Each learned pattern would have 40 entries if there were 40 bits turned on. Indexing by two-bit combinations would converge faster but would require 40 x 40 entries in the map for each category, which is probably too much space. So let's assume we index only by a single bit.

During infer we take each bit of the pattern to be matched and query the multimap. For all learned patterns that also contain that bit and have not already been checked, we would call the getOverlap( ) function and save the overlap count in a result array. The result array could then be turned into a probability distribution.
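A minimal sketch of what I have in mind, in C++ (names like OverlapClassifier are hypothetical, and a plain std::vector<uint32_t> stands in for the SDR's sparse index list). Instead of calling getOverlap( ) on each candidate, it simply counts multimap hits per category, which gives the same overlap number:

  #include <cstdint>
  #include <map>
  #include <set>
  #include <vector>

  // Sketch only: a bit-overlap classifier indexed by single bits.
  // A Pattern is the sparse list of ON-bit indices of an SDR.
  using Pattern = std::vector<uint32_t>;

  class OverlapClassifier {          // hypothetical name, not an htm.core API
  public:
    // learn(): OR the pattern's bits into the set stored for this category and
    // add one multimap entry per newly seen (bit, category) pair.
    void learn(const Pattern &pattern, uint32_t category) {
      std::set<uint32_t> &bits = learned_[category];
      for (uint32_t bit : pattern) {
        if (bits.insert(bit).second) {      // bit not yet set for this category
          index_.emplace(bit, category);    // ON bit -> category
        }
      }
    }

    // infer(): for each ON bit, look up which categories contain it and
    // accumulate the overlap count per category.
    std::map<uint32_t, uint32_t> infer(const Pattern &pattern) const {
      std::map<uint32_t, uint32_t> overlaps;          // category -> overlap count
      for (uint32_t bit : pattern) {
        auto range = index_.equal_range(bit);
        for (auto it = range.first; it != range.second; ++it) {
          ++overlaps[it->second];
        }
      }
      return overlaps;   // the caller turns this into a "probability" table
    }

  private:
    std::map<uint32_t, std::set<uint32_t>> learned_;  // category -> ORed bits
    std::multimap<uint32_t, uint32_t> index_;         // ON bit -> category
  };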

As you all know, I am weak on algorithms but what do you think? Is this worth trying?

breznak commented 3 years ago

I have never really liked the SDRClassifier algorithm because it requires a lot of training before it can reliably classify (it uses a classical NN)

I agree with you, this was also my point - the classifier is not part of HTM theory.

I noticed that matching bits converge quite quickly. For an SDR containing about 40 ON bits out of 2000, there was a decent probability of a match with as few as 3-4 bits

Yes, this is how things should work: SDRs should be large (hence the recently added assert in RDSE to force a decent size), and then the combinations are so unique that any match means "decent similarity".

what do you think? Is this worth trying?

Definitely! I'd love to have this simpler & better (in terms of HTM) classifier. I've discussed this before with one of @fcr / @marty1885, who already wrote a similar HTMClassifier - maybe we can use their code, or write it as you've suggested.

I'll have to reread your proposed implementation to understand it properly, but a general idea imho is:

fcr commented 3 years ago

The code https://github.com/htm-community/htm.core/blob/f6fdfaa0067636b41023d83a249a33b77804e031/py/htm/advanced/frameworks/location/location_network_creation.py#L649 uses SDR overlap successfully to classify objects.

dkeeney commented 3 years ago

I pushed my first cut at this in PR #918. I am having a problem converting the overlapped bit count to a probability distribution and could use some help there. HTM School has a session about SDRs that discusses the probability of a false positive in an SDR. The formula is given there, and perhaps I can work with that.
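One option is the same softmax( ) trick the SDRClassifier uses on its activation values; a sketch (whether this gives a meaningful probability is exactly what I'm unsure about):

  #include <algorithm>
  #include <cmath>
  #include <cstdint>
  #include <map>

  // Sketch only: turn per-category overlap counts into values that sum to 1.0.
  std::map<uint32_t, double> softmaxOverlaps(const std::map<uint32_t, uint32_t> &overlaps) {
    std::map<uint32_t, double> pdf;
    if (overlaps.empty()) return pdf;
    double maxOverlap = 0.0;
    for (const auto &kv : overlaps)
      maxOverlap = std::max(maxOverlap, (double)kv.second);
    double sum = 0.0;
    for (const auto &kv : overlaps) {
      double e = std::exp((double)kv.second - maxOverlap);  // subtract max for stability
      pdf[kv.first] = e;
      sum += e;
    }
    for (auto &kv : pdf) kv.second /= sum;                   // normalize to 1.0
    return pdf;
  }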

I would like this to be basically a drop-in replacement for the SDRClassifier algorithm in C++.

Thanks @fcr, for the link. I will take a look at that code as well.

dkeeney commented 3 years ago

I am still struggling with the PDF that should be returned as a result from infer( ).

The PDF generated by the softmax( ) function used by SDRClassifier should be a bell curve, with the categories that match closest at the top of the curve, and the sum of all of the probabilities should equal 1.0. However, that assumes there is at least one category that has enough overlap to be considered a match. What if none of them is a perfect match? What if a pattern exactly matched more than one category?

It seems to me that the values returned in the result array should be the probability of being a match (enough overlapping bits), which is a function of the size of the SDR, the sparsity (number of 1's it has), and the number of 1's that overlap. So if I am understanding this correctly, what we want is not a true PDF. The sum of all probabilities in the result array will normally not equal 1.0.

In L246aNetwork.getCurrentClassification() from @fcr's reference, any category below the threshold is not considered, so if there is no match then all categories will have 0.0 probability. It does not seem to me that it is correctly computing the probability of an SDR being a match either. Perhaps, for a given sparsity, it should be something like:

probability = possible_patterns_with_b_overlapping_matches / all_possible_patterns_with_w_1's

If all 1 bits overlapped then the probability would be 1.0, unless more than one category was learned with the same pattern. As the number of overlaps becomes smaller, the probability tends toward 0. All of the result array could be very small values as a result of random overlap in the case where the given pattern was never learned and had no semantic relation to a real match.

I would like to return a real probability for any SDR of being a match. This means that this will not be a probability distribution function with a bell curve but rather a table of the probability that this subject SDR is a match with the corresponding category.

In the HTM School videos (SDR part 2) there is a formula for the overlap set: the number of patterns with the given sparsity and number of overlaps.

overlap set size = C(w1, b) x C(n - w1, w2 - b)

where C(a, b) is the binomial coefficient "a choose b", b is the number of overlapping bits, n is the total number of bits, w1 is the number of ON bits in SDR1, and w2 is the number of ON bits in SDR2.
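Written out, that is the number of SDRs with w2 ON bits that overlap SDR1 in exactly b bits. The SDR papers turn this into a false-positive probability by summing over all overlaps at or above a threshold theta and dividing by the total number of possible patterns (my notation, for reference only):

  |\Omega(n, w_1, w_2, b)| = \binom{w_1}{b}\binom{n - w_1}{w_2 - b}

  P(\text{overlap} \ge \theta) = \frac{\sum_{b=\theta}^{\min(w_1, w_2)} \binom{w_1}{b}\binom{n - w_1}{w_2 - b}}{\binom{n}{w_2}}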

At this point I get confused.... How do I use this to get a probability that this is a match?

Maybe this is all overkill. All we really want to know is how much of a match do we have and maybe just returning the number of overlaps in the result array is sufficient. If it is below some threshold it is just 0.

Am I making any sense? Sorry for the long post....

ctrl-z-9000-times commented 3 years ago

IIRC, Nupic originally had a classifier which was a lot like what you're describing. It used a k-Nearest-Neighbor algorithm to find which learned SDRs were most similar to the test data. It was capable of learning new categories from a single example. But, it had a few issues:

These should be solvable issues! I don't really understand how your solution works, but good luck engineering it.

dkeeney commented 3 years ago

I don't really understand how your solution works

During learning I store only one entry for each category and ON bit combination. So, if there are 40 ON bits for the pattern being learned for a category, there will be 40 entries for the category. If there is more than one pattern learned for a category, the bits are basically ORed.

During infer, I take each bit from the source SDR and use it to index into the multimap. That will locate all stored entries that match with at least one bit (so I don't have to check all learned entries). For each of these I check the rest of the bits and keep track of the number that overlap per category.

All of that seems to be ok. It's just the math for computing the probability of a match from the bit overlap that is giving me trouble.

dkeeney commented 3 years ago

I still don't think using softmax( ) to generate the probability gives the correct numbers, but perhaps it does not matter. If the pattern being matched in infer( ) has at least some overlap, then the softmax( ) function will show bigger numbers for the matching category. I avoid the problem of the probability when there are no or very few overlaps by declaring it a non-match if the number of overlaps is not above a threshold for at least one category.

That threshold can be set in the constructor, or, if left at 0, will be computed as at least 20% of the active bits.

The returned PDF array will be empty if there is a non-match as a signal to the caller that nothing matched.
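A sketch of that non-match handling, building on the softmaxOverlaps( ) helper from the earlier sketch (names are hypothetical, not the PR's API):

  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <cstdint>
  #include <map>

  // Sketch only: apply the default threshold (20% of the active bits when left at 0)
  // and return an empty table when no category reaches it.
  std::map<uint32_t, double> thresholdedResult(
      const std::map<uint32_t, uint32_t> &overlaps,   // category -> overlap count
      size_t numActiveBits,                           // ON bits in the inferred pattern
      uint32_t threshold = 0) {                       // 0 => use 20% of active bits
    if (threshold == 0)
      threshold = (uint32_t)std::ceil(0.20 * (double)numActiveBits);
    uint32_t best = 0;
    for (const auto &kv : overlaps) best = std::max(best, kv.second);
    if (best < threshold) return {};                  // empty PDF signals a non-match
    return softmaxOverlaps(overlaps);                 // from the earlier sketch
  }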

dkeeney commented 3 years ago

I tried the OverlapClassifier with MNIST and the results are rather disappointing. The best I could do was 36% while the original SDRClassifier did 94.4%. I suspect ORing all training bits for a category was not the right thing to do.

I am going to put this on the back burner and think about it for a while.

aamir-gmail commented 3 years ago

Please see if this repo does what you are looking for:

https://github.com/aamir-gmail/htm-vision-encoder

Thanh-Binh commented 3 years ago

@aamir-gmail could you share with us any document/paper related to your algorithm?

dkeeney commented 3 years ago

@aamir-gmail, I think this is basically what I did and it did not perform very well with the MNIST data. Of course I may have messed up the coding someplace.

Learn: for each sample-category pair, pass the sample through the encoder, SP, and TM (or whatever). Take the resulting SDR and OR its bits with the bits previously saved for that category, with the assumption that most bits would be the same.

Lookup: for a sample that has been passed through the same sequence of encoder, SP, TM (or whatever), look for overlapping bits in each category and accept the category with the most overlapping bits.

I think the problem is that during learning the outliers count the same as bits that were already set due to other samples of the same category. With enough outliers, any sample will match almost any category. So, rather than ORing the bits, I should do some sort of incremental addition of a weight for that bit. That may be what you are doing... not sure.

dkeeney commented 3 years ago

I took another crack at implementing this Classifier using overlapping bits rather than classical AI. This time I kept track of how many training samples for a label contained each bit. Then during infer, for a pattern it added up the overlapped sample counts rather than just adding 1 for each overlap. This way, the outliers had a very small contribution in deciding if it is a match.
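A sketch of that counting scheme (hypothetical names; the single-bit multimap index from the earlier sketch could still be layered on top for speed):

  #include <cstdint>
  #include <map>
  #include <vector>

  using Pattern = std::vector<uint32_t>;   // sparse list of ON-bit indices

  // Sketch only: weight each bit by how many training samples of the label contained it,
  // so rare "outlier" bits contribute very little to the match score.
  class WeightedOverlapClassifier {
  public:
    void learn(const Pattern &pattern, uint32_t label) {
      std::map<uint32_t, uint32_t> &counts = bitCounts_[label];
      for (uint32_t bit : pattern) ++counts[bit];    // bit -> samples containing it
    }

    std::map<uint32_t, uint64_t> infer(const Pattern &pattern) const {
      std::map<uint32_t, uint64_t> score;            // label -> summed sample counts
      for (const auto &labelEntry : bitCounts_) {
        for (uint32_t bit : pattern) {
          auto it = labelEntry.second.find(bit);
          if (it != labelEntry.second.end()) score[labelEntry.first] += it->second;
        }
      }
      return score;
    }

  private:
    std::map<uint32_t, std::map<uint32_t, uint32_t>> bitCounts_;  // label -> (bit -> count)
  };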

The result was better... up to 52% matched, but still no cigar. So I still have more work to do.

dkeeney commented 3 years ago

I did run into a problem, however. If I process the raw images, it completes correctly. But when I run the data through the SP, it crashes on element 1080 of the training data. That entry is a "1" and it has 45 raw bits in the image. When it comes out of the SP there are 0 bits set. I don't know how that can happen.

Running in Debug mode in MSVC 2019, it crashes on the NTA_CHECK line:

  auto sparse_pattern = pattern.getSparse();
  NTA_CHECK(sparse_pattern.size() > 0) << "No active bits in the Pattern being learned.";

But the message text is not displayed. Probably something to do with being in Debug mode.

Anyway, I will check for empty patterns and skip them while training.
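A minimal guard along those lines, assuming the training loop has the SDR in hand (same getSparse( ) call as in the snippet above):

  // Skip training samples whose SDR came out of the SP with no active bits.
  if (pattern.getSparse().size() == 0) {
    continue;   // nothing to learn from an empty pattern
  }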

breznak commented 3 years ago

Anyway, I will check for empty patterns and skip them while training.

(while training the classifier) yes, that seems reasonable. No SDR should be empty if it reaches the Classifier.

When it comes out of the SP there are 0 bits set. I don't know how that can happen.

The other thing is that it does happen. It might be some "too trivial" SP settings with insufficient bits, or too early in the training process (weights not set up properly yet?).

ctrl-z-9000-times commented 3 years ago

We should probably change that NTA_CHECK to an NTA_ASSERT...