Closed by breznak 5 years ago
from https://github.com/htm-community/nupic.cpp/pull/291#issuecomment-468947295 @Thanh-Binh
I think we need decode() because it is more deterministic and works WITHOUT learning. SDRClassifier is useful in the general case, but it must be learned before use. From a quality standpoint, I prefer to use decode() first.
agree, the problem with Encoder.decode() is that you don't have SP.decode() to match it:
this is because SP does not implement decode anyway, so the only thing we can effectively decode is an encoding obtained from the encoder in the first place.
from https://github.com/htm-community/nupic.cpp/pull/291#issuecomment-468948289 @ctrl-z-9000-times
Every encoder has a different decode process, it's implementation specific.
It's non-trivial to implement decode methods.
true, but not a valid argument: we need to implement encode + decode for each encoder anyway (we have it in #291 )
To the best of my knowledge, none of the other algorithms support decode or inverse compute methods.
I think this is a crucial point!
The SDR Classifier works pretty well.
do we have data, or experience of yours, showing that SDRClassifier is sufficient?
@breznak use of top-down processing is an understandable method, but SP and TP work on sparse data, so it is very difficult to reconstruct the input values. As far as I know, SDRClassifier cannot decode or reconstruct any ND data, not even 2D images.
> As far as I know, SDRClassifier can not decode or reconstruct any ND-data at least 2D-images
this would be a good argument for decode! Thank you. I'll have to look at SDRClassifier details.
On the other hand, speaking of the current encoders (in #291 ), none is for images; all (most) are for scalar values. We can make an exception (keep decode) for:
The SDR Classifier can only decode real numbers and categories.
The following python classes support the decode method:
There are no C++ classes which support the decode method.
The following python classes support the topDownCompute method:
The following C++ class supports the topDownCompute method:
BacktrackingTM.topDownCompute()
// For now, we will assume there is no one above us and that bottomUpOut
// is simply the output that corresponds to our currently stored column
// confidences. Simply return the column confidences
Has the following implementation of topDownCompute. Notice that:
This method has no tests!
```python
def topDownCompute(self, topDownIn=None):
    """
    (From `backtracking_tm.py`)
    Top-down compute - generate expected input given output of the TM
    @param topDownIn top down input from the level above us
    @returns best estimate of the TM input that would have generated bottomUpOut.
    """
    output = numpy.zeros(self.numberOfColumns())
    columns = [self.columnForCell(idx) for idx in self.getPredictiveCells()]
    output[columns] = 1
    return output
```
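The excerpt above can be condensed into a standalone sketch. The names `top_down_compute`, `num_columns`, and `cells_per_column` are mine, assuming the usual layout where column c owns cells [c*cellsPerColumn, (c+1)*cellsPerColumn):

```python
import numpy as np

def top_down_compute(predictive_cells, num_columns, cells_per_column):
    # Standalone sketch of what BacktrackingTM.topDownCompute does:
    # map each predictive cell to its mini-column and set that column's bit.
    output = np.zeros(num_columns)
    for cell in predictive_cells:
        output[cell // cells_per_column] = 1
    return output
```

Note it only returns the columns of the predictive cells; it does not reconstruct the input that caused them, which is why the method is questioned below.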
Although some encoders can decode without any prior training, others simply cannot. Encoders which have an infinite range of input values will need to first see all of the values which they encode before they can decode them. This includes the RDSE, SDRCategoryEncoder, and CoordinateEncoder.
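As a hedged illustration of why such encoders need to see values first, here is a hypothetical wrapper (not part of htm.core; all names are mine): it can only "decode" values it has already encoded, by picking the remembered encoding with the largest bit overlap.

```python
import numpy as np

class DecodableEncoderWrapper:
    """Hypothetical sketch: remembers every (value, encoding) pair seen,
    and 'decodes' by returning the stored value with the largest bit overlap.
    Values never encoded before simply cannot be recovered."""
    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self.memory = {}  # value -> binary encoding (numpy array)

    def encode(self, value):
        sdr = self.encode_fn(value)
        self.memory[value] = sdr
        return sdr

    def decode(self, sdr):
        # Nearest match among values seen so far.
        return max(self.memory, key=lambda v: int(np.dot(self.memory[v], sdr)))
```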
I don't think it is possible to go back in time and, from a set of cellular activations, compute the previous state of the network. The fundamental problem with decoding is that it's impossible to know which of many segments caused a prediction or activation. This logic applies to the TM & CP.
There is some support for decoding; however, neither the SP nor the TM supports it. The one algorithm which does support it is clearly incomplete and probably incorrect. I second Breznak's proposal #1. I'm going to submit a PR for a C++ encoder base class which does not have a decode method.
Nice research. Thanks. I agree that we should not support decode. Our brains don't do decode...at least not that way.
> There are no C++ classes which support the decode method.
the c++ encoders in #291 do provide it, that's why I've opened this discussion.
> [on topDownCompute] The algorithm does not make sense.
agree, the "methods" are probably broken, or completely off.
> I second Breznak's proposal #1. I'm going to submit a PR for a C++ encoder base class which does not have a decode method.
Actually, that's the opposite of what I was trying to say. My proposal is not either/or, but all of the steps:
implement topdown compute for the temporal components (TM, TP), as those are stateful (the output SDR differs by position in the sequence)
This is true:
> The fundamental problem with decoding is: that it's impossible to know which of many segments caused a prediction or activation. This logic applies to the TM & CP.
I agree to remove topDownCompute (and get rid of existing, broken implementations).
Then we have a new problem. In my example: the sequence me-bite-dog (yummy?) activates cell C1, while dog-bites-me (ouch!) corresponds to cell C2. In columns, both examples would be {me, dog, bite} (unordered!).
Assumption: there are many more contextual sequences (ordered) than just the "sets of objects" (unordered sets).
Anomaly assumes columns exactly for the reason described in the example; since we have no topology, there's no meaning to "nearby cells (by cell idx) share more of the common input potential pool / are from the same neighbourhood".
VectorHelpers::cellsToColumns() is incorrect and should be removed!
- [ ] we need columns&topology for stable anomaly on columns.
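For reference, the mapping that cellsToColumns computes is simple under the usual layout; a minimal sketch (my own naming, assuming column c owns cells c*cellsPerColumn .. (c+1)*cellsPerColumn - 1):

```python
def cells_to_columns(active_cells, cells_per_column):
    # Collapse cell indices into the set of their mini-column indices.
    # Note this is exactly where order/context is lost: different cell
    # sets within the same columns map to the same column set.
    return sorted({cell // cells_per_column for cell in active_cells})
```

This illustrates the me-bite-dog vs dog-bites-me point: two different cell activations can collapse to identical columns.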
Classifier is now only able to map SDR -> scalar values. We need to be able to reconstruct generic objects (similar to autoencoders).
Which makes me question:
> The fundamental problem with decoding is: that it's impossible to know which of many segments caused a prediction or activation. This logic applies to the TM & CP.
True, it's not fully deterministic!
This is definitely not perfect, but it should be a better approximation than nothing. The result of decoding should be a union of the SDRs that would likely cause the current activation.
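A minimal sketch of that union idea (hypothetical data layout, names mine: each cell maps to a list of segments, each segment a set of presynaptic cell indices):

```python
def decode_by_union(active_cells, segments):
    # Since we cannot tell WHICH segment caused a cell to activate,
    # take the union over all of its segments; the result is a superset
    # of the true cause, i.e. an over-approximation, not an exact inverse.
    cause = set()
    for cell in active_cells:
        for presyn in segments.get(cell, []):
            cause |= presyn
    return cause
```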
> This is definitely not perfect, but should be better approximation than nothing.
I wonder how much information is lost in such a transformation (from TM -> SP). Then further decoding is not needed (assuming we're not in a hierarchy). The Classifier should learn the SDR -> obj mapping well. (No decode for encoders or SP needed.)
> Classifier is now able only map SDR->scalar values.
If we implemented decode for SDR and the encoders, the Classifier would not be needed, and we could reconstruct any object.
Missing concept of Columns in TMs, no Topology
There are two concepts of columns: mini-columns and macro-columns.
So the method VectorHelpers::cellsToColumns() is probably incorrect
Instead of speculating, you should write some simple tests for the anomaly class. There are some examples which I think would be informative in the nupic.py repo at nupic.py/tests/integration/nupic/algorithms. In fact, I'd like all of those tests to make their way into this repo, along with the tests/regressions directory.
> True, it's not fully deterministic!
No, it is deterministic because for the same inputs it always gets the same results. It is not invertible because it discards a lot of data so it is impossible to work backwards.
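A tiny example of the distinction (my own illustration, not HTM code): a top-k sparsifier is deterministic, yet not invertible, because the input magnitudes are discarded.

```python
import numpy as np

def sparsify_top_k(x, k):
    # Deterministic: the same x always yields the same output.
    # Not invertible: many different inputs share the same k winners.
    winners = np.argsort(x)[-k:]
    out = np.zeros(len(x), dtype=int)
    out[winners] = 1
    return out
```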
No one responded to the request for comment about removing the decode & topDownCompute functionality. I'm going to close this issue. We will not be implementing the things described here.
I wanted to ask a question and noticed this topic. So there is no way in htm.core python bindings to reconstruct the input from SDR as in https://github.com/tehtechguy/mHTM/blob/master/src/region.py#L953?
Hi dicza,
You can always access the synapse data and decode it, but it is not built into the library. I think the best way to do these sorts of things is to train a classifier to translate from SDR to the input category.
@ctrl-z-9000-times could you please point to examples where SDRClassifier is trained to reconstruct input images?
I'm trying to adapt https://github.com/htm-community/htm.core/blob/master/py/htm/examples/mnist.py so that the Classifier learns input images, but classifier.infer() always outputs an array of shape 2 instead of 784 after I've changed the learning to sdrc.learn(columns, enc.dense.flatten())
I should have posted it in the discussion forum.
The SDR classifier outputs a category, not an image.
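If one still wants an image back, a crude workaround (my own sketch, not htm.core API) is to keep a per-SDR-bit running average of the images seen while that bit was active, and reconstruct by averaging over a test SDR's active bits:

```python
import numpy as np

class SDRImageDecoder:
    """Hypothetical SDR -> image decoder: a per-bit average of training
    images, loosely autoencoder-like. Not part of htm.core."""
    def __init__(self, sdr_size, image_size):
        self.sums = np.zeros((sdr_size, image_size))
        self.counts = np.zeros(sdr_size)

    def learn(self, active_bits, image):
        for b in active_bits:
            self.sums[b] += image
            self.counts[b] += 1

    def reconstruct(self, active_bits):
        # Mean of the per-bit mean images for the active bits.
        means = self.sums[active_bits] / self.counts[active_bits][:, None]
        return means.mean(axis=0)
```

The reconstruction quality depends entirely on how consistently each SDR bit co-occurs with the same input pixels.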
Decoding, inverse of compute(), aka topdown compute
This idea is about how we should properly obtain original value from SDR, how to "decode".
Methods
There are 2 (3) methods doing decoding:
Pipeline
[any real world object] -> Encoder -> [binary vector] -> SpatialPooler -> [SDR] -> TM (TP) -> [SDR with active + predicted cells] -> CP -> [SDR] (-> Anomaly -> [scalar 0.0..1.0] / Classifier)
Methods
1. Classifier
the currently used approach: a Classifier can be used anywhere in the pipeline and learns the association between an SDR/vector and [object], which it can later infer.
2. Top-down compute
If inverse compute were implemented, the Classifier would not be necessary; we could reverse direction in the pipeline and get back to the original value from anywhere in HTM processing.
Current Status
Classifier is used for assigning a value [object] to an SDR (typically from SP or TM!)
Proposal
[object] -> hash() -> [hash] -> RDSE -> [SDR] = SDR_hash
SP_assoc.compute(SDR2) -> [SDR2 with assumed SDR_hash' portion] -> UniversalEncoder.decode(SDR_hash')
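A toy sketch of the proposed hash path (the names make_hash_sdr/decode_hash_sdr are mine; a real version would use the RDSE, and Python's string hash() is only stable within one process run):

```python
import numpy as np

def make_hash_sdr(obj, size=64, active=8):
    # Stand-in for hash() -> RDSE: derive a deterministic sparse code
    # from the object's hash (stable within a single process).
    rng = np.random.default_rng(hash(obj) % (2**32))
    sdr = np.zeros(size, dtype=int)
    sdr[rng.choice(size, active, replace=False)] = 1
    return sdr

def decode_hash_sdr(sdr, table):
    # table: object -> SDR_hash; recover the object with the best overlap,
    # playing the role of UniversalEncoder.decode(SDR_hash').
    return max(table, key=lambda o: int(np.dot(table[o], sdr)))
```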