SDR: Implement ways to measure quality of produced SDR

breznak commented 5 years ago

Relevant classes:

SpatialPooler
SDR
Topology

Why?

when making changes to SP, we don't have ways to meansure the quality of its outputs: SDRs.

Functionality: SDR = sparse distributed representation

sparse:
- [x] min, max active bits in SDR, compared to % size
- [ ] avg distance between active bits in SDR (should be similar for all SDRs), uses Topology
distributed:
- [x] uses activeDutyCycles of the active bits (=cols) to see that all cols are used equally
- [x] information/entropy of the bit/SDR

Implementation:

helper method to SP (SDR?)

Hypothesis:

what is a "quality" SDR?
does higher quality SDR translate to better (how?) results? (in what?)

EDIT: latest update 14/01/2019

Update: Implemented in PR #184

SDR Sparsity Metrics
SDR Activation Frequency Metrics
SDR Average Overlap Metrics
SDR All Metrics Convenience Class

Summary: Ideas which are discussed here but not yet implemented:

[x] Cell death (via method SDR.killCells)
[ ] SDR topology
[ ] SP noise resistance (via method SDR.addNoise, also the example sp_tutorial will demonstraight this)
[ ] SP long term stability
[ ] TM estimate false positive & negative rates
[ ] Test Hypothesises
[ ] Write about how to measure HTM's using these metrics

dkeeney commented 5 years ago

when making changes to SP, we don't have ways to meansure the quality of its outputs: SDRs.

Yes, I agree. Having some sort of measure would be very useful. 👍

ctrl-z-9000-times commented 5 years ago

Great idea!

I would add stats: min/mean/std-dev/max for ac tiveDutyCycles, and then binary entropy which is a single fraction (in range 0-1) which describes utilization.

Ths SDR class has a hook which is called everytime its value is updated, could be useful for this task?

breznak commented 5 years ago

I would add stats: min/mean/std-dev/max for ac tiveDutyCycles

So maybe in 2 ways of implementation; as for the first type, I'd like only metrics that is computed instantly, just from the SDR. To make it simpler (no logic needs to be added to SP), and faster.

entropy .. which describes utilization.

For a single bit, whole SDR, or the SP?

ctrl-z-9000-times commented 5 years ago

We could split these metrics into different methods, and then have a print method which calls all four. Then the min/max can be computed fast and separately, but the print method (which is typically only called once at end of program) can display all of the stats.

class SDR_ActivationFrequency {
    SDR_ActivationFrequency( SDR &dataSource );
    Real min();
    Real max();
    Real mean();
    Real std();
    Real entropy();
    String pretty_print(); // Uses all the metrics.
}

entropy .. which describes utilization.

For a single bit, whole SDR, or the SP?

Entropy is for the activation frequency of the SDR as a whole. Here is my python function for it:

def _binary_entropy(p): // p is an array of floats in range [0, 1]
    p_ = (1 - p)
    s  = -p*np.log2(p) -p_*np.log2(p_)
    return np.mean(np.nan_to_num(s))

Then to scale entropy into range [0, 1] simply divide by the theoretical maximum entropy which is: entropy(mean(activationFrequency)).

ctrl-z-9000-times commented 5 years ago

min, max active bits in SDR, compared to % size

Another good idea, to which I would add mean & std. Min & max tell you about the extremes & outliers, which can be helpful for spotting bugs. Mean & std tell you about its normal operating behaviour.

Yet another interesting metric to track is: Average overlap between consecutive assignments to an SDR. This measures how quickly an SDR changes, sort of like a derivative. I have in past experiments used this to measure the quality of encoders, w/ regards to semantic similarity property. I've also used this metric in experiments with Layer 2/3 cell stability / view-point invariance.

breznak commented 5 years ago

split these metrics into different methods,

Yes, I'd like the metric to provide a mapping to [0, 1], but (also) return the separate stats.

Mean & std tell you about its normal operating behaviour.

First I've thought of "quality" as a one-shot measure of an SDR, you're suggesting to add statistics over the run of the program on the dataset (which is a good thing!) Only if these should be separate? Quality of SDR, and stats of SP. Or keep it together in one.

def _binary_entropy(p): // p is an array of floats in range [0, 1]

And the p here is? activation freq for each column(bit) after N runs?

Yet another interesting metric to track is: Average overlap between consecutive assignments to an SDR

What do you mean by this? If it's overlap between 2 consecutive (any) SDR values produced by SP, that imho has no meaning, as these do not have to be anyhow correlated...?

ctrl-z-9000-times commented 5 years ago

I think a good way to organize all of these would be to give each metric its own class. Then create a class named SDR_Metrics which would gather up all of the metrics into a single easy to use package.

Each metric could follow a common design pattern, such as:

SDR_MetricName {
    void SDR_MetricName( SDR &dataSource, ... );
    Real statistics(); // Min Mean Std Max
    String print();
    void save( ... );
    void load( ... );
}

breznak commented 5 years ago

I'm looking into measuring the following property of an SDR: "Distributed = each bit can be REused in several different contexts(SDRs), and a collection of multiple bits is unique (a SDR)"

We can evaluate sparsity quite well, but this distributed-ness? Would information/entropy over column activations over the run over dataset (too much over-s :D) be enough? In SP we can use (active)DutyCycles as well...

breznak commented 5 years ago

I would like to tyrn this into a paper. The main ideas are:

we can&should measure quality of the encoding (SDRs) - how? What features?
(How) does the quality correlate with good algorithm results? (prediction & anomaly)
Compare quality of (output) encodings of other ML algorithms. (Which? Only sparse representations?)
- sparse auto-encoders
- cortical.io retina (SDRs for NLP)
- biological (BCI data from which regions, retina, ...)

ctrl-z-9000-times commented 5 years ago

And the p here is? activation freq for each column(bit) after N runs?

Yes.

What do you mean by [average overlap]? If it's overlap between 2 consecutive (any) SDR values produced by SP, that imho has no meaning, as these do not have to be anyhow correlated...?

This is only relevant for time-series datasets. The encoder output should have an overlap when the input value is slowly and smoothly moving, which indicates semantic similarity between encoded values. The SP should have very little overlap because it should map similar inputs to distinctive outputs. The column-pooler should have a significant average overlap because it is supposed to do view-point invariance.

ctrl-z-9000-times commented 5 years ago

For reference: I got a lot of ideas for statistics by reading numenta's papers. In their SP paper they describe several ways to measure the quality of their results. IIRC the SDR paper was also useful.

"Distributed = each bit can be REused in several different contexts(SDRs), and a collection of multiple bits is unique (a SDR)"

In this context I think that "distributed" means "decorrelated". You can measure the correlation between two SDRs, and between every pair of SDRs in a set, and then average those correlations together into a single result describing overall quality. In past experiments I've measured correlations between & within labelled categories, which I found useful.

I would like to tyrn this into a paper

Alternatively, this info would be great for our wiki too. It would be helpful for other ppl to understand how to build & debug HTM systems. I have been meaning to write on the htm-community wiki. I've started writing a wiki in my fork of nupic.cpp but its not done yet. I am hoping to turn the wiki into a practical guide for using HTMs. The numenta wiki already has a lot of good material & docs which we should copy into this wiki at some point.

breznak commented 5 years ago

Average overlap between consecutive assignments to an SDR. This measures how quickly an SDR changes, sort of like a derivative. I have in past experiments used this to measure the quality of encoders, w/ regards to semantic similarity property. I've also used this metric in experiments with Layer 2/3 cell stability / view-point invariance.

What would be good datasets to test this?

You can measure the correlation between two SDRs

I'm trying to figure how to eliminate the (error) caused from encoders, which are written by hand. We could use a set of SDRs and just modify them (to have semantically similar, with known difference data), MNIST would be a good example from a practical domain.

Also, would c++, py be a better repo to start this research at?

ctrl-z-9000-times commented 5 years ago

What would be good datasets to test [SP-AverageOverlap]?

This would be useful in conjunction with any encoder. Use artificial data as input so that you can control the rate it changes at, and check that the resulting SDR has a reasonable average overlap. The SP-AverageOverlap class should use an exponential rolling average, so it is possible to get the exact overlap (rather than an average) for testing purposes by setting its parameter to 1.

What would be good datasets to test [Layer 2/3 cell stability / view-point invariance]?

An artificial dataset. Numenta created 3D objects to test this.

In my experiments I used words: I encoded each letter of the alphabet as a random SDR, and fed the two layer network a sequence of words (with whitespace removed). I judged the quality of layers 2/3 by the average overlap, as well as a more detailed analysis of the actual overlaps within & between categories (where each word is a category).

Also, would c++, py be a better repo to start this research at?

IMO C++. I would rather make this repo really good, and then have python bindings.

breznak commented 5 years ago

Related Numenta papers : https://arxiv.org/abs/1601.00720

https://numenta.com/neuroscience-research/research-publications/papers/htm-spatial-pooler-neocortical-algorithm-for-online-sparse-distributed-coding/

https://arxiv.org/abs/1503.07469

https://arxiv.org/abs/1602.05925

(please add more resources)

ctrl-z-9000-times commented 5 years ago

From the SP paper: Two more metrics for the SP, not generic for all SDRs. These metrics depend on an input dataset and prior training, so there is some work required from the user.

Noise resistance
- We could have a method SP.computeNoisy(inputs, outputs, percentNoise) -> noiseResistance which would calculate this metric and return the percent overlap between the clean & noisy results.
Long term stability of inputs -> outputs
- Method SDR.overlap() can help with this.

From "Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory": Both of the following metrics could be methods of TM class.

False positive rate (estimate): needs input SDR-Sparsity
False negative rates (estimate): needs input SDR-Sparsity & percentNoise

Cell death experiments: We could make an SDR subclass which kills a fraction of cells in an SDR and filters them out of its value.

breznak commented 5 years ago

These metrics depend on an input dataset and prior training, so there is some work required from the user.

I figured most of the interesting metrics would be task (dataset) dependant. In a form of a sliding window, as HTM is doing online learning.

Noise resistance

I'd add this under the autoassociative memory experiment, with dropout:

Cell death experiments: We could make an SDR subclass which kills a fraction of cells in an SDR and filters them out of its value.

Also, about this

Cell death experiments:

would not add a subclass, but constructor param float dropoutRatio that kills (=flips) each bit randomly with given chance.

False positive rate (estimate): needs input SDR-Sparsity
False negative rates (estimate): needs input SDR-Sparsity & percentNoise

FP, FN rates: :+1:

breznak commented 5 years ago

Some other hypothesis to verify:

H3: low accumulated SDR quality -> hint to change (running) params of the network (SP, TM,..params); should ignore anomalies (as it implies "I don't understand the problem").
H4: Quality acts as a "confidence measure", orthogonal to anomaly score. Allows us to say: "I'm highly confident this is a contextual anomaly" (=high quality, high anomaly) vs. "anomaly && low quality" = "I'm new to the problem, don't take predictions too seriously" (= we may filter out the anomaly) vs (high quality & low anomaly) -> "don't filter out, just small anomaly, but I'm confident about that"
H5: cummulatice quality drop indicates domain change -> could trigger auto-reset(), param tuning, or just hint the domain change. (ex sine wave switches to stairs patter), find datasets for this.

ctrl-z-9000-times commented 5 years ago

Update: Implemented in PR #184

SDR Sparsity Metrics
SDR Activation Frequency Metrics
SDR Average Overlap Metrics
SDR All Metrics Convenience Class

TODO: This is not critical, but maybe useful? I'd like all the SDR Metrics to have another constructor which does not accept an SDR, instead the user must call Metric.addData( SDR ). This lets the users manage their own data and is a more flexible solution.

UPDATE Metric.addData( SDR ) implemented.

Summary: Ideas which are discussed here but not yet implemented:

[ ] Cell death (via SDR subclass)
[ ] SDR topology
[ ] SP noise resistance
[ ] SP long term stability
[ ] TM estimate false positive & negative rates
[ ] Test Hypothesises
[ ] Write about how to measure HTM's using these metrics

ctrl-z-9000-times commented 5 years ago

does higher quality SDR translate to better (how?) results? (in what?)

I accidentally made a bug in the mnist branch, which resulted in a 2% decrease in accuracy from 95 to 93%. This bug also caused the entropy to drop from ~95% to less than 75%!

htm-community / htm.core