numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0

Flatline anomalies in regular period scalar data are not identified #3824

Open rhyolight opened 6 years ago

rhyolight commented 6 years ago

In many cases, HTM algorithms do not detect common flatline anomalies.

This was first reported on the HTM Forum. See:

[Image: flatline chart]

It was later also reported against HTM.Java in a private message from Parag_Goyal. I have taken all the data from that message and included it in this report.

First, download the flatline-anomaly.zip file to reproduce this issue. It contains the data and a NuPIC program that replicates the problem. See also the attached output from this program (output MT.xlsx), which includes Excel charts of the anomaly scores and anomaly likelihoods.

[Two screenshots taken 2018-03-29]

This problem has been reported in both NuPIC and HTM.Java, so we can assume it is an algorithmic issue rather than an implementation bug. It may be caused by how the anomaly scores are calculated, or by how the anomaly likelihoods are calculated.
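For reference, HTM's raw anomaly score is commonly described as the fraction of currently active columns that were not predicted at the previous time step. The sketch below illustrates only that definition in plain Python; the function and argument names are hypothetical, and this is not NuPIC's or HTM.Java's actual code. Under this definition, any input the model predicts well, including a flatline it has already learned, scores close to 0.

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Return the fraction of active columns that were not predicted.

    0.0 means every active column was predicted at the previous step;
    1.0 means none of them were. (Hypothetical helper for illustration.)
    """
    active = set(active_columns)
    if not active:
        # Nothing is active, so nothing can be surprising.
        return 0.0
    predicted = set(prev_predicted_columns)
    return 1.0 - len(active & predicted) / len(active)


# Columns 3 and 4 were predicted, 1 and 2 were not: half the input is novel.
print(raw_anomaly_score([1, 2, 3, 4], [3, 4, 5]))  # 0.5
```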

This is an open issue. We are aware of it, but we are not prioritizing work on it at this time. We want to report that it exists in case someone can figure out what is wrong.

One last note: we would eventually like to add the dataset in flatline-anomaly.zip to NAB (the Numenta Anomaly Benchmark). It is a good dataset that represents a very common anomaly in streaming scalar data. We plan to do this by adding a staging area for new datasets in NAB so we can publish them with our next versioned release.

rhyolight commented 6 years ago

I should mention this is a known issue:

https://github.com/numenta/nupic/blob/3c5c63fb512f70856a3068d0b55fea5f5e0bd944/src/nupic/algorithms/anomaly_likelihood.py#L466-L479

Parag0892 commented 6 years ago

This is not just a flatline issue. In this dataset, non-flatline-data.csv.zip, the data in the middle section is not flat: the metric varies between 1 and 3, but the anomaly is still not detected.

Anomaly region in data:

[Screenshot taken 2018-04-05]

The overall analyzer output:

[Screenshot taken 2018-04-05]

rhyolight commented 6 years ago

Correct, it is not just a flatline issue. You can see why in this line from the code I linked above:

if metricDistribution["variance"] < 1.5e-5:
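A minimal sketch of how a guard like the quoted one can mask anomalies, assuming (as the name `metricDistribution` suggests) that the check looks at the variance of the input metric values and, when that variance is tiny, replaces the fitted distribution of anomaly scores with a very wide "null" distribution. Only the 1.5e-5 threshold comes from the quoted line; the function names, the null distribution's parameters, and the simplified likelihood calculation (moving average of raw scores, normal fit, Gaussian tail) are assumptions for illustration, not NuPIC's actual code.

```python
import math
import statistics

# Threshold taken from the quoted line of anomaly_likelihood.py.
METRIC_VARIANCE_THRESHOLD = 1.5e-5

# Hypothetical "null" distribution: so wide that every averaged score in
# [0, 1] sits near its mean, which yields a likelihood of about 0.5.
NULL_DISTRIBUTION = {"mean": 0.5, "variance": 1e6, "stdev": 1e3}


def fit_normal(values):
    """Fit a normal distribution by mean and population variance."""
    mean = statistics.mean(values)
    variance = statistics.pvariance(values, mu=mean)
    # Floor the stdev so a constant series does not cause division by zero.
    return {"mean": mean, "variance": variance,
            "stdev": max(math.sqrt(variance), 1e-6)}


def moving_average(scores, window):
    """Trailing moving average of the raw anomaly scores."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def tail_probability(x, dist):
    """Upper-tail probability of |z| under the given normal distribution."""
    z = abs(x - dist["mean"]) / dist["stdev"]
    return 0.5 * math.erfc(z / math.sqrt(2))


def anomaly_likelihoods(metric_values, anomaly_scores, window):
    """Hypothetical simplified likelihood calculation with the variance guard."""
    averaged = moving_average(anomaly_scores, window)
    metric_distribution = fit_normal(metric_values)
    if metric_distribution["variance"] < METRIC_VARIANCE_THRESHOLD:
        # Flat metric: the anomaly scores are ignored entirely.
        dist = NULL_DISTRIBUTION
    else:
        dist = fit_normal(averaged)
    return [1.0 - tail_probability(a, dist) for a in averaged]


scores = [0.0] * 50 + [1.0] * 5          # burst of unpredicted input at the end
varying = [1.0, 2.0, 3.0] * 18 + [2.0]   # 55 metric values that vary
flat = [2.0] * 55                        # 55 identical metric values

# window=1 disables smoothing to keep the numbers easy to follow.
print(anomaly_likelihoods(varying, scores, window=1)[-1])
print(max(anomaly_likelihoods(flat, scores, window=1)))
```

With the varying metric, the burst of 1.0 scores at the end produces a likelihood close to 1; with the flat metric, the same scores produce likelihoods of about 0.5, because the guard swaps in the null distribution. For brevity the sketch fits over the whole series and takes the two series separately; the real implementation differs in details such as how much history it fits over.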