numenta / NAB

The Numenta Anomaly Benchmark
GNU Affero General Public License v3.0

NAB should be updating and using nupic's getScalarMetricWithTimeOfDayAnomalyParams #231

Closed vitaly-krugl closed 8 years ago

vitaly-krugl commented 8 years ago

NAB PR #206 recently made improvements to its own copy of the model parameters.

htmengine relies on nupic.frameworks.opf.common_models.cluster_params.getScalarMetricWithTimeOfDayAnomalyParams() for its model parameters.

NAB should use and update nupic.frameworks.opf.common_models.cluster_params.getScalarMetricWithTimeOfDayAnomalyParams instead of maintaining its own copy of the parameters. This is similar in idea to "Switch over to using the Anomaly Likelihood class in NuPIC" (https://github.com/numenta/NAB/pull/184).

Per email from Subutai:

Subject: Re: HTM Engine questions on mailing list

Vitaly,

Yes, I think that would be better. We should update the params in common_models and NAB should switch over to using it.

-- Subutai

rhyolight commented 8 years ago

:+1:

subutai commented 8 years ago

One update to the note above: before switching NAB over, we should make sure

nupic.frameworks.opf.common_models.cluster_params.getScalarMetricWithTimeOfDayAnomalyParams()

contains the latest parameters from NAB! Is there a NuPIC issue for that?

breznak commented 8 years ago

> contains the latest parameters from NAB! Is there a NuPIC issue for that?

:+1: I hope to see an example in NuPIC! I got a bit lost: how do I feed the metricData if I don't have any? I want the best set of params for a generic (black-box) use case. I was under the assumption that hotgym is sort of that.

vitaly-krugl commented 8 years ago

@subutai - regarding

> One update to the note above: before switching NAB over, we should make sure nupic.frameworks.opf.common_models.cluster_params.getScalarMetricWithTimeOfDayAnomalyParams() contains the latest parameters from NAB! Is there a NuPIC issue for that?

The intention behind this issue is to do both things:

  1. update the params in NuPIC; and
  2. switch NAB to using getScalarMetricWithTimeOfDayAnomalyParams.

Since these are tightly-coupled changes, do we need a separate issue in NuPIC for it as well?

vitaly-krugl commented 8 years ago

@breznak, regarding

> hope to see an example in nupic! I got a bit lost how do I feed the metricData - if I don't have any? I want the best set of params for a generic (black box) usecase. I was under assumption hotgym is sort of that.

There is an example of getScalarMetricWithTimeOfDayAnomalyParams() use here: https://github.com/numenta/numenta-apps/blob/9a02bf984272721f94b1255b81146841bc890433/htmengine/htmengine/runtime/scalar_metric_utils.py#L80-L84

subutai commented 8 years ago

> do we need a separate issue in NuPIC for it as well?

@vitaly-krugl Not sure - up to @rhyolight. As long as both issues get taken care of in the right order, I don't really care!

breznak commented 8 years ago

@vitaly-krugl thank you! I'm still trying to understand the concept. Is this useful for me if I have black-box data (any generic use case), don't do swarming (actually, does this code just run some small swarming to get the "best" params?), and don't use time in my model? Which params does this set: encoder, anomaly, or everything for the HTM model?

vitaly-krugl commented 8 years ago

@breznak, getScalarMetricWithTimeOfDayAnomalyParams uses this file as the template: https://github.com/numenta/nupic/blob/master/src/nupic/frameworks/opf/common_models/anomaly_params_random_encoder/best_single_metric_anomaly_params.json

It then augments the template with values based on the function's arguments.

The static template was initially derived from swarming over IT data, and we found that it worked well for anomaly detection in time series data for HTM for IT (formerly Grok for IT) as well as HTM for stocks ("Taurus").
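
As a rough illustration of the template-plus-augmentation pattern described above, here is a minimal Python sketch. It does not call NuPIC. The inline template is a heavily trimmed, hypothetical stand-in for best_single_metric_anomaly_params.json, and `build_params`, the `c1` encoder key, and the conversion of `numBuckets` into a `resolution` are assumptions made for illustration only.

```python
import copy
import json

# Hypothetical, trimmed stand-in for the static JSON template.
TEMPLATE = json.loads("""
{
  "modelConfig": {
    "modelParams": {
      "sensorParams": {
        "encoders": {
          "c1": {
            "type": "RandomDistributedScalarEncoder",
            "fieldname": "c1",
            "numBuckets": 130.0
          }
        }
      }
    }
  }
}
""")


def build_params(min_val, max_val, min_resolution=0.001, template=TEMPLATE):
    """Return a copy of the template with values derived from the arguments."""
    params = copy.deepcopy(template)  # never mutate the shared template
    encoders = params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
    value_encoder = encoders["c1"]
    # Assumed augmentation step: turn the bucket count into a resolution.
    num_buckets = value_encoder.pop("numBuckets")
    value_encoder["resolution"] = max(min_resolution,
                                      (max_val - min_val) / num_buckets)
    return params


params = build_params(min_val=0.0, max_val=260.0)
value_encoder = params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]["c1"]
print(value_encoder["resolution"])  # 2.0
```

The deep copy keeps the static template reusable across calls, so every metric starts from the same baseline and only the argument-dependent fields differ.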

breznak commented 8 years ago

Thank you @vitaly-krugl !

Just one last checklist :wink: before we freeze these values for https://github.com/breznak/neural.benchmark/issues/14 (CC @wattik please TODO)

@wattik We'll probably settle on these values; at least we'll have a comparison with NAB on real data. Please fix this before running the tests!

chetan51 commented 8 years ago

Hi @breznak, answering the questions I can:

> is still better to use plain Scalar where possible

Yes, if you know the min/max and they don't change.

> have the epsilon differences (0.10000000000000001) some smart meaning?

No, just rounding effects.

breznak commented 8 years ago

Thanks @chetan51 :+1:

chetan51 commented 8 years ago

Perhaps @subutai can answer the rest?

subutai commented 8 years ago

> looks like these values have better foundation than the "hotgym" (NuPIC will switch anyway)

Yes, I believe so. Everyone should start with these parameters unless they have a good reason.

> this should be quite good settings even if we use different encoder(s) (diff input layer), right?

Yes, I think these SP and TM parameters are good starting points even if you have a different set of initial encoders.

> have you included "tm" implementation to your testing? And still chose (cpp=)TP ?

No we haven't. That would be an interesting thing to test!

BoltzmannBrain commented 8 years ago

@rhyolight is there a nupic issue (and PR) for this?

rhyolight commented 8 years ago

@BoltzmannBrain I don't believe so, no.

BoltzmannBrain commented 8 years ago

The task here is to change NuPIC's best anomaly params to match the current NAB params (which are believed to be generally the best), and then have the NAB algorithm use the NuPIC json directly rather than keeping a copy in the NAB repo. @subutai does this sound correct?

NAB still would not require NuPIC to be installed unless you want to run the HTM detector.

subutai commented 8 years ago

Yes, that sounds correct (in that order too). @vitaly-krugl mentioned this too earlier in this thread. I guess someone just needs to create the NuPIC issue for this? :smile:

BoltzmannBrain commented 8 years ago

Thanks, will do.

BoltzmannBrain commented 8 years ago

https://github.com/numenta/nupic/issues/3046

BoltzmannBrain commented 8 years ago

I've updated the NuPIC params with NAB's, and set up the HTM detector in NAB to pull them in from NuPIC. I'm finding differences in the resolutions calculated from numBuckets, resulting in (small) score changes. For example, here is the resolution for "realTraffic/TravelTime_387.csv" under a few setups, with the HTM detector using the NAB params:

  1. in NAB, res = 54.3846153846.
  2. in NAB, without padding the min-max range by 20%, res = 116.538461538.
  3. in NuPIC, accessed via nupic.frameworks.opf.common_models.cluster_params.getScalarMetricWithTimeOfDayAnomalyParams(), res = 38.8461538462.
  4. in NuPIC, specifying only metricData and not the min and max for the get method (so the padding is the std dev), res = 44.9919262275.
  5. in NuPIC, specifying min and max values, and metricData=[0], res = 38.8461538462.

The logic in 4 makes the most sense. @subutai a nice result is that it increases the NAB scores :smile: I'll follow up with PRs soon.
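
To make the padding arithmetic behind these figures easier to follow, here is a small Python sketch, not the NAB or NuPIC code. It assumes the resolution is the (possibly padded) value range divided by numBuckets, with a lower bound. The range of 5050 and numBuckets = 130 are assumptions back-calculated because they reproduce figures 1, 3 and 5 above; they were not read from TravelTime_387.csv. The helper names are hypothetical, and figures 2 and 4 are not reproduced here.

```python
import statistics

# Assumptions, back-calculated from the figures in the thread (not from the CSV).
NUM_BUCKETS = 130
RANGE_MIN, RANGE_MAX = 0.0, 5050.0


def resolution(min_val, max_val, num_buckets=NUM_BUCKETS, min_resolution=0.001):
    """Bucket width: the value range split into num_buckets, with a floor."""
    return max(min_resolution, (max_val - min_val) / num_buckets)


def pad_by_fraction(min_val, max_val, fraction=0.2):
    """Widen the range by a fraction of its width on each side (assumed NAB style)."""
    pad = abs(max_val - min_val) * fraction
    return min_val - pad, max_val + pad


def pad_by_std(data, n_std=1.0):
    """Widen the observed range by n_std population standard deviations per side."""
    std = statistics.pstdev(data) or 1.0  # fall back to 1.0 for constant data
    return min(data) - n_std * std, max(data) + n_std * std


# Explicit min and max, no padding (matches figures 3 and 5 under the assumptions).
unpadded = resolution(RANGE_MIN, RANGE_MAX)

# Range padded by 20% on each side (matches figure 1 under the assumptions).
padded = resolution(*pad_by_fraction(RANGE_MIN, RANGE_MAX))

# Std-dev padding on toy data; reproducing figure 4 would need the real series.
toy_range = pad_by_std([1.0, 3.0])

print(round(unpadded, 10), round(padded, 10), toy_range)
```

Padding on both sides makes the range 1.4 times wider, which is why the padded resolution is 1.4 times the unpadded one.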