Closed fergalbyrne closed 8 years ago
@scottpurdy Please take a look at this and #2758.
@rhyolight it might make more sense to pull this out into a subclass if the backwards compatibility is a problem. We can look at the pros and cons using the HTM.java version.
@subutai I think it is important to find out why Jonathan was getting these midnight anomalies, yet we have never seen them before in any of our data set analyses.
I'd expect this data set to be using the time-of-day encoding for the timestamp, which should shift the bits throughout the day with no significant change at midnight. Does the statement that it shifts the bits at midnight mean you are using the day-of-week encoder instead?
@scottpurdy If @JonnoFTW's code is still up to date, you can see here that he is not using a time-of-day encoder, only hour-of-week.
{
    'fieldname': u'timestamp',
    'name': u'timestamp_hourOfWeek',
    'hourOfWeek': 21,
    'type': 'DateEncoder'
}
> @subutai I think it is important to find out why Jonathan was getting these midnight anomalies, yet we have never seen them before in any of our data set analyses.
I agree with Scott: the time-of-day encoder should be used. With that encoder there is no dramatic "shifts all its bits at midnight every night" behavior, because it is a circular encoder. If you use hour-of-week, then every day has a different encoding: midnight Tuesday will be treated differently from midnight Wednesday, and so on. The model will thus need to see several weeks of data before it can learn that pattern.
I would recommend starting with the same parameters for time-of-day that we used in NAB:
https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/modelParams/model_params.json
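To illustrate the circular behavior described above, here is a minimal sketch of a periodic encoder. It is a simplified stand-in, not NuPIC's ScalarEncoder or DateEncoder; the function name `periodic_bits` and the sizes `n=120` and `w=21` are assumptions chosen for illustration. It compares the active bits a few minutes before midnight with those a few minutes after, and with midday.

```python
def periodic_bits(value, period, n=120, w=21):
    """Active bit indices for `value` on a circle of length `period`.

    Simplified illustration only: a block of `w` contiguous bits out of `n`,
    wrapping around at the end of the period.
    """
    start = int(value * n / period) % n
    return {(start + i) % n for i in range(w)}


# Time of day in hours, with a period of 24.
before_midnight = periodic_bits(23 + 55 / 60.0, period=24)  # 23:55
after_midnight = periodic_bits(0 + 5 / 60.0, period=24)     # 00:05
midday = periodic_bits(12.0, period=24)                      # 12:00

overlap_across_midnight = len(before_midnight & after_midnight)
overlap_with_midday = len(before_midnight & midday)
print(overlap_across_midnight, overlap_with_midday)
```

In this sketch the two encodings either side of midnight differ by a single bit, while 23:55 and 12:00 share none.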
@scottpurdy @subutai Would you suggest that this issue be closed in light of this explanation? It still seems odd to me that these anomalies should present themselves when using only an hour-of-week encoder.
@rhyolight @subutai @scottpurdy @fergalbyrne Hi guys,
Before closing this, please confirm that he wasn't using a time-of-day encoder when he originally encountered this problem. These comments only have merit if he was in fact using a day-of-week encoder instead of a time-of-day encoder.
He's using hourOfWeek on a fork I created by adding a new encoder (a morph of timeOfDay) which smoothly shifts over the week as timeOfDay does. He's eliminated all the midnight anomalies shown at the top of this issue using this small change. For the record, he originally used timeOfDay and dayOfWeek because they were in the hotgym example. The use of dayOfWeek created the anomalies every midnight as seen in the plot. He's now saying these are gone with this patch.
@jonnoftw Can you clear this up by telling us exactly what DateEncoder parameters were used that generated the chart at the top of this issue? Were these anomaly scores generated by the current codebase at https://github.com/JonnoFTW/htm-models-adelaide (master branch)? If not, please point us to the fork so we can inspect the model params.
@rhyolight They can't be. @jonnoftw merged @fergalbyrne's changes into his repo to experiment with the new code. You want to know what he was using originally.
@cogmission I want to see the model params that were used to create the model that generated the anomalies in the chart shown above, whatever fork/branch it is.
@rhyolight exactly... but not the current ones (I'm saying)
@subutai this is a replacement for dayOfWeek not timeOfDay. The problem is that dayOfWeek changes all its bits at midnight. If you are reading 2400 rows per day, you can't learn to predict the one time in 2400 when all the dayOfWeek bits change, and you'll get anomalies when nothing has happened.
@subutai What @fergalbyrne is saying (which he still hasn't said) is that @jonnoftw was using both TimeOfDay and DayOfWeek originally, and now he is using TimeOfDay and HourOfWeek. That's correct, right, @fergalbyrne?
Correct. And his midnight anomalies have now vanished.
@fergalbyrne and @cogmission thanks for the explanation. That was helpful. The title of this issue was confusing me!
> The problem is that dayOfWeek changes all its bits at midnight.
This is not always true (depends on resolution) but I see your point. dayOfWeek doesn't move continuously as you go into the evening and onto the next day. If timeOfDay by itself is insufficient, then something like hourOfWeek could be useful. I don't have a problem adding an hourOfWeek encoder to NuPIC - it's a good idea.
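The dependence on resolution can be sketched with a simplified block encoder (an assumption for illustration, not NuPIC's ScalarEncoder). Here `bits_per_day` is a hypothetical stand-in for the effect of the encoder's radius: with `w=21` and 21 bits per day, consecutive integer days share no bits, while a coarser setting leaves some overlap. Feeding the same encoder a fractional day removes the jump at midnight.

```python
def day_bits(day_value, bits_per_day, w=21):
    """Active bits for a day-of-week value in [0, 7), wrapping at the week end."""
    n = 7 * bits_per_day
    start = int(day_value * n / 7) % n
    return {(start + i) % n for i in range(w)}


def overlap(a, b, bits_per_day):
    """Number of active bits shared by the encodings of `a` and `b`."""
    return len(day_bits(a, bits_per_day) & day_bits(b, bits_per_day))


TUESDAY, WEDNESDAY = 1, 2  # Monday is 0

# Integer day of week: the value jumps by a whole day at midnight.
fine_jump = overlap(TUESDAY, WEDNESDAY, bits_per_day=21)   # fine resolution
coarse_jump = overlap(TUESDAY, WEDNESDAY, bits_per_day=7)  # coarse resolution

# Fractional day of week: 23:59 Tuesday versus 00:01 Wednesday.
late_tuesday = TUESDAY + (23 * 60 + 59) / 1440.0
early_wednesday = WEDNESDAY + 1 / 1440.0
smooth = overlap(late_tuesday, early_wednesday, bits_per_day=21)

print(fine_jump, coarse_jump, smooth)
```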
We need to add tests too. We don't want to add code without tests.
Thanks all. I just wanted to confirm the premise of this issue before moving to the PR.
@rhyolight I've updated my code to use the custom encoder in my github repo. The original params were:
def get_sensor_encoder(name):
    return {
        'fieldname': name,
        'name': name,
        'clipInput': True,
        'minval': 0.0,
        'maxval': max_vehicles,
        'n': 50,
        'w': 21,
        'type': 'ScalarEncoder'
    }

def get_time_encoders():
    return [{
        'fieldname': 'timestamp',
        'name': 'timestamp_dayOfWeek',
        'type': 'DateEncoder',
        'dayOfWeek': (21, 1)
    }]
The updated params with the new results using @fergalbyrne's code were (same sensor encoding as before):
def get_time_encoders():
    return [{
        'fieldname': u'timestamp',
        'name': u'timestamp_hourOfWeek',
        'hourOfWeek': 21,
        'type': 'DateEncoder'
    }]
Here are the new results (note that this was after 126 days of training data at 5-minute intervals):
Thanks @subutai. I have an idea which will make adding the hourOfWeek easier: just replace the dayOfWeek encoder's input with the non-rounded number of days since 00:00 Monday. What's important is that we have a scalar changing smoothly. The hourOfWeek is then redundant, so backwards compatibility is not an issue.
The only things that need changing are the tests expecting the old encodings.
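A minimal sketch of the idea, assuming the encoder's scalar input is computed from a Python `datetime`. The function names are hypothetical and this is not the DateEncoder code itself; it only contrasts the rounded and non-rounded day-of-week values around midnight.

```python
from datetime import datetime


def rounded_day_of_week(ts):
    """Whole days since Monday (Monday is 0); jumps by 1.0 at midnight."""
    return float(ts.weekday())


def smooth_day_of_week(ts):
    """Non-rounded days since 00:00 Monday; changes smoothly through midnight."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return ts.weekday() + seconds_into_day / 86400.0


late_tuesday = datetime(2015, 11, 17, 23, 59)
early_wednesday = datetime(2015, 11, 18, 0, 1)

# Change in the encoder input over the two minutes spanning midnight.
rounded_step = rounded_day_of_week(early_wednesday) - rounded_day_of_week(late_tuesday)
smooth_step = smooth_day_of_week(early_wednesday) - smooth_day_of_week(late_tuesday)
print(rounded_step, smooth_step)
```

The smooth value still wraps from just under 7 back to 0 at the end of Sunday, so it only avoids a jump there if the encoder consuming it is periodic.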
Does anyone else see a problem with the fact that such a simple addition cannot be made to NuPIC without impacting downstream code and tests? This is what unit tests are made for, but making the actual change should not be this involved.
@cogmission The required changes are not extensive, so I guess not.
@rhyolight the fix was already there, just commented out. Had to adjust one test (prediction changed very slightly due to smoother changing input). This renders the hourOfWeek redundant and just performs better. No client code changes.
@subutai - Do you think it is ok to change the day-of-week encoder to have the hour-of-week behavior or should the hour-of-week be a new encoder option?
@fergalbyrne @scottpurdy This sounds like a nice change. Can we just also verify that hot gym prediction accuracy didn't suffer? Possibly its encoder params need to be adjusted slightly. Other than that (and one minor comment in the PR) it's :+1: from me.
@subutai I get (very slightly) better results than any other config on the hourly csv (using equivalent htm.java) using dayOfWeek (with several bits on) as well as timeOfDay. I can't build locally, so can't check hotgym on NuPIC. @scottpurdy the dayOfWeek has the same encoding, except it moves one bit at a time instead of jumping a day's worth of bits at midnight. The SP is better able to track that and client code is unaffected.
@fergalbyrne - Yes I understand that. But prediction and classification problems may benefit from the day of week encoding being the same in the morning and at night. I'm not sure if that is practically something we need to worry about now though.
@scottpurdy the original problem was the opposite of that: the input at 23:59 was completely different from that at 00:01. Morning and evening encodings for the same day share half the bits using the default radius, so the case you describe is not problematic.
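As a rough check of the "half the bits" figure, the sketch below uses a simplified block encoder with 21 active bits and 21 bits per day. That sizing is an assumption about what a width of 21 and a radius of one day amount to, and the code is not NuPIC's exact encoder. Bucket positions are computed from whole minutes since 00:00 Monday.

```python
MINUTES_PER_WEEK = 7 * 24 * 60
W = 21        # active bits
N = 7 * 21    # total bits: 21 per day (assumed)


def week_bits(day, hour, minute=0):
    """Active bits for a moment in the week (Monday is day 0)."""
    minutes = (day * 24 + hour) * 60 + minute
    start = (minutes * N) // MINUTES_PER_WEEK
    return {(start + i) % N for i in range(W)}


def shared(a, b):
    """Number of active bits two encodings have in common."""
    return len(a & b)


TUESDAY, WEDNESDAY = 1, 2

# 08:00 versus 20:00 on the same day.
morning_evening = shared(week_bits(TUESDAY, 8), week_bits(TUESDAY, 20))
# 23:59 Tuesday versus 00:01 Wednesday.
across_midnight = shared(week_bits(TUESDAY, 23, 59), week_bits(WEDNESDAY, 0, 1))
print(morning_evening, across_midnight)
```

Under these assumptions, encodings twelve hours apart share 11 of 21 bits, and the two encodings either side of midnight share 20.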
> Morning and evening encodings for the same day share half the bits using the default radius, so the case you describe is not problematic.
It very well could be problematic that half the bits are different. But I'm not sure that it is something that we will actually run into.
@JonnoFTW is reporting big anomalies across many of his sensors at midnight every night (see http://puu.sh/loHhg/2af0a99560.png for a plot). There is no anomalous traffic to explain this, and it repeats every night. He is using the standard date encoder, which shifts all its bits at midnight every night. This is causing a dramatic enough shift in the SP output that the columnar representation cannot be predicted by columns over-sensitive to day-of-week bits.
The fix is to add an hourOfWeek encoder to the DateEncoder, which is just a copy of the timeOfDay encoder with max set to 168 instead of 24. I'll add this in and do a PR.
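For illustration, the scalar such an encoder would see can be sketched as below. The helper names are hypothetical and this is not the DateEncoder implementation; it only shows an hour-of-week value running over a 168-hour period, with the wrap from Sunday night to Monday morning treated as a short step on the circle.

```python
from datetime import datetime

HOURS_PER_WEEK = 168


def hour_of_week(ts):
    """Hours since 00:00 Monday, in [0, 168)."""
    return ts.weekday() * 24 + ts.hour + ts.minute / 60.0


def circular_distance(a, b, period=HOURS_PER_WEEK):
    """Shortest distance between two values on a circle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)


tue_9am = hour_of_week(datetime(2015, 11, 17, 9, 0))
wed_9am = hour_of_week(datetime(2015, 11, 18, 9, 0))
sun_late = hour_of_week(datetime(2015, 11, 22, 23, 30))
mon_early = hour_of_week(datetime(2015, 11, 23, 0, 30))

# Same time of day on different days gives different hour-of-week values.
print(tue_9am, wed_9am)
# Sunday 23:30 and Monday 00:30 are one hour apart on the weekly circle.
print(circular_distance(sun_late, mon_early))
```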