numenta / NAB

The Numenta Anomaly Benchmark

Duplicates in machine_temperature_system_failure.csv #376

Open cansubasak opened 3 years ago

cansubasak commented 3 years ago

The following timestamps appear twice, with different observation values:

timestamp           value
2014-01-07 02:00:00 94.423406
2014-01-07 02:05:00 94.698730
2014-01-07 02:10:00 95.332824
2014-01-07 02:15:00 95.079199
2014-01-07 02:20:00 94.881208
2014-01-07 02:25:00 94.563961
2014-01-07 02:30:00 93.430922
2014-01-07 02:35:00 93.729663
2014-01-07 02:40:00 93.192987
2014-01-07 02:45:00 93.967871
2014-01-07 02:50:00 93.397374
2014-01-07 02:55:00 92.855999
2014-01-07 02:00:00 94.139723
2014-01-07 02:05:00 94.111970
2014-01-07 02:10:00 94.638723
2014-01-07 02:15:00 93.270907
2014-01-07 02:20:00 93.890249
2014-01-07 02:25:00 93.396627
2014-01-07 02:30:00 94.199300
2014-01-07 02:35:00 94.125420
2014-01-07 02:40:00 93.530827
2014-01-07 02:45:00 92.784720
2014-01-07 02:50:00 93.254724
2014-01-07 02:55:00 93.656042
subutai commented 3 years ago

These are not duplicates. They are spaced 5 minutes apart.

cansubasak commented 3 years ago

Please check again. The timestamps of the first row (2014-01-07 02:00:00) and the thirteenth row (2014-01-07 02:00:00) are the same. There are 24 observations between 2014-01-07 02:00:00 and 2014-01-07 02:55:00 (that is, within 1 hour), where 5-minute spacing would give only 12. I also happened to notice that there are 300 data points for the day 2014-01-07, rather than the 288 that 5-minute spacing gives for a full day.
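A minimal sketch of how such repeats can be found, assuming the data is available as (timestamp, value) pairs. The helper name `find_duplicate_timestamps` is hypothetical, and the sample holds only a few rows copied from the listing above, not the whole file.

```python
from collections import defaultdict


def find_duplicate_timestamps(rows):
    """Map every timestamp that occurs more than once to its list of values."""
    values_by_timestamp = defaultdict(list)
    for timestamp, value in rows:
        values_by_timestamp[timestamp].append(value)
    return {ts: vals for ts, vals in values_by_timestamp.items() if len(vals) > 1}


# A few rows copied from the listing in this issue (not the full file).
sample = [
    ("2014-01-07 02:00:00", 94.423406),
    ("2014-01-07 02:05:00", 94.698730),
    ("2014-01-07 02:55:00", 92.855999),
    ("2014-01-07 02:00:00", 94.139723),
    ("2014-01-07 02:05:00", 94.111970),
    ("2014-01-07 02:55:00", 93.656042),
]

duplicates = find_duplicate_timestamps(sample)
for timestamp, values in sorted(duplicates.items()):
    print(timestamp, values)

# Arithmetic behind the counts: a full day at 5-minute spacing has 288 points,
# so one repeated hour (12 extra rows) gives 300.
expected_per_day = 24 * 60 // 5
extra_rows = 60 // 5
print(expected_per_day, expected_per_day + extra_rows)
```

Grouping by timestamp rather than comparing whole rows matters here, because the repeated timestamps carry different values and would not be caught by an exact-row duplicate check.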

subutai commented 3 years ago

Ah, thank you for pointing out the specifics. We had not noticed that before. Did you check whether it happens anywhere else?

cansubasak commented 3 years ago

No. I checked all points, and the other days are correct (5-minute time steps, no duplicates).
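A per-day check of this kind could be sketched as follows. This is only an illustration: the helper name `days_with_unexpected_counts` is hypothetical, and the timestamps are generated synthetically to mimic the situation described in this issue (one hour repeated on 2014-01-07), not read from the actual file.

```python
from collections import Counter
from datetime import datetime, timedelta


def days_with_unexpected_counts(timestamps, step_minutes=5):
    """Return {date: count} for days whose row count differs from a full day."""
    expected = 24 * 60 // step_minutes  # 288 rows per day at 5-minute spacing
    per_day = Counter(ts.date() for ts in timestamps)
    return {day: count for day, count in per_day.items() if count != expected}


# Synthetic data (an assumption for illustration): two full days at 5-minute
# spacing, plus the hour 02:00-02:55 repeated once on 2014-01-07.
step = timedelta(minutes=5)
timestamps = [datetime(2014, 1, 6) + i * step for i in range(2 * 288)]
timestamps += [datetime(2014, 1, 7, 2, 0) + i * step for i in range(12)]

odd_days = days_with_unexpected_counts(timestamps)
for day, count in sorted(odd_days.items()):
    print(day, count)
```

To run this against the real file, the timestamps would first have to be parsed, for example with `csv.DictReader` and `datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")`; the column name and format are assumed from the listing in this issue. The first and last days of a recording may be partial, so a count below 288 there does not by itself indicate a problem.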