Bad timestamps in the flow data for 2017-12-18

philbooth commented 6 years ago

Just saw this in the log for the flow import:

2017-12-18
  COPYING CSV
  MIN timestamp -7467047247
  MAX timestamp 1513688592
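
For scale, here is a quick sketch that converts those two values to UTC dates. It assumes the log reports Unix epoch seconds (the log itself doesn't state the unit), and `epoch_to_utc` is just a name made up for this example:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_to_utc(seconds):
    # Assumption: the value is in epoch seconds, not milliseconds.
    # Adding a timedelta avoids datetime.fromtimestamp, which can reject
    # negative values on some platforms.
    return EPOCH + timedelta(seconds=seconds)

min_ts = epoch_to_utc(-7467047247)  # MIN timestamp from the import log
max_ts = epoch_to_utc(1513688592)   # MAX timestamp from the import log

print(min_ts.isoformat())
print(max_ts.isoformat())
```

Under that assumption the minimum lands in the year 1733, while the maximum falls on 2017-12-19, so only the minimum looks obviously wrong.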

Not sure yet what effect, if any, that's had on the data in Redshift. Digging into it now.

philbooth commented 6 years ago

It seemed like that day imported way quicker than any of the other days I've watched come in, which concerned me at first. But then it looks like there are plenty of events, so maybe I imagined it:

fxa=# select timestamp::date, count(*)
fxa-# from flow_events
fxa-# where timestamp >= '2017-12-16'
fxa-# and timestamp < '2017-12-21'
fxa-# group by 1
fxa-# order by 1;
 timestamp  |  count
------------+----------
 2017-12-16 | 24659955
 2017-12-17 | 23040327
 2017-12-18 | 34078247
 2017-12-19 | 31106960
 2017-12-20 | 29960373
(5 rows)

philbooth commented 6 years ago

Closing this. Things seem fine in Redshift, and the raw CSV for that day is bigger than the available space on my machine, so it's tricky to look at the raw data. (I did kick off a grep on the server an hour ago and am still waiting for it to finish.)
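
When the file is too big to pull down, a streaming scan on the server is one alternative to grep. This is a minimal sketch, not the import script's logic: it assumes the timestamp is the first comma-separated field and is in epoch seconds. The column position, the bounds, the function name and the sample rows are all made up for illustration.

```python
import csv
import io

def find_bad_timestamps(lines, lower, upper, column=0):
    """Yield (row_number, raw_value) for rows whose timestamp is outside [lower, upper)."""
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) <= column:
            # Row has no such column; report it with an empty value.
            yield row_number, ""
            continue
        raw = row[column]
        try:
            value = int(raw)
        except ValueError:
            # Not an integer at all, so report the raw text.
            yield row_number, raw
            continue
        if not lower <= value < upper:
            yield row_number, raw

# Hypothetical sample rows; the real CSV layout may differ.
sample = io.StringIO(
    "1513555200,event_a\n"
    "-7467047247,event_b\n"
    "1513600000,event_c\n"
)

# Bounds: 2017-12-18 00:00:00 UTC up to (not including) 2017-12-19 00:00:00 UTC.
bad = list(find_bad_timestamps(sample, 1513555200, 1513641600))
print(bad)
```

Because `csv.reader` pulls rows one at a time, passing an open file object (opened with `newline=""`) instead of the `StringIO` sample keeps memory use flat regardless of file size.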

mozilla / fxa-activity-metrics

Bad timestamps in the flow data for 2017-12-18 #96