mozilla / fxa-activity-metrics

A server for managing the Firefox Accounts metrics database and pipeline

Nonsense timestamps in flow_events table #65

Closed rfk closed 7 years ago

rfk commented 7 years ago

Something about this timestamp seems a little off...

fxa=# select max(timestamp) from flow_events;
         max
---------------------
 2064-06-07 19:39:16
(1 row)

fxa=#

rfk commented 7 years ago

The flow_metadata table looks OK though:

fxa=# select max(begin_time) from flow_metadata;
         max
---------------------
 2017-03-20 23:59:57
(1 row)

philbooth commented 7 years ago

Oh yeah. I think it's just that day from looking at the logs:

IMPORTING 198 DAYS OF DATA
IMPORTING 2017-03-20
  MIN TIMESTAMP 1489963273
  MAX TIMESTAMP 2980093156
SETTING METADATA FOR 2017-03-20
IMPORTING 2017-03-19
  MIN TIMESTAMP 1489875783
  MAX TIMESTAMP 1489967999
SETTING METADATA FOR 2017-03-19
IMPORTING 2017-03-18
  MIN TIMESTAMP 1489789311
  MAX TIMESTAMP 1489881599
SETTING METADATA FOR 2017-03-18

Sorting it out now.
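
As a quick cross-check, the MAX TIMESTAMP values in the log above can be converted from Unix seconds to UTC dates. This is only a sketch: `epoch_to_utc` is a hypothetical helper name, and it assumes the logged values are seconds since the epoch, which is consistent with the date psql reported.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Render a Unix timestamp (in seconds) as a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# MAX TIMESTAMP logged for 2017-03-20: the bad value.
print(epoch_to_utc(2980093156))  # 2064-06-07 19:39:16
# MAX TIMESTAMP logged for 2017-03-19: the last second of that day.
print(epoch_to_utc(1489967999))  # 2017-03-19 23:59:59
```

The first value matches the max(timestamp) returned from flow_events, which supports the reading that only the 2017-03-20 import is affected.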

> The flow_metadata table looks OK though.

Yeah, that's a result of the extra limiting by timestamp those queries do when inserting/updating. I may do something similar for flow_events in light of this. It would also be good to track down where this date came from, though, because I thought our validation in the app had put a stop to this nonsense.
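
The thread does not show the actual condition the flow_metadata queries use, so the following is only a rough sketch of what limiting by timestamp could look like for one day's import. The names `day_bounds`, `keep_in_window` and `slack_seconds` are hypothetical, and the real mitigation would live in the import SQL rather than in Python. The slack is there because the log above shows each day's MIN TIMESTAMP falling shortly before that day's midnight.

```python
from datetime import datetime, timedelta, timezone


def day_bounds(day, slack_seconds=86400):
    """Return (low, high) Unix seconds for a 'YYYY-MM-DD' UTC day.

    low is the start of the day minus slack_seconds (inclusive);
    high is the start of the following day (exclusive).
    """
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()) - slack_seconds, int(end.timestamp())


def keep_in_window(day, timestamps, slack_seconds=86400):
    """Drop timestamps that fall outside the window for the day being imported."""
    low, high = day_bounds(day, slack_seconds)
    return [t for t in timestamps if low <= t < high]


# The MIN and MAX TIMESTAMP values logged for 2017-03-20.
print(keep_in_window("2017-03-20", [1489963273, 2980093156]))  # [1489963273]
```

With a window like this, the 2064 value would be rejected at import time while the plausible 2017 value is kept.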

philbooth commented 7 years ago

> ...I thought our validation in the app had put a stop to this nonsense.

Oh dear, this is another classic pb poorly-thought-through unintended side-effect I'm afraid.

I added the performance flow events for train 82 and, to send the timing, I overloaded the flow_time field. I remember congratulating myself on how clever this idea was at the time. But two things have gone wrong here, it seems:

  1. For some reason, the performance numbers are unfeasibly big some of the time.
  2. I do all the performance stuff after the step that validates flow_time.

I'll mitigate in the SQL for now and fix in the content server separately.
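
The ordering problem in point 2 can be illustrated with a small sketch in which the overloaded value is validated after it is assigned, so an implausible performance number never reaches the emitted event. Everything here is hypothetical: the content server is not written in Python, and `build_flow_event`, `is_valid_flow_time`, the event name and the `MAX_FLOW_TIME` ceiling are illustrative names and values that do not come from the thread.

```python
# Hypothetical ceiling; the thread does not state the real limit or its units.
MAX_FLOW_TIME = 2 * 24 * 60 * 60 * 1000


def is_valid_flow_time(value):
    """Accept only non-negative integers up to the assumed ceiling."""
    return isinstance(value, int) and 0 <= value <= MAX_FLOW_TIME


def build_flow_event(event_type, flow_time, performance_time=None):
    """Return an event dict, or None if the final flow_time is not plausible."""
    # Step 1: overload flow_time with the performance timing, if there is one.
    if performance_time is not None:
        flow_time = performance_time
    # Step 2: validate the value that will actually be emitted, not the
    # value that flow_time held before it was overloaded.
    if not is_valid_flow_time(flow_time):
        return None
    return {"type": event_type, "flow_time": flow_time}


# An unfeasibly big performance number is rejected even though the
# original flow_time (1200) would have passed validation on its own.
print(build_flow_event("flow.performance", 1200, performance_time=1490129883000))  # None
print(build_flow_event("flow.performance", 1200, performance_time=350))
```

Validating last means any later change that rewrites flow_time is still covered by the same check.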