Closed philbooth closed 5 years ago
/cc @jbuck
After a quick look at the Kinesis Firehose docs, I don't think we should actually make the first two changes on this list:
- Change the entry point to handle Lambda message objects.
- Add whatever concurrency is needed to keep Lambda happy.
- Stop assuming that one day is the atomic payload size (so Redshift updates at the same frequency as Amplitude).
Basically, Kinesis Firehose will save to S3, then run the Redshift COPY into a single table. All we should need to do is modify the scripts so that they work with a single staging table, and then we're good!
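For reference, the COPY step described above might look something like this. This is just a sketch: the staging table name, bucket path, and IAM role are hypothetical placeholders, not values from this repo.

```python
def build_copy_statement(staging_table, s3_uri, iam_role):
    """Build the Redshift COPY statement that loads Firehose output
    from S3 into a single staging table (CSV format assumed)."""
    return (
        f"COPY {staging_table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV;"
    )

# Hypothetical example values -- the real bucket and role would come
# from configuration, not be hard-coded like this.
sql = build_copy_statement(
    "flow_events_staging",
    "s3://example-firehose-bucket/flow-events/",
    "arn:aws:iam::123456789012:role/example-redshift-copy",
)
print(sql)
```

Firehose can also be configured to issue the COPY itself on delivery, in which case the scripts only need the staging-table merge step.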
That's great, thanks @jbuck!
> - Stop assuming that one day is the atomic payload size (so Redshift updates at the same frequency as Amplitude).
>
> Basically, Kinesis Firehose will save to S3, then run the redshift COPY to a single table. All we should need to do is modify the scripts so that they can work with a single staging table, and then we're good!
FWIW, I made a start on this reduced-functionality script in #100; see kinesis_flow_events_2.py in that PR. The format/structure will probably need to change for actual integration with Kinesis, but it works when run from the command line. See kinesis_flow_events_1.py in the same PR for the schema of the staging table and the CSV file.
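To illustrate the staging-table flow, here is one possible sketch of the SQL that could run after each COPY: delete rows the staging table is about to re-deliver, insert the new rows, then empty the staging table. The table and column names below are assumptions for illustration, not the actual schema from kinesis_flow_events_1.py.

```python
def build_merge_statements(staging_table, target_table, key_column):
    """Return the statements, in order, that move freshly-copied rows
    from the staging table into the target table and then empty the
    staging table ready for the next COPY."""
    return [
        # Remove any rows the staging table would duplicate.
        f"DELETE FROM {target_table} USING {staging_table} "
        f"WHERE {target_table}.{key_column} = {staging_table}.{key_column};",
        # Move everything across.
        f"INSERT INTO {target_table} SELECT * FROM {staging_table};",
        # Leave the staging table empty for the next load.
        f"TRUNCATE {staging_table};",
    ]

# Hypothetical table/column names for demonstration only.
for stmt in build_merge_statements("flow_events_staging", "flow_events", "flow_id"):
    print(stmt)
```

The delete-then-insert pattern is the usual Redshift substitute for an upsert, since Redshift has no native MERGE-on-conflict for this case.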
Adding @jbuck's face to this to carry on with while I'm away. Related flow import PR is in #100.
from mtg: come back to this in 110
IIUC from last night's meeting, this probably shouldn't be in `next` any more. Moving to `backlog`.
Let's reopen if this comes up again.
As part of the move to mozlog 2, we want to make some changes to these import scripts:

- Change the entry point to handle Lambda message objects.
- Add whatever concurrency is needed to keep Lambda happy.
- Stop assuming that one day is the atomic payload size (so Redshift updates at the same frequency as Amplitude).
While doing that, we may or may not port them to Node, depending on how things pan out.
Once it's ready, we plan to run both pipelines side-by-side against the current mozlog 1 format. Only when we're happy that they are the same will we flip the code in the content server over to mozlog 2.
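The side-by-side check could be as simple as diffing the row sets the two pipelines produce. A minimal sketch, assuming both pipelines can dump their output as comparable tuples (the toy rows below are made up):

```python
def diff_pipelines(mozlog1_rows, mozlog2_rows):
    """Compare the output of the two pipelines and report rows that
    appear in one but not the other. Rows are treated as hashable
    tuples; ordering is ignored."""
    old, new = set(mozlog1_rows), set(mozlog2_rows)
    return {
        "only_in_mozlog1": old - new,
        "only_in_mozlog2": new - old,
    }

# Toy data: the second pipeline drops one row and adds another.
result = diff_pipelines(
    [("flow-1", "signup"), ("flow-2", "login")],
    [("flow-1", "signup"), ("flow-3", "login")],
)
print(result)
```

Only once both sets in the result are empty over a representative period would we flip the content server over to mozlog 2.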