Closed philbooth closed 5 years ago
/cc @jbuck
After a quick look at the Kinesis Firehose docs, I don't think we should actually make the first two changes on this list:
- Change the entry point to handle Lambda message objects.
- Add whatever concurrency is needed to keep Lambda happy.
- Stop assuming that one day is the atomic payload size (so Redshift updates at the same frequency as Amplitude).
Basically, Kinesis Firehose will save to S3, then run the Redshift COPY into a single table. All we should need to do is modify the scripts so that they work with a single staging table, and then we're good!
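For reference, the COPY step described above might look something like this. This is just a sketch: the staging table name, bucket path, and IAM role are hypothetical placeholders, not values from this repo.

```python
def build_copy_statement(staging_table, s3_uri, iam_role):
    """Build the Redshift COPY statement that loads Firehose output
    from S3 into a single staging table (CSV format assumed)."""
    return (
        f"COPY {staging_table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV;"
    )

# Hypothetical example values -- the real bucket and role would come
# from configuration, not be hard-coded like this.
sql = build_copy_statement(
    "flow_events_staging",
    "s3://example-firehose-bucket/flow-events/",
    "arn:aws:iam::123456789012:role/example-redshift-copy",
)
print(sql)
```

Firehose can also be configured to issue the COPY itself on delivery, in which case the scripts only need the staging-table merge step.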
That's great, thanks @jbuck!
> - Stop assuming that one day is the atomic payload size (so Redshift updates at the same frequency as Amplitude).
>
> Basically, Kinesis Firehose will save to S3, then run the redshift COPY to a single table. All we should need to do is modify the scripts so that they can work with a single staging table, and then we're good!
FWIW, I made a start on this reduced-functionality script in #100; see kinesis_flow_events_2.py in that PR. The format/structure will probably need to change for actual integration with Kinesis, but it works when run from the command line. See kinesis_flow_events_1.py in the same PR for the schema of the staging table and the CSV file.
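To illustrate the staging-table flow, here is one possible sketch of the SQL that could run after each COPY: delete rows the staging table is about to re-deliver, insert the new rows, then empty the staging table. The table and column names below are assumptions for illustration, not the actual schema from kinesis_flow_events_1.py.

```python
def build_merge_statements(staging_table, target_table, key_column):
    """Return the statements, in order, that move freshly-copied rows
    from the staging table into the target table and then empty the
    staging table ready for the next COPY."""
    return [
        # Remove any rows the staging table would duplicate.
        f"DELETE FROM {target_table} USING {staging_table} "
        f"WHERE {target_table}.{key_column} = {staging_table}.{key_column};",
        # Move everything across.
        f"INSERT INTO {target_table} SELECT * FROM {staging_table};",
        # Leave the staging table empty for the next load.
        f"TRUNCATE {staging_table};",
    ]

# Hypothetical table/column names for demonstration only.
for stmt in build_merge_statements("flow_events_staging", "flow_events", "flow_id"):
    print(stmt)
```

The delete-then-insert pattern is the usual Redshift substitute for an upsert, since Redshift has no native MERGE-on-conflict for this case.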
Adding @jbuck's face to this to carry on with while I'm away. Related flow import PR is in #100.
from mtg: come back to this in 110
IIUC from last night's meeting, this probably shouldn't be in `next` any more. Moving to `backlog`.
Let's reopen if this comes up again.
As part of the move to mozlog 2, we want to make some changes to these import scripts:

- Change the entry point to handle Lambda message objects.
- Add whatever concurrency is needed to keep Lambda happy.
- Stop assuming that one day is the atomic payload size (so Redshift updates at the same frequency as Amplitude).
While doing that, we may or may not port them to Node, depending on how things pan out.
Once it's ready, we plan to run both pipelines side-by-side against the current mozlog 1 format. Only when we're happy that they are the same will we flip the code in the content server over to mozlog 2.
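The side-by-side check could be as simple as diffing the row sets the two pipelines produce. A minimal sketch, assuming both pipelines can dump their output as comparable tuples (the toy rows below are made up):

```python
def diff_pipelines(mozlog1_rows, mozlog2_rows):
    """Compare the output of the two pipelines and report rows that
    appear in one but not the other. Rows are treated as hashable
    tuples; ordering is ignored."""
    old, new = set(mozlog1_rows), set(mozlog2_rows)
    return {
        "only_in_mozlog1": old - new,
        "only_in_mozlog2": new - old,
    }

# Toy data: the second pipeline drops one row and adds another.
result = diff_pipelines(
    [("flow-1", "signup"), ("flow-2", "login")],
    [("flow-1", "signup"), ("flow-3", "login")],
)
print(result)
```

Only once both sets in the result are empty over a representative period would we flip the content server over to mozlog 2.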