m-lab / mlab-vis-pipeline

M-Lab Visualization Dataflow pipelines for transforming ndt.all into the needed aggregation tables in bigtable.

BigTable transfer queries can work on reduced input data set #33

Open vlandham opened 8 years ago


Currently, the bigtable transfer pipeline computes aggregations and updates data for every time point, from 2009 through the most recent available data.

This is necessary when the bigtable tables are empty. For the common case of updating the tables with new data, however, it is unnecessary, because only a subset of each table's data will change.

Here are general recommendations for narrowing the time range for each table set.

Input time value

The input time value should be the date of the most recent day currently stored in the bigtable tables.

Search and list tables

These should be updated with the full time period on every run.

Year tables

Here we want to run the last 2 years of data.

Take the input time value, subtract 2 years, and start from January 1st of that year.

Example:

If the last data date is 2016-10-15, the year start date should be 2014-01-01.
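The year rule above can be sketched in Python for illustration (the pipeline itself may be written differently). The function name `year_table_start` and its `years_back` parameter are hypothetical and not taken from the pipeline's code.

```python
from datetime import date


def year_table_start(last_date: date, years_back: int = 2) -> date:
    """Hypothetical helper: first day of the year that is `years_back`
    years before the year of `last_date`."""
    # Subtract whole years from the calendar year, then snap to January 1st.
    return date(last_date.year - years_back, 1, 1)


print(year_table_start(date(2016, 10, 15)))  # 2014-01-01
```

Snapping to January 1st ensures the oldest year in the window is recomputed in full rather than starting partway through the year.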

Month tables

Here we want to run the last 2 months of data.
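The issue does not say how the month window should be aligned. One possible sketch, assuming it mirrors the year rule (step back 2 calendar months, then start on the 1st of that month), is shown below. Both that alignment and the name `month_table_start` are assumptions, not part of the original proposal.

```python
from datetime import date


def month_table_start(last_date: date, months_back: int = 2) -> date:
    """Hypothetical helper. ASSUMPTION: mirrors the year rule by stepping
    back `months_back` calendar months and starting on the 1st."""
    # Count months since year 0 so that subtraction crosses year boundaries correctly.
    total_months = last_date.year * 12 + (last_date.month - 1) - months_back
    year, month_index = divmod(total_months, 12)
    return date(year, month_index + 1, 1)


print(month_table_start(date(2016, 10, 15)))  # 2016-08-01
```

Working in a flat month count avoids special cases when the window reaches back into the previous year (e.g. a last date in January).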

Day tables

Here we want to run the last 30 days of data.
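A minimal sketch of the day window, assuming "the last 30 days" means starting 30 days before the input time value. Unlike the year rule, no snapping to a calendar boundary is applied. The name `day_table_start` is hypothetical.

```python
from datetime import date, timedelta


def day_table_start(last_date: date, days_back: int = 30) -> date:
    """Hypothetical helper. ASSUMPTION: the window starts exactly
    `days_back` days before `last_date`, with no calendar snapping."""
    # timedelta arithmetic handles month and year boundaries automatically.
    return last_date - timedelta(days=days_back)


print(day_table_start(date(2016, 10, 15)))  # 2016-09-15
```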