Today, gardener accepts a configuration that specifies a start date, source buckets, experiment and datatype names, and the target BigQuery dataset and table name.
However, the internal logic of the v2 pipeline ignores the target field, uses static tmp_ and raw_ dataset prefixes, and performs non-configurable steps (JOINs). Together, these constrain which target table names can be used in practice.
This has impaired our ability to be agile in at least two cases, and more will come in time:
- Experimental pcap parser with new schemas, without interfering with other sandbox deployments.
- Experimental annotation parsing from the synthetic annotation export process.
In those cases, we used the standard configuration. Ideally, we would have been able to specify an alternate target table, and gardener would have "just worked" with it. For example:
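As a rough illustration of what honoring the target could look like, here is a minimal Go sketch. Everything in it is an assumption made for illustration: the `SourceConfig` struct, its field names, the `SplitTarget` helper, and the bucket, experiment, and target values are hypothetical and are not taken from gardener's actual code or configuration format.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceConfig is a hypothetical, simplified view of one source entry
// in the gardener configuration. Field names are illustrative only.
type SourceConfig struct {
	Bucket     string
	Experiment string
	Datatype   string
	Target     string // expected form: "dataset.table"
}

// SplitTarget splits a "dataset.table" target into its two parts.
// It returns an error if the target does not have exactly that form.
func SplitTarget(target string) (dataset, table string, err error) {
	parts := strings.Split(target, ".")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("target %q is not of the form dataset.table", target)
	}
	return parts[0], parts[1], nil
}

func main() {
	// Hypothetical alternate target for an experimental parser.
	src := SourceConfig{
		Bucket:     "example-archive-bucket",
		Experiment: "ndt",
		Datatype:   "pcap",
		Target:     "sandbox_pcap.pcap_experimental",
	}
	dataset, table, err := SplitTarget(src.Target)
	if err != nil {
		panic(err)
	}
	// The dataset and table come from configuration, not from a fixed prefix.
	fmt.Printf("write %s/%s to %s.%s\n", src.Experiment, src.Datatype, dataset, table)
}
```

The point of the sketch is only that the output dataset and table are read from the configured target rather than assembled from a static prefix.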
This cannot work today because gardener hard-codes the dataset prefix (e.g. "raw_*"): https://github.com/m-lab/etl-gardener/blob/master/cloud/bq/ops.go#L162
These should be inferred.
If gardener configuration allowed this degree of flexibility, and the parsers honored the output target for jobs sent by the gardener, then "versioned tables" could be implemented simply as a configuration here. For example:
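One way to picture "versioned tables as configuration" is two source entries for the same experiment and datatype that differ only in their target. The Go sketch below is hypothetical: the `Source` struct, the `TargetsFor` helper, and the table names `ndt.pcap_v1` and `ndt.pcap_v2` are invented for illustration and do not reflect gardener's real configuration schema.

```go
package main

import "fmt"

// Source is a hypothetical source entry; field names are illustrative.
type Source struct {
	Experiment string
	Datatype   string
	Target     string // "dataset.table"
}

// TargetsFor returns every configured target for the given
// experiment/datatype pair, in configuration order.
func TargetsFor(sources []Source, experiment, datatype string) []string {
	var targets []string
	for _, s := range sources {
		if s.Experiment == experiment && s.Datatype == datatype {
			targets = append(targets, s.Target)
		}
	}
	return targets
}

func main() {
	// Two versions of the same table, expressed purely as configuration.
	sources := []Source{
		{Experiment: "ndt", Datatype: "pcap", Target: "ndt.pcap_v1"},
		{Experiment: "ndt", Datatype: "pcap", Target: "ndt.pcap_v2"},
	}
	for _, t := range TargetsFor(sources, "ndt", "pcap") {
		fmt.Println("job target:", t)
	}
}
```

Under this assumption, adding a new table version would mean adding one more entry, provided the parsers honor the target carried by each job.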