m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Add support for storage.LocalWriter #991

Closed stephen-soltesz closed 3 years ago

stephen-soltesz commented 3 years ago

This change adds a new output mode to the ETL worker (aka parser) to support writing to local files. The storage.LocalWriter type implements the row.Sink interface. With this change there are now three output modes: "bigquery" (direct, original mode), "gcs" (for bq loads from gcs), "local" (new, added for local development).

The "local" output mode writes files the same way "gcs" mode does (i.e. JSONL files named after the archive), but to a local directory specified with the new -output_dir flag.

The LocalWriter is expected to simplify local development of the etl worker, e.g. when adding a new datatype, by removing some dependencies on BigQuery, table schemas, and schema updates.

Tested using unit tests and running locally using:

~/bin/etl_worker -prometheusx.listen-address :9991 -service_port :8081 -output_dir ./output -output local -gardener_host localhost

This invocation is not included in the README yet because it is not repeatable. Because "local" must be selected explicitly and is mutually exclusive with the other output modes, including the default, this change should not impact current deployments.
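One way to picture the mutually exclusive output modes is a small validation step over the -output and -output_dir flags. The sketch below is an assumption-laden illustration, not the worker's actual flag handling: `validateOutput` is a hypothetical helper, the "bigquery" default is assumed, and the rule that "local" requires a non-empty -output_dir is a guess.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validateOutput is a hypothetical helper: it accepts exactly one of the
// three modes named in the PR and, by assumption, requires a directory
// when the mode is "local".
func validateOutput(mode, dir string) error {
	switch mode {
	case "bigquery", "gcs":
		return nil
	case "local":
		if dir == "" {
			return fmt.Errorf("-output local requires -output_dir")
		}
		return nil
	default:
		return fmt.Errorf("unknown -output mode %q", mode)
	}
}

func main() {
	fs := flag.NewFlagSet("etl_worker_sketch", flag.ContinueOnError)
	// The default value here is an assumption for the example.
	mode := fs.String("output", "bigquery", "output mode: bigquery, gcs, or local")
	dir := fs.String("output_dir", "", "directory for local output files")
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	if err := validateOutput(*mode, *dir); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("output mode %q, output dir %q\n", *mode, *dir)
}
```

Since only one mode is active per process, a deployment that never passes -output local would not reach the local-file code path under this scheme.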


coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 6444


Changes Missing Coverage        Covered Lines   Changed/Added Lines   %
cmd/etl_worker/etl_worker.go    0               2                     0.0%
Total                           55              57                    96.49%

Totals
Change from base Build 6436:    0.3%
Covered Lines:                  3523
Relevant Lines:                 5615

💛 - Coveralls