m-lab / etl-gardener

Gardener provides services for maintaining and reprocessing mlab data.
Apache License 2.0

Delete JSON row files in etl-mlab-xxx after successfully loading into BigQuery #320

Open gfr10598 opened 3 years ago

gfr10598 commented 3 years ago

For tcpinfo, these files are very large and incur significant costs.

For smaller data types, it might be useful to retain the files: if a later parse of an archive fails and so does not overwrite the corresponding file, the existing file can still be reloaded.

Gardener should verify that the data has been successfully loaded, record the load in an audit trail table, and then delete these files.