mozilla / docker-etl

Collection of dockerized ETL jobs managed by data engineering.
Mozilla Public License 2.0

fix(fxci): batch rows when inserting into BigQuery #265

Closed. ahal closed this 3 months ago

ahal commented 3 months ago

It turns out there's a 10MB limit on the data sent in a single insert into BigQuery, and we can sometimes exceed that. The chunk size is set to 25000 rows because ~50k rows came to about 11MB of JSON, so halving that (roughly 5.5MB per insert) seems safe.

This also upgrades to Python 3.12, since that's the version that introduced `itertools.batched`. Possibly a silly reason to upgrade, but why not.
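
For illustration, the batching described above could look roughly like the sketch below. It is not the code from this PR: `insert_in_batches`, `insert_batch`, and `payload_size_bytes` are hypothetical names, and `insert_batch` stands in for whatever call actually sends one batch of rows to BigQuery. The `islice` fallback is there only so the sketch also runs on Python older than 3.12.

```python
import json
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    # Fallback for older interpreters, only so this sketch runs anywhere.
    def batched(iterable, n):
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk

# ~50k rows was about 11MB of JSON, so 25k rows should stay well under 10MB.
CHUNK_SIZE = 25000


def insert_in_batches(rows, insert_batch, chunk_size=CHUNK_SIZE):
    """Send rows in chunks of at most chunk_size; return the number of rows sent.

    insert_batch is a hypothetical callable that inserts one list of rows.
    """
    total = 0
    for batch in batched(rows, chunk_size):
        insert_batch(list(batch))
        total += len(batch)
    return total


def payload_size_bytes(rows):
    """Rough size of rows when serialized as JSON, for sanity-checking a chunk size."""
    return len(json.dumps(rows).encode("utf-8"))
```

With 60000 rows and the default chunk size, `insert_batch` would be called three times, with 25000, 25000, and 10000 rows. A fixed row count is only a heuristic: if individual rows grow much larger, a 25000-row batch could still exceed the limit, so checking `payload_size_bytes` on a representative batch may be worthwhile.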

Checklist for reviewer: