It turns out there's a 10MB limit on requests when inserting data into BigQuery, and we can sometimes exceed that. The chunk size is set to 25,000 because ~50k rows came to about 11MB of JSON, so halving that seems safe.
This also upgrades to Python 3.12, as that's when `itertools.batched` was introduced. Possibly a silly reason to upgrade, but why not.
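Roughly what the batching looks like (a minimal sketch; the helper name, client setup, and error handling here are assumptions for illustration, not the actual code in this PR):

```python
from itertools import batched  # added in Python 3.12

from google.cloud import bigquery

# ~50k rows was ~11MB of JSON, so half that stays under the 10MB limit.
CHUNK_SIZE = 25_000


def insert_rows(client: bigquery.Client, table_id: str, rows: list[dict]) -> None:
    # Send rows in chunks so no single request exceeds BigQuery's 10MB limit.
    for chunk in batched(rows, CHUNK_SIZE):
        errors = client.insert_rows_json(table_id, list(chunk))
        if errors:
            raise RuntimeError(f"BigQuery insert errors: {errors}")
```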
Checklist for reviewer:
- [ ] Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title)
- [ ] Scan the PR and verify that no changes (particularly to `.circleci/config.yml`) will cause environment variables (particularly credentials) to be exposed in test logs
- [ ] Ensure the container image will be using permissions granted to `telemetry-airflow` responsibly.