To increase the throughput of loading submissions into BigQuery, switch to loading them in large chunks from PostgreSQL, still using load jobs.
The streaming mechanism is troublesome in our case. Its buffers must be flushed before any DELETE or UPDATE can be done on the table, there is no way to force a flush, and flushing can take up to 90 minutes. The TRUNCATE operation, which would otherwise have suited us fine, currently has the same problem. This prevents us from always loading via streaming, as that would break the tests, which need to empty the database repeatedly.
This will also move us closer to making the BigQuery dataset public: the data in PostgreSQL will be largely de-duplicated, which makes partitioning in BigQuery more viable and reduces the cost of queries.