z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.
MIT License

When running upsert jobs that take over 5 minutes, the state gets moved forward without data being written #97

Closed (loveeklund-osttra closed 2 months ago)

loveeklund-osttra commented 2 months ago

The _handle_max_record_age() method (https://github.com/meltano/sdk/blob/6708cb995c68ab6f74d4874dfc8f978c3b054ceb/singer_sdk/target_base.py#L284) gets called every 5 minutes. It in turn calls drain_all() (target-bigquery/target_bigquery/target.py), which writes to the target table and writes out a state.

If you are running upsert, the target table is a temporary table. Because the merge doesn't happen until the end of the job, the emitted state is out of sync with the contents of the "real" target table. This can be problematic and lead to what I would call unexpected behavior if a job, for any reason, doesn't reach its end: the state has already moved forward, but the data was never merged into the real table.
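The failure mode can be reproduced with a toy model. The sketch below does not use the Meltano SDK or BigQuery; `ToyUpsertTarget`, its methods, and the in-memory "tables" are hypothetical stand-ins. It assumes only the behavior described above: each drain writes to a temporary table and emits state, and the merge runs once at the end.

```python
class ToyUpsertTarget:
    """Hypothetical in-memory stand-in for an upsert sink; not the real target."""

    def __init__(self):
        self.temp_table = {}       # staging rows, keyed by primary key
        self.real_table = {}       # the table downstream consumers read
        self.emitted_state = None  # last bookmark written out

    def drain_all(self, records, bookmark):
        # Periodic drain: rows land in the temporary table only...
        for rec in records:
            self.temp_table[rec["id"]] = rec
        # ...but the state is emitted right away.
        self.emitted_state = bookmark

    def merge(self):
        # End-of-job step: upsert the staged rows into the real table.
        self.real_table.update(self.temp_table)
        self.temp_table.clear()


target = ToyUpsertTarget()
target.drain_all([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}], bookmark=2)

# Simulate the job dying here, before merge() is ever called.
print(target.emitted_state)  # 2
print(target.real_table)     # {}
```

In this model the bookmark says records up to 2 were handled, yet the real table is empty, so a resumed run that trusts the bookmark would skip those rows.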

I created this PR (https://github.com/z3z1ma/target-bigquery/pull/96) as a possible solution using the pre_state_hook.
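As a rough illustration of the idea, and not a description of what the PR actually contains, the toy model below assumes a hook that runs just before state is written and uses it to flush staged rows into the real table. `ToyUpsertTarget` and the body of its `pre_state_hook` are hypothetical; only the hook's name comes from this thread.

```python
class ToyUpsertTarget:
    """Hypothetical in-memory stand-in; the hook body is an assumption."""

    def __init__(self):
        self.temp_table = {}       # staging rows, keyed by primary key
        self.real_table = {}       # the table downstream consumers read
        self.emitted_state = None  # last bookmark written out

    def pre_state_hook(self):
        # Assumed behavior: merge staged rows into the real table
        # before any state is allowed to go out.
        self.real_table.update(self.temp_table)
        self.temp_table.clear()

    def drain_all(self, records, bookmark):
        for rec in records:
            self.temp_table[rec["id"]] = rec
        self.pre_state_hook()          # merge first...
        self.emitted_state = bookmark  # ...then emit state


target = ToyUpsertTarget()
target.drain_all([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}], bookmark=2)

# Even if the job died here, state and real table would agree.
print(target.emitted_state)       # 2
print(sorted(target.real_table))  # [1, 2]
```

The trade-off in this sketch is that a merge runs on every drain rather than once at the end, which keeps state and data consistent at the cost of more merge operations.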

z3z1ma commented 2 months ago

I also think #96 will resolve this. I will close this for now, but we can re-open if the behavior persists. Will cut a new PyPI release soon.