z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.
MIT License
27 stars 36 forks

Temporary table seems to be destroyed and recreated in the middle of a run when table expiry is reached, resulting in lost data #99

Open TrishGillett opened 3 weeks ago

TrishGillett commented 3 weeks ago

Currently, when a temporary table is created, it is set to expire one day in the future.

I've run into a problem using this target in this scenario:

Observations:

There are a couple things here that could be opportunities for enhancements:

I'm open to trying to contribute towards these changes, but would appreciate getting alignment from a maintainer on the approach first. 🙏

AlejandroUPC commented 2 weeks ago

Mmm, I think the best option here is to make the expiration configurable, so you can set it to a very high value (one you're sure won't be reached mid-run), and maybe also add a param that ensures the temp table is deleted on completion (once all the sinks are drained)? Would this work?
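A rough sketch of what that could look like, just to make the idea concrete. The setting names `temp_table_expiration_hours` and `delete_temp_table_on_completion` are hypothetical (they are not existing target-bigquery options), and `delete_table` is a stand-in callable rather than the target's real cleanup code. The 24-hour default mirrors the one-day expiry described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Mirrors the current behaviour described in this issue (expire after one day).
DEFAULT_EXPIRATION_HOURS = 24


def temp_table_expiry(config: dict, now: Optional[datetime] = None) -> datetime:
    """Return the expiry timestamp for a temp table.

    `temp_table_expiration_hours` is a hypothetical setting name.
    """
    hours = config.get("temp_table_expiration_hours", DEFAULT_EXPIRATION_HOURS)
    if hours <= 0:
        raise ValueError("temp_table_expiration_hours must be positive")
    now = now or datetime.now(timezone.utc)
    return now + timedelta(hours=hours)


def cleanup_temp_table(
    delete_table: Callable[[str], None], table_id: str, config: dict
) -> bool:
    """Delete the temp table after all sinks are drained, if configured.

    `delete_temp_table_on_completion` is a hypothetical setting name.
    Returns True when a delete was requested.
    """
    if config.get("delete_temp_table_on_completion", False):
        delete_table(table_id)
        return True
    return False
```

With the google-cloud-bigquery client, the returned datetime could presumably be assigned to `Table.expires` at creation time, and `delete_table` could wrap `Client.delete_table(table_id, not_found_ok=True)`.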

TrishGillett commented 3 days ago

Hey @AlejandroUPC! I think that could be part of the answer, although personally I would also love to see runs fail loudly in the case where the table disappears mid-run. That would be reassuring for me since I could set the limit to something that I think should be long enough (as opposed to something absurdly long) and trust that I'll be notified if it turns out to be too short. It would also be useful to other users since they'd be informed if they're encountering this issue and need to use the (as yet hypothetical :P) custom time limit setting.

I'm picturing something like, could we make it so the temp table is created before extraction begins, and anytime we intend to write to it we could do an existence check first and fail the run if it doesn't exist? (Apologies if my mental model is off here, I am new to the internals of this target and making some guesses.)
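To illustrate the existence-check idea, here is a minimal sketch. It is not based on the target's actual internals: `write_batch`, `TempTableMissingError`, and the `table_exists` / `write_rows` callables are all hypothetical names standing in for whatever the sinks really do.

```python
from typing import Callable, Dict, List


class TempTableMissingError(RuntimeError):
    """Raised when the temp table is gone before a write (hypothetical name)."""


def ensure_table_exists(table_exists: Callable[[str], bool], table_id: str) -> None:
    """Fail loudly instead of letting the table be silently recreated."""
    if not table_exists(table_id):
        raise TempTableMissingError(
            f"Temporary table {table_id} no longer exists; it may have expired "
            "mid-run. Failing the run to avoid losing data."
        )


def write_batch(
    rows: List[Dict],
    table_id: str,
    table_exists: Callable[[str], bool],
    write_rows: Callable[[str, List[Dict]], None],
) -> int:
    """Check that the table exists, then write; returns the number of rows written."""
    ensure_table_exists(table_exists, table_id)
    write_rows(table_id, rows)
    return len(rows)
```

With google-cloud-bigquery, `table_exists` could be a small wrapper that calls `Client.get_table(table_id)` and returns `False` when it raises `google.api_core.exceptions.NotFound`. The check costs one extra API call per write, so it may make more sense per batch than per row.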