z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.

Timeout parameter not being passed to batch_job #89

Closed AlejandroUPC closed 4 months ago

AlejandroUPC commented 5 months ago

Description

Some of our BigQuery jobs are failing to load data with connection aborted errors; as Google recommends here, setting a larger timeout value should address this.

According to the repo documentation, the timeout parameter is:

| Setting | Required | Default | Description |
| --- | --- | --- | --- |
| timeout | False | 600 | Default timeout for batch_job and gcs_stage derived LoadJobs. |

The timeout parameter should presumably also be used in the BatchJobWorker classes, but it does not appear to be passed anywhere when the job is triggered in batch_job.py:

                client.load_table_from_file(
                    BytesIO(job.data),
                    job.table,
                    num_retries=3,
                    job_config=bigquery.LoadJobConfig(**job.config),
                ).result()

In contrast, in gcs_stage.py the timeout is passed:

            client.load_table_from_uri(
                self.uris,
                self.table.as_ref(),
                timeout=self.config.get("timeout", 600),
                job_config=bigquery.LoadJobConfig(**self.job_config),
            ).result()

Should we also be able to pass the timeout variable to the batch_job worker, or is the documentation wrong? (Happy to open a MR if it's the former.)
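
For reference, a minimal sketch of what the change in batch_job.py could look like, mirroring the gcs_stage.py pattern. It assumes the worker can reach the target-level config dict and that the job object exposes the same data / table / config attributes as in the snippet above; load_table_from_file accepts a timeout keyword, and LoadJob.result can take one as well:

    from io import BytesIO

    from google.cloud import bigquery


    def load_with_timeout(client: bigquery.Client, job, config: dict) -> None:
        """Sketch: thread the target-level "timeout" setting through to the
        batch load job, as gcs_stage.py already does."""
        timeout = config.get("timeout", 600)
        client.load_table_from_file(
            BytesIO(job.data),
            job.table,
            num_retries=3,
            timeout=timeout,  # per-request timeout when submitting the load job
            job_config=bigquery.LoadJobConfig(**job.config),
        ).result(timeout=timeout)  # also bound how long we wait for completion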