googlegenomics / gcp-variant-transforms

GCP Variant Transforms
Apache License 2.0

VCF to BQ Timeout Issue #699

Closed adaykin closed 2 years ago

adaykin commented 3 years ago

Hello, we recently ran into an issue during the VCF to BQ process.

I'd like to know if there's anything we can do differently to avoid this error, since a failed VCF to BQ run is a pretty big pain point for us.

The command we're running is:

/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py \
  --setup_file ./setup.py \
  --project \
  --input_file \
  --representative_header_file \
  --output_table \
  --sample_lookup_optimized_output_table \
  --sample_name_encoding WITHOUT_FILE_PATH \
  --temp_location \
  --job_name <job_name \
  --runner DataflowRunner \
  --region us-central1 \
  --append True \
  --update_schema_on_append \
  --use_1_based_coordinate \
  --keep_intermediate_avro_files \
  --experiments shuffle_mode=service

Before the exception we received these messages in the logs:

ERROR:root:Something unexpected happened during the loading rows to sample optimized table stage: Operation did not complete within the designated timeout.
WARNING:root:Since tables were appended, added rows cannot be reverted. You can utilize BigQuery snapshot decorators to recover your table up to 7 days ago. For more information please refer to: https://cloud.google.com/bigquery/table-decorators Here is the list of tables that you need to manually rollback:
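For readers unfamiliar with the snapshot decorators the warning mentions, here is a minimal sketch of how a rollback copy could be prepared. It assumes the `dataset.table@<milliseconds since epoch>` decorator form accepted by `bq cp`, and it only builds the command rather than running it. The table names, the timestamp, and both helper functions are hypothetical and are not part of Variant Transforms.

```python
from datetime import datetime, timezone


def snapshot_decorator(table_id, snapshot_time):
    """Return 'dataset.table@<epoch_ms>' for a timezone-aware datetime."""
    if snapshot_time.tzinfo is None:
        raise ValueError("snapshot_time must be timezone-aware")
    epoch_ms = int(snapshot_time.timestamp() * 1000)
    return f"{table_id}@{epoch_ms}"


def restore_command(table_id, snapshot_time, restored_table_id):
    """Build (but do not run) a bq cp command that copies the snapshot."""
    return ["bq", "cp", snapshot_decorator(table_id, snapshot_time), restored_table_id]


# Hypothetical example: a point in time just before the failed append started.
before_append = datetime(2021, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(" ".join(restore_command(
    "my_dataset.variants", before_append, "my_dataset.variants_restored")))
```

Copying into a new table first, rather than overwriting the original, leaves room to inspect the restored rows before swapping anything. The snapshot time has to fall inside the 7-day window the warning describes.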

The stack trace is:

Traceback (most recent call last):
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/future/polling.py", line 84, in _done_or_raise
    raise _OperationNotComplete()
google.api_core.future.polling._OperationNotComplete

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/future/polling.py", line 104, in _blocking_poll
    retry(self._done_or_raise)()
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/retry.py", line 206, in retry_target
    last_exc,
  File "", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 600.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.cloud.bigquery.job.QueryJob object at 0x7fba110daf28>>), last exception:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 643, in
    raise e
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 631, in
    run()
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 626, in run
    raise e
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 621, in run
    known_args.sample_lookup_optimized_output_table)
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/libs/partitioning.py", line 277, in copy_to_flatten_table
    self._copy_to_flatten_table(full_output_table_id, cp_query)
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/libs/partitioning.py", line 170, in _copy_to_flatten_table
    _ = query_job.result(timeout=600)
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3230, in result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 835, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/future/polling.py", line 125, in result
    self._blocking_poll(timeout=timeout)
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3126, in _blocking_poll
    super(QueryJob, self)._blocking_poll(timeout=timeout)
  File "/opt/gcp_variant_transforms/venv3/lib/python3.7/site-packages/google/api_core/future/polling.py", line 107, in _blocking_poll
    "Operation did not complete within the designated " "timeout."
concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.

moschetti commented 3 years ago

Can you check the BigQuery logs to see if there is information about the timeout? http://console.cloud.google.com/logs You can filter by time and resource type to see if there are any BigQuery errors or warnings.

moschetti commented 3 years ago

Following up here: the cause is the 600s timeout on the query that generates the sample-optimized table. The query fails on lower-numbered (larger) chromosomes in datasets with a large number of samples. This is due to #607, where the sample-optimized table does not support --append behavior.

The workaround is to create a container image with a longer timeout. The long-term fix will be tracked in #607.
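To illustrate what a longer wait could look like in code, here is a minimal sketch. It is not the change made in Variant Transforms: instead of simply raising the hard-coded `timeout=600`, it repeats the client-side wait a bounded number of times. It assumes that a timed-out `result()` call only ends the local wait and leaves the query running, so waiting again is safe. `wait_for_job`, its parameters, and `FakeJob` are hypothetical names used only for this example; the only interface relied on is a job object with a `result(timeout=...)` method, as seen in the traceback.

```python
import concurrent.futures


def wait_for_job(job, timeout_secs=600, max_waits=6):
    """Call job.result(timeout=...) up to max_waits times before giving up."""
    for attempt in range(1, max_waits + 1):
        try:
            return job.result(timeout=timeout_secs)
        except concurrent.futures.TimeoutError:
            if attempt == max_waits:
                raise
            # Assumption: the job is still running remotely, so wait again.


class FakeJob:
    """Stand-in for a query job: times out a few times, then finishes."""

    def __init__(self, timeouts_before_done):
        self.remaining = timeouts_before_done

    def result(self, timeout=None):
        if self.remaining > 0:
            self.remaining -= 1
            raise concurrent.futures.TimeoutError("still running")
        return "done"


print(wait_for_job(FakeJob(2), timeout_secs=1, max_waits=3))
```

Bounding the number of waits keeps a stuck query from blocking the pipeline forever, while the total wait (`timeout_secs * max_waits`) can be sized for the largest chromosomes.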

lawrenae commented 2 years ago

This is fixed by PR #713.

pgrosu commented 2 years ago

@lawrenae Did you mean to reference #715 instead?