trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Flaky BigQuery SLA issue with 500 Internal Server Error #17605

Open ebyhr opened 1 year ago

ebyhr commented 1 year ago
Error:  io.trino.plugin.bigquery.TestBigQueryAvroConnectorTest.testDataMappingSmokeTest[date:DATE0001-01-01](10)  Time elapsed: 5.739 s  <<< FAILURE!
io.trino.testing.QueryFailedException: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support.
    at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:122)
    at io.trino.testing.DistributedQueryRunner.executeWithQueryId(DistributedQueryRunner.java:493)
    at io.trino.testing.QueryAssertions.assertDistributedUpdate(QueryAssertions.java:107)
    at io.trino.testing.QueryAssertions.assertUpdate(QueryAssertions.java:63)
    at io.trino.testing.AbstractTestQueryFramework.assertUpdate(AbstractTestQueryFramework.java:410)
    at io.trino.testing.AbstractTestQueryFramework.assertUpdate(AbstractTestQueryFramework.java:405)
    at io.trino.testing.BaseConnectorTest.lambda$testDataMapping$67(BaseConnectorTest.java:5103)
    at io.trino.testing.BaseConnectorTest.testDataMapping(BaseConnectorTest.java:5110)
    at io.trino.testing.BaseConnectorTest.testDataMappingSmokeTest(BaseConnectorTest.java:5080)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
    at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
    Suppressed: java.lang.Exception: SQL: CREATE TABLE test_data_mapping_smoke_datebk4axj6ff2 AS SELECT CAST(row_id AS varchar(50)) row_id, CAST(value AS date) value, CAST(value AS date) another_column FROM (VALUES   ('null value', NULL),   ('sample value', DATE '0001-01-01'),   ('high value', DATE '1582-10-04'))  t(row_id, value)
        at io.trino.testing.DistributedQueryRunner.executeWithQueryId(DistributedQueryRunner.java:497)
        ... 20 more
Caused by: com.google.cloud.bigquery.BigQueryException: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support.
    at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
    at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.insertAll(HttpBigQueryRpc.java:507)
    at com.google.cloud.bigquery.BigQueryImpl.insertAll(BigQueryImpl.java:1103)
    at io.trino.plugin.bigquery.BigQueryClient.insert(BigQueryClient.java:358)
    at io.trino.plugin.bigquery.BigQueryPageSink.appendPage(BigQueryPageSink.java:85)
    at io.trino.operator.TableWriterOperator.addInput(TableWriterOperator.java:255)
    at io.trino.operator.Driver.processInternal(Driver.java:407)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:305)
    at io.trino.operator.Driver.tryWithLock(Driver.java:701)
    at io.trino.operator.Driver.process(Driver.java:297)
    at io.trino.operator.Driver.processForDuration(Driver.java:268)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:888)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:561)
    at io.trino.$gen.Trino_testversion____20230523_085001_904.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 500 Internal Server Error
POST https://www.googleapis.com/bigquery/v2/projects/sep-bq-cicd/datasets/tpch/tables/tmp_trino_e49302d4_15c55174/insertAll?prettyPrint=false
{
  "code": 500,
  "errors": [
    {
      "domain": "global",
      "message": "An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support.",
      "reason": "internalError"
    }
  ],
  "message": "An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support.",
  "status": "INTERNAL"
}
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$3.interceptResponse(AbstractGoogleClientRequest.java:466)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:552)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
    at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.insertAll(HttpBigQueryRpc.java:505)
    ... 16 more
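The error text recommends retrying with back-off. Below is a minimal Java sketch of that pattern. It is not Trino's or the BigQuery client's implementation. It assumes a hypothetical `ServiceException` that carries the HTTP status and the `reason` field from a JSON error body like the one above. Treating HTTP 500 or `internalError` as retryable follows this log. The attempt limit and delays are illustrative and are not values taken from Trino or the BigQuery SLA.

```java
import java.util.Random;
import java.util.concurrent.Callable;

public final class BackoffRetry
{
    // Hypothetical exception carrying the HTTP status and the "reason" field
    // from a JSON error body such as {"code": 500, "errors": [{"reason": "internalError"}]}.
    public static final class ServiceException extends RuntimeException
    {
        public final int status;
        public final String reason;

        public ServiceException(int status, String reason)
        {
            super(status + " " + reason);
            this.status = status;
            this.reason = reason;
        }
    }

    private static final Random RANDOM = new Random();

    private BackoffRetry() {}

    // Assumption: treat HTTP 500 or reason "internalError" as transient, matching the log above.
    public static boolean isRetryable(ServiceException e)
    {
        return e.status == 500 || "internalError".equals(e.reason);
    }

    // Runs `call`, retrying retryable failures with capped exponential back-off and full jitter.
    // Non-retryable failures, and the failure on the last allowed attempt, are rethrown.
    public static <T> T callWithBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis, long maxDelayMillis)
            throws Exception
    {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            }
            catch (ServiceException e) {
                if (!isRetryable(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Sleep a random duration in [0, delay], then double the ceiling up to maxDelayMillis.
                Thread.sleep((long) (RANDOM.nextDouble() * (delay + 1)));
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
    }

    public static void main(String[] args)
            throws Exception
    {
        int[] attempts = {0};
        // Simulated call that fails twice with a transient 500, then succeeds.
        String result = callWithBackoff(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new ServiceException(500, "internalError");
            }
            return "ok";
        }, 5, 10, 100);
        System.out.println(result + " after " + attempts[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

The random jitter spreads retries out so that parallel writers do not retry in lockstep. One caveat: the failing call here is a streaming `insertAll`. A blind retry can therefore write duplicate rows unless each row carries an insert ID that BigQuery can use for best-effort deduplication.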
hashhar commented 1 year ago

cc: @wendigo

https://github.com/trinodb/trino/pull/17067 would probably help with this.

findepi commented 9 months ago

https://github.com/trinodb/trino/actions/runs/7874385993/job/21484886371?pr=20661

Error:  io.trino.plugin.bigquery.TestBigQueryTaskFailureRecoveryTest.testExplainAnalyze -- Time elapsed: 75.79 s <<< ERROR!
io.trino.testing.QueryFailedException: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: