prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Flaky test: AbstractTestNativeTpcdsQueries.testTpcdsQ64 #20271

Open Ali-P opened 1 year ago

Ali-P commented 1 year ago

Link: https://app.circleci.com/pipelines/github/prestodb/presto/4691/workflows/fb21b4cc-d4d8-427c-af08-867def84436a/jobs/11146

[ERROR] Tests run: 103, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 241.988 s <<< FAILURE! - in com.facebook.presto.nativeworker.TestPrestoNativeTpcdsQueriesParquetUsingThrift
[ERROR] com.facebook.presto.nativeworker.TestPrestoNativeTpcdsQueriesParquetUsingThrift.testTpcdsQ64  Time elapsed: 18.436 s  <<< FAILURE!
java.lang.AssertionError: 
Execution of 'actual' query failed
...
Caused by: VeloxRuntimeError:  Operator::isBlocked failed for [operator: MergeExchange, plan node ID: 6184]: Abort results failed: ingress timeout, streamID=2, path /v1/task/20230710_213739_00064_wy99b.1.0.1.0/results/0
...

amitkdutta commented 1 year ago

CC: @majetideepak @xiaoxmeng

mbasmanova commented 1 year ago

CC: @aditi-pandit @frankobe

frankobe commented 1 year ago

I am looking into this now.

mshang816 commented 1 year ago

It looks like testTpcdsQ67 is flaky as well: https://app.circleci.com/pipelines/github/prestodb/presto/4778/workflows/2f7cea8e-69d5-4895-989f-d4e0edd17dc9/jobs/11576

majetideepak commented 1 year ago

@frankobe any updates on this? Should we disable these flaky tests?

frankobe commented 1 year ago

Disabled the test for now in https://github.com/prestodb/presto/pull/20312

I can't reproduce this flakiness locally after 100 consecutive runs. The only clue I noticed is that Q64 has the longest running time among the 99 queries, so it has a higher chance of timing out.
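
For anyone who wants to retry the repeated-run experiment, here is a minimal sketch of a rerun loop with a per-run time limit. It is not part of Presto or its test harness: `FlakyRunner`, `Result`, and `repeat` are hypothetical names, and the `Callable` passed in stands in for whatever executes the query under test. It separates ordinary failures from timeouts and records the slowest run, which may help show whether a long-running query like Q64 is close to a time limit.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not a Presto class): reruns a task and tallies the outcomes. */
final class FlakyRunner {
    /** Outcome counts for one batch of runs. */
    static final class Result {
        final int passed;
        final int failed;
        final int timedOut;
        final long slowestMillis;

        Result(int passed, int failed, int timedOut, long slowestMillis) {
            this.passed = passed;
            this.failed = failed;
            this.timedOut = timedOut;
            this.slowestMillis = slowestMillis;
        }
    }

    /** Runs the task sequentially the given number of times, each with its own time limit. */
    static Result repeat(int runs, Duration limit, Callable<?> task) throws InterruptedException {
        // A cached pool gives each run a free thread even if an earlier run is still winding down.
        ExecutorService executor = Executors.newCachedThreadPool();
        int passed = 0;
        int failed = 0;
        int timedOut = 0;
        long slowestMillis = 0;
        try {
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                Future<?> future = executor.submit(task);
                try {
                    future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
                    passed++;
                } catch (TimeoutException e) {
                    // The run exceeded the limit: interrupt it and count it separately.
                    future.cancel(true);
                    timedOut++;
                } catch (ExecutionException e) {
                    // The task itself threw, e.g. an assertion failure.
                    failed++;
                }
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                slowestMillis = Math.max(slowestMillis, elapsedMillis);
            }
        } finally {
            executor.shutdownNow();
        }
        return new Result(passed, failed, timedOut, slowestMillis);
    }
}
```

A call such as `FlakyRunner.repeat(100, Duration.ofMinutes(5), () -> runQuery())` would mirror the 100-run attempt, where `runQuery` is a placeholder for the actual test body and the five-minute limit is an arbitrary choice. A clean local result would not rule out the CI failure, since CI machines are likely slower and more heavily loaded.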