Open electrum opened 3 years ago
Another way is to have an UDF that can sleep.
The problem is that it is very database specific, so that would require writing dedicated code for every JDBC connector.
CC: @ssheikin
There is pg_sleep in PostgreSQL.
Per discussion offline with @kokosing @sopel39 @erichwang there is actually a bug that long running JDBC queries are not getting cancelled, if getting next row (io.trino.plugin.jdbc.JdbcRecordCursor#advanceNextPosition
) takes very long time. While this is ultimately connector- (JDBC driver-) -dependent, it is a side effect of the fact we rely in thread interruption, which is known not to be reliable cancellation mechanism in practice. io.trino.plugin.jdbc.JdbcRecordCursor#close
does not get called when io.trino.plugin.jdbc.JdbcRecordCursor#advanceNextPosition
blocks for very long time.
We have the same issue when run query with regexp_like
filter which can't find any row.
Underlying plugin returns rows quickly but ScanFilterAndProjectOperator
can't find any matching row and got stuck when the query is cancelled. Also it consumes CPU and memory pool.
Even 1 such query make the whole cluster stuck.
Here is stack trace of hang thread
"20210319_123048_22212_cbxjr.3.0-1-49204" #49204 prio=5 os_prio=0 tid=0x00007f3fb4013740 nid=0x4636 runnable [0x00007f3e195d2000]
java.lang.Thread.State: RUNNABLE
at io.airlift.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:203)
at io.airlift.joni.Matcher.matchCheck(Matcher.java:304)
at io.airlift.joni.Matcher.searchInterruptible(Matcher.java:469)
at io.airlift.joni.Matcher.search(Matcher.java:318)
at io.prestosql.operator.scalar.JoniRegexpFunctions.regexpLike(JoniRegexpFunctions.java:55)
at java.lang.invoke.LambdaForm$DMH/453671855.invokeStatic_LL_I(LambdaForm$DMH)
at java.lang.invoke.LambdaForm$MH/595542127.linkToTargetMethod(LambdaForm$MH)
at io.prestosql.$gen.CursorProcessor_20210319_123054_20630.filter(Unknown Source)
at io.prestosql.$gen.CursorProcessor_20210319_123054_20630.process(Unknown Source)
at io.prestosql.operator.ScanFilterAndProjectOperator$RecordCursorToPages.process(ScanFilterAndProjectOperator.java:309)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.prestosql.operator.WorkProcessorUtils.access$000(WorkProcessorUtils.java:37)
at io.prestosql.operator.WorkProcessorUtils$YieldingProcess.process(WorkProcessorUtils.java:181)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.prestosql.operator.WorkProcessorUtils$$Lambda$5970/335637031.process(Unknown Source)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
at io.prestosql.operator.WorkProcessorUtils$$Lambda$5966/513430371.process(Unknown Source)
at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.prestosql.operator.WorkProcessorUtils$$Lambda$5970/335637031.process(Unknown Source)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
at io.prestosql.operator.WorkProcessorUtils$$Lambda$5972/2142800982.process(Unknown Source)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:148)
at io.prestosql.operator.Driver.processInternal(Driver.java:379)
at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
at io.prestosql.operator.Driver$$Lambda$6011/973436375.get(Unknown Source)
at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
at io.prestosql.operator.Driver.processFor(Driver.java:276)
at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.prestosql.$gen.Presto_326____20210318_133749_2.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Sample query:
select * from prod.x.query_stats
where executionstarttime >= to_unixtime(timestamp '2021-01-01 00:00:00') * 1000
and executionstarttime < to_unixtime(timestamp '2021-03-20 00:00:00') * 1000
and ( regexp_like(query, '(?i).*rename\s+to\s+prod.a.t_backup.*') OR regexp_like(query, '(?i).*drop\s+table.+prod.a.t_backup.*') )
order by createtime desc
Underlying catalog is parquet based with native presto-parquet reader v346.
@davseitsev you might want to try feature config regex-library=re2j
, which should have more predictable performance in some cases.
Just want to clarify that the issue is that a query stucks on workers when it is cancelled on coordinator and keep consuming resources.
Just want to clarify that the issue is that a query stucks on workers when it is cancelled on coordinator and keep consuming resources.
Make sense. This is because code can be spending lot of time within single projection. This is, however, a different problem than one being addressed here, although I agree that from user perspective they look similar.
@davseitsev Would you like to create a separate Github issue for your case? Notice that your issues is not JDBC connector related at all. So solving your problem requires dedicated separate effort than solving the issue described here.
We should write a test that runs a query that produces many rows (maybe a cross join on a large table) and verify that the query killed on the server afterwards (by using a database specific method of listing the running queries). We can embed a UUID in the SQL query as a comment to identify it.