prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.99k stars 5.36k forks source link

Plan doesn't show the json parse-and-cast optimization #10468

Closed wenleix closed 4 years ago

wenleix commented 6 years ago

We will optimize JSON_PARSE(col) AS ARRAY to use $internal$json_string_to_array_cast (done in https://github.com/prestodb/presto/issues/8286)

However, this is not reelected in the plan:

presto:tiny> explain (type distributed) SELECT CAST(JSON_PARSE(col) AS ARRAY(INTEGER)) FROM (VALUES( 'kk' )) AS t(col);
                                      Query Plan
---------------------------------------------------------------------------------------
 Fragment 0 [SINGLE]
     Output layout: [expr_1]
     Output partitioning: SINGLE []
     Execution Flow: UNGROUPED_EXECUTION
     - Output[_col0] => [expr_1:array(integer)]
             Cost: {rows: 1 (10B), cpu: 20.00, memory: 0.00, network: 0.00}
             _col0 := expr_1
         - Project[] => [expr_1:array(integer)]
                 Cost: {rows: 1 (10B), cpu: 20.00, memory: 0.00, network: 0.00}
                 expr_1 := CAST("json_parse"("field") AS array(integer))
             - LocalExchange[ROUND_ROBIN] () => field:varchar(2)
                     Cost: {rows: 1 (10B), cpu: 10.00, memory: 0.00, network: 0.00}
                 - Values => [field:varchar(2)]
                         Cost: {rows: 1 (10B), cpu: 0.00, memory: 0.00, network: 0.00}
                         ('kk')
(1 row)

The execution shows the optimization is working correctly (instead of throwing exception at json_parse):

presto:tiny> SELECT json_parse(col) FROM (VALUES( 'kk' )) AS t(col);

Query 20180421_065608_00016_b2aef, FAILED, 1 node
http://localhost:8080/query.html?20180421_065608_00016_b2aef
Splits: 5 total, 4 done (80.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 14% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20180421_065608_00016_b2aef failed: Cannot convert 'kk' to JSON
com.facebook.presto.spi.PrestoException: Cannot convert 'kk' to JSON
    at com.facebook.presto.operator.scalar.JsonFunctions.jsonParse(JsonFunctions.java:155)
    at com.facebook.presto.$gen.PageProjectionWork_20180421_065608_15.evaluate(Unknown Source)
    at com.facebook.presto.$gen.PageProjectionWork_20180421_065608_15.process(Unknown Source)
    at com.facebook.presto.operator.project.DictionaryAwarePageProjection$DictionaryAwarePageProjectionWork.process(DictionaryAwarePageProjection.java:179)
    at com.facebook.presto.operator.project.PageProcessor$PositionsPageProcessorIterator.processBatch(PageProcessor.java:278)
    at com.facebook.presto.operator.project.PageProcessor$PositionsPageProcessorIterator.computeNext(PageProcessor.java:182)
    at com.facebook.presto.operator.project.PageProcessor$PositionsPageProcessorIterator.computeNext(PageProcessor.java:129)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
    at com.facebook.presto.operator.project.PageProcessorOutput.hasNext(PageProcessorOutput.java:49)
    at com.facebook.presto.operator.project.MergingPageOutput.getOutput(MergingPageOutput.java:110)
    at com.facebook.presto.operator.FilterAndProjectOperator.getOutput(FilterAndProjectOperator.java:104)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:392)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:281)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:687)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:275)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:975)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:492)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
haozhun commented 6 years ago

The general problem of worker doing additional optimization that EXPLAIN cannot show is a valid issue.

The issue here is that RowExpression optimization happens on workers. As a result, explain cannot show them.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.