Open dengweisysu opened 3 years ago
We need a more reproducible test. Also, Presto has the mark_distinct operator for count distinct. See if turning that off makes any difference.
Setting `use_mark_distinct=false` makes no difference, and a query with a single distinct aggregation is optimized into a group-by query anyway. The problem is similar to this issue: https://github.com/prestodb/presto/issues/13015
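To make the "single distinct becomes a group-by" remark concrete, here is a minimal Python sketch of the equivalence. It is only an illustration, not Presto's implementation: the function names are hypothetical, and rows are assumed to be `(fdate, userid)` tuples.

```python
from collections import defaultdict

def count_distinct_direct(rows):
    """count(distinct userid) ... GROUP BY fdate, computed with one set per group."""
    seen = defaultdict(set)
    for fdate, userid in rows:
        seen[fdate].add(userid)
    return {fdate: len(users) for fdate, users in seen.items()}

def count_distinct_two_level(rows):
    """The same result expressed as two plain group-bys (hypothetical helper)."""
    # Inner step: GROUP BY fdate, userid (removes duplicate pairs).
    pairs = set(rows)
    # Outer step: GROUP BY fdate with count(*) over the deduplicated pairs.
    counts = defaultdict(int)
    for fdate, _userid in pairs:
        counts[fdate] += 1
    return dict(counts)
```

Both functions return the same mapping for any input, which is why the distinct aggregation can be planned as ordinary hash group-bys.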
This is for a query that counts distinct values over 10 billion rows, like the one below:

```sql
SELECT *
FROM (
    SELECT fdate as a_ds, count(distinct(userid)) as index_0_7366
    FROM (
        SELECT t1.fdate as fdate, t1.userid as userid
        FROM (
            SELECT t0.fdate as fdate, t0.userid as userid
            FROM test.test_table t0
            WHERE t0.fdate>=20210401 AND t0.fdate<20210531
        ) t1
    ) a
    GROUP BY a_ds
) t_ret
order by a_ds desc
LIMIT 5001
```

Both engines ran on the same hardware (83 nodes, 48 cores with 96 processors, 256 GB memory).
Although Presto runs faster than Impala, it uses far more CPU than Impala does. Is this a disadvantage of Java (Presto) compared with C++ (Impala)?
CPU utilization on one of the Presto hosts (50%+)
CPU utilization of the Impala cluster, one line per machine (15%+)
I captured thread stacks while the query was running and counted the class on the first line of each runnable thread. The top 10 classes are below:

| Class (full name) | Occurrences in thread stacks |
| --- | --- |
| alluxio.shaded.client.io.netty.channel.epoll.Native | 382 |
| com.facebook.presto.operator.MultiChannelGroupByHash | 59 |
| io.airlift.slice.Slices | 31 |
| sun.nio.ch.EPoll | 20 |
| com.facebook.presto.common.block.AbstractVariableWidthBlock | 13 |
| io.airlift.slice.DynamicSliceOutput | 10 |
| com.facebook.presto.common.type.AbstractLongType | 9 |
| com.facebook.airlift.http.client.jetty.BufferingResponseListener | 7 |
| com.facebook.presto.common.block.VariableWidthBlock | 6 |
| sun.management.ThreadImpl | 5 |
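For anyone who wants to reproduce this kind of count, a rough Python sketch follows. It assumes the dump is in the usual HotSpot `jstack` text format (a quoted thread header, a `java.lang.Thread.State:` line, then `at ...` frames). The function name `top_frame_classes` is made up for illustration, and this is not necessarily how the numbers above were produced.

```python
import re
from collections import Counter

# Matches a frame line such as "\tat pkg.Class.method(File.java:12)".
# An optional "module@version/" prefix (newer JDKs) is skipped.
FRAME = re.compile(r"^\s*at\s+(?:[^\s/]+/)?([\w.$]+)\.[\w$<>]+\(")

def top_frame_classes(dump: str) -> Counter:
    """Count the class of the first stack frame of every RUNNABLE thread."""
    counts = Counter()
    runnable = False  # True while waiting for the first frame of a RUNNABLE thread
    for line in dump.splitlines():
        if line.startswith('"'):
            # A quoted name starts a new thread entry.
            runnable = False
        elif "java.lang.Thread.State: RUNNABLE" in line:
            runnable = True
        elif runnable:
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                runnable = False  # only the first frame is counted
    return counts
```

Calling `top_frame_classes(text).most_common(10)` on the text of a saved dump gives a ranking in the same shape as the table above. Note that a thread blocked in a native call such as `epollWait` is still reported as RUNNABLE, so a high count for a class does not by itself mean that class is burning CPU.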
Impala uses code generation to speed up execution. Why doesn't Presto?