prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.05k stars 5.38k forks source link

[native] Disable hive.partial_aggregation_pushdown_enabled #22047

Closed mbasmanova closed 8 months ago

mbasmanova commented 8 months ago

HiveConnector in Velox doesn't support aggregation pushdown, which leads to errors like

VeloxUserError:  Unsupported Hive column type: "AGGREGATED".

This functionality is not needed because in Velox aggregations support pushdown via LazyVectors. This is a more generic functions which allows to avoid coupling aggregations with TableScan and coupling execution with the optimizer.

We should disable hive.partial_aggregation_pushdown_enabled configuration property and corresponding partial_aggregation_pushdown_enabled session property for Prestissimo.

CC: @amitkdutta @spershin @tdcmeehan @aditi-pandit @majetideepak

spershin commented 8 months ago

@mbasmanova

Having different properties in Native and Presto might lead to maintenance chaos. Would it be possible to simply ignore this column type and treat it as a normal one instead? Or does plan change in a way that this would not work?

spershin commented 8 months ago

Folks, I'm starting to think that we should do this on the coordinator. For some session properties we can check if we are native cluster and change behavior (like ignore these propeties). We already have session property indicating 'native':

native-execution-enabled=true

Integrating it into the other properties getters would be easy.

mbasmanova commented 8 months ago

Would it be possible to simply ignore this column type and treat it as a normal one instead? Or does plan change in a way that this would not work?

@spershin No, this changes the plan. Velox cannot accept such a plan.

tdcmeehan commented 8 months ago

Folks, I'm starting to think that we should do this on the coordinator. For some session properties we can check if we are native cluster and change behavior (like ignore these propeties). We already have session property indicating 'native':

native-execution-enabled=true

Integrating it into the other properties getters would be easy.

@spershin along those lines, please see this RFC and associated doc for thoughts on how to go about this. We do have a session property already, but in the end I think we need to more cleanly separate how the coordinator interacts with the two engines, which in this model is modeled as a plugin.

mbasmanova commented 8 months ago

@tdcmeehan I'd like to make a change to skip aggregation pushdown for native workers. I thought that I can use the existing session property SystemSessionProperties#isNativeExecutionEnabled, but I realized that it is not accessible from within the HiveConnector. Did I miss something and it is actually accessible? If not, would it make sense to expose it via com/facebook/presto/spi/ConnectorSession.java or some other means?

tdcmeehan commented 8 months ago

@mbasmanova I don't think system session properties are visible to connectors. Perhaps, for now, add a new system property in the Hive connector?

mbasmanova commented 8 months ago

I don't think system session properties are visible to connectors.

I see.

Perhaps, for now, add a new system property in the Hive connector?

Having 2 session properties that much be synchronized makes maintenance harder. I wonder if it would be better to expose this flag via ConnectorSession.isNativeExecutionEnabled.

tdcmeehan commented 8 months ago

@mbasmanova does it need to be a session property? This seems to be invariant across the uptime of the cluster, and as such it seems that it should just remain constant. Consider if ConnectorSession#isNativeExecutionEnabled would ever return more than just one value per cluster?

mbasmanova commented 8 months ago

does it need to be a session property? This seems to be invariant across the uptime of the cluster, and as such it seems that it should just remain constant.

@tdcmeehan This is a good point. I believe that initially this property was introduced for the benefit of Presto-on-Spark: #18409 . It might have been an oversight to make it a session property.

Should we create an issue to change it to config-only property?

tdcmeehan commented 8 months ago

@mbasmanova sure, I created #22153

mbasmanova commented 8 months ago

@tdcmeehan Thank you, Tim.

Now, let's assume https://github.com/prestodb/presto/issues/22153 is done. How can we get access to this config from the Hive Connector? Do you have any suggestion?

tdcmeehan commented 8 months ago

One suggestion: After having a static configuration property, this property can then be injected without having access to the session. To me, this seems very similar to NodeVersion, which is just a singleton read-only class that is supplied some very basic data from the ConnectorContext. ConnectorContext is used to give the Connector some functionality or information from the engine. So it seems reasonable to me to add this to ConnectorContext, and either set it directly on NodeVersion as metadata, or create a new holder class, that can then be wired as necessary to wherever it is used. If we chose to add it to NodeVersion, it would end up in certain places in a serialized form, e.g. as metadata on the creator of Parquet files, so we may want to make it optional.

mbasmanova commented 8 months ago

@tdcmeehan Tim, thank you for the suggestion. This is very helpful. Appreciate a pointer to NodeVersion.

I understand now that one option could be to add FeaturesConfig to ConnectorContext. Then inject FeaturesConfig into HivePlanOptimizerProvider, get access to isNativeExecutionEnabled() and use it to skip HivePartialAggregationPushdown.

What do you think? Any objections to me making a PR that does that?

tdcmeehan commented 8 months ago

The overall approach is sound, I just think it may make sense to have a more minimal config object that is read-only (as we don't know if connector authors may do something silly like try to write a value in FeaturesConfig) and just exposes things that we know connectors might need, and also allows us to potentially add other configs (other than from FeaturesConfig). It could be something simple like ConnectorSystemProperties, and it just exposes the value of isNativeExecutionEnabled for now.

mbasmanova commented 8 months ago

It could be something simple like ConnectorSystemProperties, and it just exposes the value of isNativeExecutionEnabled for now.

@tdcmeehan Sounds good to me. Thanks. I also realized that FeaturesConfig is in presto-main and we cannot have dependency on presto-main.