Closed mbasmanova closed 8 months ago
@mbasmanova
Having different properties in Native and Presto might lead to maintenance chaos. Would it be possible to simply ignore this column type and treat it as a normal one instead? Or does plan change in a way that this would not work?
Folks, I'm starting to think that we should do this on the coordinator. For some session properties we can check if we are native cluster and change behavior (like ignore these propeties). We already have session property indicating 'native':
native-execution-enabled=true
Integrating it into the other properties getters would be easy.
Would it be possible to simply ignore this column type and treat it as a normal one instead? Or does plan change in a way that this would not work?
@spershin No, this changes the plan. Velox cannot accept such a plan.
Folks, I'm starting to think that we should do this on the coordinator. For some session properties we can check if we are native cluster and change behavior (like ignore these propeties). We already have session property indicating 'native':
native-execution-enabled=true
Integrating it into the other properties getters would be easy.
@spershin along those lines, please see this RFC and associated doc for thoughts on how to go about this. We do have a session property already, but in the end I think we need to more cleanly separate how the coordinator interacts with the two engines, which in this model is modeled as a plugin.
@tdcmeehan I'd like to make a change to skip aggregation pushdown for native workers. I thought that I can use the existing session property SystemSessionProperties#isNativeExecutionEnabled, but I realized that it is not accessible from within the HiveConnector. Did I miss something and it is actually accessible? If not, would it make sense to expose it via com/facebook/presto/spi/ConnectorSession.java or some other means?
@mbasmanova I don't think system session properties are visible to connectors. Perhaps, for now, add a new system property in the Hive connector?
I don't think system session properties are visible to connectors.
I see.
Perhaps, for now, add a new system property in the Hive connector?
Having 2 session properties that much be synchronized makes maintenance harder. I wonder if it would be better to expose this flag via ConnectorSession.isNativeExecutionEnabled.
@mbasmanova does it need to be a session property? This seems to be invariant across the uptime of the cluster, and as such it seems that it should just remain constant. Consider if ConnectorSession#isNativeExecutionEnabled
would ever return more than just one value per cluster?
does it need to be a session property? This seems to be invariant across the uptime of the cluster, and as such it seems that it should just remain constant.
@tdcmeehan This is a good point. I believe that initially this property was introduced for the benefit of Presto-on-Spark: #18409 . It might have been an oversight to make it a session property.
Should we create an issue to change it to config-only property?
@mbasmanova sure, I created #22153
@tdcmeehan Thank you, Tim.
Now, let's assume https://github.com/prestodb/presto/issues/22153 is done. How can we get access to this config from the Hive Connector? Do you have any suggestion?
One suggestion: After having a static configuration property, this property can then be injected without having access to the session. To me, this seems very similar to NodeVersion
, which is just a singleton read-only class that is supplied some very basic data from the ConnectorContext
. ConnectorContext
is used to give the Connector some functionality or information from the engine. So it seems reasonable to me to add this to ConnectorContext
, and either set it directly on NodeVersion
as metadata, or create a new holder class, that can then be wired as necessary to wherever it is used. If we chose to add it to NodeVersion
, it would end up in certain places in a serialized form, e.g. as metadata on the creator of Parquet files, so we may want to make it optional.
@tdcmeehan Tim, thank you for the suggestion. This is very helpful. Appreciate a pointer to NodeVersion.
I understand now that one option could be to add FeaturesConfig to ConnectorContext. Then inject FeaturesConfig into HivePlanOptimizerProvider, get access to isNativeExecutionEnabled() and use it to skip HivePartialAggregationPushdown.
What do you think? Any objections to me making a PR that does that?
The overall approach is sound, I just think it may make sense to have a more minimal config object that is read-only (as we don't know if connector authors may do something silly like try to write a value in FeaturesConfig
) and just exposes things that we know connectors might need, and also allows us to potentially add other configs (other than from FeaturesConfig
). It could be something simple like ConnectorSystemProperties
, and it just exposes the value of isNativeExecutionEnabled
for now.
It could be something simple like ConnectorSystemProperties, and it just exposes the value of isNativeExecutionEnabled for now.
@tdcmeehan Sounds good to me. Thanks. I also realized that FeaturesConfig is in presto-main and we cannot have dependency on presto-main.
HiveConnector in Velox doesn't support aggregation pushdown, which leads to errors like
This functionality is not needed because in Velox aggregations support pushdown via LazyVectors. This is a more generic functions which allows to avoid coupling aggregations with TableScan and coupling execution with the optimizer.
We should disable hive.partial_aggregation_pushdown_enabled configuration property and corresponding partial_aggregation_pushdown_enabled session property for Prestissimo.
CC: @amitkdutta @spershin @tdcmeehan @aditi-pandit @majetideepak