prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.06k stars 5.38k forks source link

[Native] Should we have a separate velox.properties file for Prestissimo ? #20406

Open aditi-pandit opened 1 year ago

aditi-pandit commented 1 year ago

Recently, an additional configuration file called velox.properties was added in Prestissimo that could be used to initialize the Velox Query config defaults.

Is this extra file needed ?

Could we add velox properties in config.properties with velox prefix instead ?

There are multiple properties like "spill_enabled", "aggregation_spill_enabled";"join_spill_enabled"; "order_by_spill_enabled"; "aggregation_spill_memory_threshold"; "join_spill_memory_threshold" that carry over from java that are now available from both configs. The source of truth is confusing then.

@amitkdutta @majetideepak @tdcmeehan @spershin @mbasmanova @xiaoxmeng

majetideepak commented 1 year ago

@aditi-pandit Having velox.properties is more explicit in my opinion. Do you see any downside?

tdcmeehan commented 1 year ago

Are operators of Presto expected to change this config value? What does mutable-config=true mean?

If operators are not expected to change this value, it could be fine. But for any configuration where operators are expected to make changes to it, it's preferable to use the same approach as Java, and use a unified config (and avoid specific references to underlying libraries).

aditi-pandit commented 1 year ago

@aditi-pandit Having velox.properties is more explicit in my opinion. Do you see any downside?

The Velox queryConfig has many properties for specific operator behavior like "spill_enabled", "aggregation_spill_enabled";"join_spill_enabled"; "order_by_spill_enabled"; "aggregation_spill_memory_threshold"; "join_spill_memory_threshold"; These are applicable for Presto java also.

The source of truth for such properties has become confusing now.

In Presto java these would make it from config.properties -> SystemSessionProperties class used during task creation.

Now velox.properties will initialize into BaseVeloxQueryConfig at PrestoServer initialization. But then the value in config.properties makes it to SystemSessionProperties that override the VeloxQueryConfig at task creation ?

@spershin, @amitkdutta : Is there some misunderstanding here ?

aditi-pandit commented 1 year ago

Reference

Are operators of Presto expected to change this config value? What does mutable-config=true mean?

Since https://github.com/prestodb/presto/pull/19700 v1/operation end point can be used to change Velox Query config on the fly. mutable-config is used to allow that change.

If operators are not expected to change this value, it could be fine. But for any configuration where operators are expected to make changes to it, it's preferable to use the same approach as Java, and use a unified config (and avoid specific references to underlying libraries).

@tdcmeehan : Its confusing to me because there are existing Presto configuration "like spill_enabled, aggregation_spill_enabled, etc" also applicable for Velox Query config. With the new velox.properties file the source of truth for the property could be confusing.

mbasmanova commented 1 year ago

Folks, where are we with this discussion. It seems strange to introduce velox.properties file. Why not include all worker configs in a single file (node.properties) and translate these to Velox Config as needed at runtime?

CC: @xiaoxmeng @amitkdutta @spershin @pranjalssh @kewang1024

amitkdutta commented 1 year ago

@mbasmanova I assume you meant config.properties. Having everything in the same file is definitely helpful for deployment. In such cases, we might need to add a velox prefix to separate velox specific configs, currently the file itself does the distinction.

CC: @spershin @xiaoxmeng

xiaoxmeng commented 1 year ago

@amitkdutta having a single config file with velox prefix (convert it at Prestissimo layer) sounds a good approach to me. We plan to do similar things for session property for native support @mbasmanova

mbasmanova commented 1 year ago

It would be nice to also add logic to Prestissimo worker to fail fast if configuration file contains unsupported properties. Similarly, it would be nice to add logic to fail if received unsupported session properties.

aditi-pandit commented 1 year ago

It would be nice to also add logic to Prestissimo worker to fail fast if configuration file contains unsupported properties. Similarly, it would be nice to add logic to fail if received unsupported session properties.

@mbasmanova , @amitkdutta , @xiaoxmeng : Could these properties be surfaced to the co-ordinator in any way ? The co-ordinator should not be sending such properties to Prestissimo workers also. If we validate the properties at the co-ordinator it would give us a fast failure path.