prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.05k stars 5.38k forks source link

Per query/session settings #1062

Closed snarayanank2 closed 9 years ago

snarayanank2 commented 10 years ago

It would be useful to have a per query / per session settings mechanism that can be used to control optimizations and other functionality.

alexliu68 commented 10 years ago

Per query level configuration can be done by create a CONFIG clause and pass any key value pair.

e.g. select * from stocks CONFIG key=value, key=value

The per query settings can be passed to connector to override the per-application settings

vrajat commented 10 years ago

Hello Alex, I have seen other databases take configuration like: select /key=value, ..../ [column list] from .... insert /.../ into etc. One advantage is that you dont have to wrangle with parser to make CONFIG work with all the various clauses in all the statements. (I know Presto doesnt support many right now.)

WRT to storing the configuration once its accepted, do you have ideas or a plan to how to implement it ?

alexliu68 commented 10 years ago

The issue is that key=value is not passed to connector for current implementation(I tested select 'key=value', * from stocks). CONFIG clause can be used as a general purpose to pass additional information to connector, so it's up to connector to use those information. e.g. Limit clause can be used here as a key/value pair passed to connector

snarayanank2 commented 10 years ago

Personally, I'm not a big fan of passing configurations inline in a query. Oracle famously allows inline hints to the optimizer using syntax like:

SELECT /*+ STAR_TRANSFORMATION */  *
  FROM sales s, times t, products p, channels c
  WHERE s.time_id = t.time_id
    AND s.prod_id = p.product_id
    AND s.channel_id = c.channel_id
    AND p.product_status = 'obsolete';

I understand that per query/session settings are necessary to give certain degree of control, but am not in favor of messing with the grammar for that. Set statements aren't in the SQL standard either AFAIK, but seem to be whole lot more common. AFAIK, the following DBs have a set key=value syntax:

Irrespective of how we implement the surface, I'm not completely clear as to what is the right way for these parameters to be passed to the ConnectorMetadata, ConnectorSplitManager and others. In our case, we'd be interested in making split computations configurable and I was thinking of extending the ConnectorSplitManager interface along these lines (the properties would be built from the sessions object):

SplitSource getPartitionSplits(TableHandle th, List<Partitions> parts, Properties prop)

It would be really great if someone from FB team chimed in to the discussion as all of this is fairly invasive and it would make sense to get some feedback early.

electrum commented 10 years ago

I think these might be two separate issues. I believe what @snarayananqubole wants is the ability to control global things like the optimizer in the main Presto code for an entire query, but @alexliu68 wants connector parameters that are scoped to a given table access. For the latter, I say scoped because you might have a query that accesses multiple tables from

snarayanank2 commented 10 years ago

That's a fair point, @electrum. Though it might be possible to identify right properties to pass using namespacing? Example, datasource.table.key=value ?

Any thoughts on how to extend the interfaces to pass these properties?

alexliu68 commented 10 years ago

The following link is a simple implementation of per query configuration settings and limit clause push down

https://github.com/alexliu68/presto/commit/65f7add73642df4fd509a719b878f04c41a31746

e.g. select /! cassandra.presto.test.limit_clause=2 !/ * from test;

cberner commented 9 years ago

We added session parameters a while back