Closed snarayanank2 closed 9 years ago
Per query level configuration can be done by create a CONFIG clause and pass any key value pair.
e.g. select * from stocks CONFIG key=value, key=value
The per query settings can be passed to connector to override the per-application settings
Hello Alex, I have seen other databases take configuration like: select /key=value, ..../ [column list] from .... insert /.../ into etc. One advantage is that you dont have to wrangle with parser to make CONFIG work with all the various clauses in all the statements. (I know Presto doesnt support many right now.)
WRT to storing the configuration once its accepted, do you have ideas or a plan to how to implement it ?
The issue is that key=value is not passed to connector for current implementation(I tested select 'key=value', * from stocks). CONFIG clause can be used as a general purpose to pass additional information to connector, so it's up to connector to use those information. e.g. Limit clause can be used here as a key/value pair passed to connector
Personally, I'm not a big fan of passing configurations inline in a query. Oracle famously allows inline hints to the optimizer using syntax like:
SELECT /*+ STAR_TRANSFORMATION */ *
FROM sales s, times t, products p, channels c
WHERE s.time_id = t.time_id
AND s.prod_id = p.product_id
AND s.channel_id = c.channel_id
AND p.product_status = 'obsolete';
I understand that per query/session settings are necessary to give certain degree of control, but am not in favor of messing with the grammar for that. Set statements aren't in the SQL standard either AFAIK, but seem to be whole lot more common. AFAIK, the following DBs have a set key=value syntax:
Irrespective of how we implement the surface, I'm not completely clear as to what is the right way for these parameters to be passed to the ConnectorMetadata, ConnectorSplitManager and others. In our case, we'd be interested in making split computations configurable and I was thinking of extending the ConnectorSplitManager interface along these lines (the properties would be built from the sessions object):
SplitSource getPartitionSplits(TableHandle th, List<Partitions> parts, Properties prop)
It would be really great if someone from FB team chimed in to the discussion as all of this is fairly invasive and it would make sense to get some feedback early.
I think these might be two separate issues. I believe what @snarayananqubole wants is the ability to control global things like the optimizer in the main Presto code for an entire query, but @alexliu68 wants connector parameters that are scoped to a given table access. For the latter, I say scoped because you might have a query that accesses multiple tables from
That's a fair point, @electrum. Though it might be possible to identify right properties to pass using namespacing? Example, datasource.table.key=value ?
Any thoughts on how to extend the interfaces to pass these properties?
The following link is a simple implementation of per query configuration settings and limit clause push down
https://github.com/alexliu68/presto/commit/65f7add73642df4fd509a719b878f04c41a31746
e.g. select /! cassandra.presto.test.limit_clause=2 !/ * from test;
We added session parameters a while back
It would be useful to have a per query / per session settings mechanism that can be used to control optimizations and other functionality.