uccross / skyhookdm-ceph-cls

Skyhook Data Management: Storage and management of tabular data in Ceph.
https://www.skyhookdm.com
GNU Lesser General Public License v2.1

Python front end for SQL query #16

Open jlefevre opened 4 years ago

jlefevre commented 4 years ago

This will create a Python front end that generates a skyhook query from a SQL statement entered in a Python shell. For now it pushes down project only (select comes later), for statements such as: SELECT a,b FROM T;

Queries of the following forms can simply be reported as unsupported syntax at this time: SELECT a,b FROM T WHERE a < x; SELECT sum(a) FROM T;

Although skyhook supports those ops, we defer them until the basic project works reliably, with error checking, via this Python SQL interface.
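A minimal sketch of how the "report unsupported syntax" step might look. The function name `unsupported_reason` is hypothetical and not part of skyhook, and the list of aggregate names beyond `sum` is an assumption made for illustration.

```python
import re
from typing import Optional

# Constructs that are deferred for now. Only WHERE and sum() are named in
# this issue; the other aggregate names are an assumption.
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)
_AGGREGATE = re.compile(r"\b(SUM|MIN|MAX|AVG|COUNT)\s*\(", re.IGNORECASE)


def unsupported_reason(query: str) -> Optional[str]:
    """Return a message if the query uses deferred syntax, else None."""
    if _WHERE.search(query):
        return "unsupported syntax: WHERE clause"
    if _AGGREGATE.search(query):
        return "unsupported syntax: aggregate function"
    return None


if __name__ == "__main__":
    for q in ("SELECT a,b FROM T;",
              "SELECT a,b FROM T WHERE a < 5;",
              "SELECT sum(a) FROM T;"):
        print(q, "->", unsupported_reason(q) or "ok")
```

A keyword scan like this is deliberately crude; it would also flag a column literally named `where`, so a real parser would be the safer long-term choice.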

The query SELECT a,b FROM T; should push down the column projection for cols a,b into storage via the correct skyhook run-query flags. There will also be metadata DDL, such as a table create statement, to import before querying. For now, assume the default test table LINEITEM schema. The underlying data format should not be specified at query time, as it is always invisible to the user. Output is reported within the Python shell, or sent to a file via something such as '-o filename' or '\output filename'.
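One way the '\output filename' idea could be handled in the shell, as a rough sketch. Both helper names (`parse_output_command`, `open_output`) are hypothetical, and the exact meta-command syntax is an assumption based on the example above.

```python
import sys
from contextlib import contextmanager


def parse_output_command(line):
    # Return the filename if the line is a backslash-output meta-command
    # followed by a filename; otherwise return None.
    parts = line.strip().split(maxsplit=1)
    if len(parts) == 2 and parts[0] == "\\output":
        return parts[1]
    return None


@contextmanager
def open_output(filename=None):
    # Yield a writable stream: the named file, or stdout when no file is set.
    if filename is None:
        yield sys.stdout
    else:
        with open(filename, "w") as handle:
            yield handle


if __name__ == "__main__":
    target = parse_output_command("\\output results.txt")
    print("would write to:", target)
    with open_output(None) as out:
        out.write("query results go here\n")
```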

jlefevre commented 4 years ago

Note: the flag to indicate PROJECT cols for the above query SELECT a,b FROM T; is --project "a,b"
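As a sketch of that mapping, the snippet below turns a simple projection query into the `--project` argument. The function name `project_flags` and the regular expression are illustrative assumptions; only the `--project "a,b"` flag comes from this thread, and no other run-query flags (for example, one naming the table) are guessed at here.

```python
import re

# Matches only the simple form: SELECT <cols> FROM <table> [;]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def project_flags(query):
    """Return the argument list for the projection in a simple SELECT."""
    match = _SELECT.match(query)
    if match is None:
        raise ValueError("unsupported syntax: " + query.strip())
    # Strip whitespace and skip empty entries left by a stray comma.
    cols = [c.strip() for c in match.group("cols").split(",") if c.strip()]
    if not cols:
        raise ValueError("no columns to project")
    return ["--project", ",".join(cols)]


if __name__ == "__main__":
    print(project_flags("SELECT a,b FROM T;"))
```

Returning a list rather than a single string keeps the column list as one argument, so it can be appended to a command line passed to `subprocess.run` without shell quoting.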

Mrhea commented 4 years ago

Supported syntax at this time includes queries of the form SELECT a,b FROM T;. WHERE clauses work, but will be added after query parsing is simplified to eliminate repeated code. Multiple semicolon-separated queries can be entered at once.

The default table LINEITEM is assumed for now, but choosing a different table is supported.

Will support input files that contain semicolon-separated SQL queries.
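A rough sketch of how such an input file might be split into individual queries. The names `split_queries` and `read_queries` are hypothetical, and the split is naive: it assumes no semicolons appear inside string literals or comments.

```python
def split_queries(text):
    """Split semicolon-separated SQL text into a list of query strings."""
    # Drop empty pieces, e.g. the one after the final semicolon.
    return [piece.strip() for piece in text.split(";") if piece.strip()]


def read_queries(path):
    """Read a file of semicolon-separated queries."""
    with open(path, "r") as handle:
        return split_queries(handle.read())


if __name__ == "__main__":
    sample = "SELECT a,b FROM T;\nSELECT c FROM T;\n"
    for query in split_queries(sample):
        print(query)
```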