pip install mindsdb_sql
Parser
Planner
Render
from mindsdb_sql import parse_sql
query = parse_sql('select b from aaa where c=1', dialect='mindsdb')
# result is abstract syntax tree (AST)
query
# string representation of AST
query.to_tree()
# representation of the tree as an sql string; it may not exactly match the original sql
query.to_string()
mysql
sqlite
mindsdb
The parser is built on the SLY library.
Parsing consists of 2 stages (with a separate module for every dialect):
SLY does not support inheritance, so every dialect is described in full rather than as an extension of another.
For a better user experience, a parsing error contains information about the location of the problem and a suggestion for how to fix it.
How suggestions work: the parser looks up the next possible tokens defined by the syntax rules. If the error is at the end of the query, it just shows these tokens. Otherwise:
Example:
Initialize planner
from mindsdb_sql.planner import query_planner
# all parameters are optional
planner = query_planner.QueryPlanner(
    ast_query,  # query as an AST tree
    integrations=['mysql'],  # list of available integrations
    predictor_namespace='mindsdb',  # name of the namespace in which to look up predictors
    default_namespace='mindsdb',  # used if the namespace is not set in the query
    predictor_metadata={  # information about predictors
        'tp3': {  # name of the predictor
            'timeseries': True,  # whether this is a timeseries predictor
            'order_by_column': 'pickup_hour',  # timeseries column
            'group_by_columns': ['day', 'type'],  # columns for partitioning (only for timeseries)
            'window': 10  # window size (only for timeseries)
        }
    }
)
Detailed description of timeseries predictor: [https://docs.mindsdb.com/sql/create/predictor/]
Plan of prepared statement
The planner can also be used for a query with parameters. Such a query is incomplete and cannot be executed, but it is still possible to get the list of columns and parameters from it.
for step in planner.prepare_steps(ast_query):
    data = do_execute_step(step)
    step.set_result(data)
statement_info = planner.get_statement_info()
# list of columns
print(statement_info['columns'])
# list of parameters
print(statement_info['parameters'])
At the moment this functionality is used only in the COM_STMT_PREPARE command of the MySQL binary protocol.
Plan of execution
# if prepare_steps was executed, params must be passed;
# otherwise, params=None
for step in planner.execute_steps(params):
    data = do_execute_step(step)
    step.set_result(data)
The query result is the output of the last step.
Alternative way of execution
At the moment the execution plan does not depend on the results of previous steps, but this behavior may change in the future.
With the current behavior it is possible to get the plan of a query as a list:
from mindsdb_sql.planner import plan_query
plan = plan_query(
    ast_query,
    integrations=['mysql'],
    predictor_namespace='mindsdb',
    default_namespace='mindsdb',
    predictor_metadata={
        'tp3': {
            'timeseries': False,
        }
    }
)
# list of steps
print(plan.steps)
The planner analyses the AST query and returns the sequence of steps that must be executed to perform the query.
Steps are defined in planner/steps.py. A step can reference the future result of a previous step (using the class Result in planner/step_results.py).
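To make the idea of a step referencing the result of an earlier step more concrete, here is a minimal, self-contained sketch. It does not use the classes from planner/steps.py or planner/step_results.py; ResultRef, ToyStep and run_plan are hypothetical names invented for this illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class ResultRef:
    """Hypothetical placeholder pointing at the output of an earlier step."""
    step_num: int


@dataclass
class ToyStep:
    """Hypothetical step: a function plus arguments that may be ResultRef placeholders."""
    func: Callable[..., Any]
    args: tuple = ()
    result: Optional[Any] = None

    def set_result(self, data: Any) -> None:
        self.result = data


def run_plan(steps: List[ToyStep]) -> Any:
    """Execute steps in order, replacing each ResultRef with the stored result."""
    for step in steps:
        resolved = [
            steps[arg.step_num].result if isinstance(arg, ResultRef) else arg
            for arg in step.args
        ]
        step.set_result(step.func(*resolved))
    # The output of the last step is the result of the whole plan.
    return steps[-1].result


plan = [
    # step 0: pretend to fetch rows from an integration
    ToyStep(lambda: [1, 2, 3]),
    # step 1: consumes the result of step 0 through a placeholder
    ToyStep(lambda rows: [r * 10 for r in rows], (ResultRef(0),)),
]
final_rows = run_plan(plan)
```

Because the placeholders are resolved only at execution time, the whole plan can be built before any step has run, which matches the note that the plan can be obtained as a list.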
The query planner consists of 2 different planners:
For prepared statements: the class PreparedStatementPlanner in query_prepare.py
For execution: the class QueryPlanner in query_planner.py
The most complex part of the planner is planning the join of a table with a timeseries predictor. The logic, briefly:
Useful functions
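query_traversal can be used to analyse the composition of an AST tree. An example:
from mindsdb_sql.parser import ast
from mindsdb_sql.planner import utils

query_predictors = []

def find_predictors(node, is_table, **kwargs):
    # is_predictor is a user-defined check, not shown here
    if is_table and isinstance(node, ast.Identifier):
        if is_predictor(node):
            query_predictors.append(node)

utils.query_traversal(ast_query, find_predictors)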
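The callback pattern above can be illustrated without the library. The sketch below uses a toy AST and a toy traversal function; Ident, Join, ToySelect and toy_traversal are hypothetical names, and the real query_traversal may visit nodes in a different order and pass additional keyword arguments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ident:
    name: str


@dataclass
class Join:
    left: object
    right: object


@dataclass
class ToySelect:
    targets: List[Ident]
    from_table: object


def toy_traversal(node, callback, is_table=False):
    """Call callback on every node, flagging nodes found in a table position."""
    callback(node, is_table=is_table)
    if isinstance(node, ToySelect):
        for target in node.targets:
            toy_traversal(target, callback, is_table=False)
        toy_traversal(node.from_table, callback, is_table=True)
    elif isinstance(node, Join):
        toy_traversal(node.left, callback, is_table=True)
        toy_traversal(node.right, callback, is_table=True)


tables = []


def collect_tables(node, is_table, **kwargs):
    # Same shape as find_predictors: keep only identifiers used as tables.
    if is_table and isinstance(node, Ident):
        tables.append(node.name)


query = ToySelect([Ident("a")], Join(Ident("t1"), Ident("mindsdb.tp3")))
toy_traversal(query, collect_tables)
```

The column identifier `a` is skipped because it is not visited in a table position, while both sides of the join are collected.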
The renderer is used to convert an AST query to an sql string in different sql dialects.
from mindsdb_sql.render.sqlalchemy_render import SqlalchemyRender
renderer = SqlalchemyRender('mysql') # select dialect
sql = renderer.get_string(ast_query, with_failback=True)
If with_failback is True and sqlalchemy is unable to render the query, the string is returned from the sql representation of the AST tree (using the to_string method).
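The fallback behavior can be pictured with a small stand-alone sketch. It is not the code of SqlalchemyRender: ToyQuery, strict_render and this get_string function are hypothetical stand-ins, and the toy renderer only accepts statements that start with "select".

```python
class ToyQuery:
    """Hypothetical stand-in for an AST query object with a to_string method."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def to_string(self) -> str:
        return self.sql


def strict_render(query: ToyQuery) -> str:
    """Hypothetical primary renderer that rejects anything but SELECT."""
    if not query.sql.lower().startswith("select"):
        raise NotImplementedError("toy renderer only handles SELECT")
    return query.sql.rstrip(";") + ";"


def get_string(query: ToyQuery, with_failback: bool = False) -> str:
    """Try the primary renderer; optionally fall back to the query's own string."""
    try:
        return strict_render(query)
    except NotImplementedError:
        if with_failback:
            return query.to_string()
        raise


rendered = get_string(ToyQuery("select 1"))
fallback = get_string(ToyQuery("retrain tp3"), with_failback=True)
```

Without the fallback flag, the second query would raise instead of returning a string, so the caller chooses between a strict error and a best-effort result.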
Only one renderer is available at the moment: SqlalchemyRender.
Supported dialects at the moment: mysql, postgresql, sqlite, mssql, oracle
Notes:
The following command runs the tests for all components:
env PYTHONPATH=./ pytest