Open typhoonzero opened 4 years ago
I think it is great that SQLFlow supports stored procedures. But where would we store a "stored procedure"? Do we have to design an access control mechanism for it?
Updated proposal one, and we decided to implement proposal one.
It is too challenging for SQLFlow to be able to encapsulate a SQL program into a SQL procedure. This would make the SQLFlow parser tremendously complex. We cannot afford the engineering cost.
It is possible to encapsulate a complex process in a Python function and enable SQLFlow to call the Python function.
Consider the following Python function:
def a_python_func(iter, param1, param2):
    for row in iter:
        print(row)
We hope that by packing it into a Docker image cxwangyi/procedures, we can call it from SQLFlow with the following new SQL syntax extension:
SELECT * FROM tbl
TO RUN cxwangyi/procedure:a_python_func
WITH param1=100, param2="hello";
The first parameter, iter, of a_python_func iterates over the rows returned by the query SELECT * FROM tbl. The remaining parameters get their values from the WITH clause.
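The mapping from the statement to the function call can be sketched in a few lines of Python. This is only an illustration of the idea, not SQLFlow's implementation: `parse_with_clause` and `run_procedure` are hypothetical helpers, the rows are a plain list standing in for the query result, and the function returns what it saw (instead of printing) so the result is easy to inspect.

```python
import ast


def parse_with_clause(clause):
    """Turn 'param1=100, param2="hello"' into a dict of keyword arguments.

    Hypothetical helper: it splits on commas, so it does not handle
    string values that themselves contain commas.
    """
    params = {}
    for item in clause.split(","):
        key, _, value = item.partition("=")
        # literal_eval safely converts '100' to 100 and '"hello"' to 'hello'.
        params[key.strip()] = ast.literal_eval(value.strip())
    return params


def a_python_func(rows, param1, param2):
    # Variant of the function above that collects rows instead of printing them.
    return [(row, param1, param2) for row in rows]


def run_procedure(func, rows, with_clause):
    """Call func with a row iterator first, then the WITH parameters (assumed convention)."""
    return func(iter(rows), **parse_with_clause(with_clause))


# Stand-in for the rows returned by SELECT * FROM tbl.
query_rows = [(1, "a"), (2, "b")]
result = run_procedure(a_python_func, query_rows, 'param1=100, param2="hello"')
```

Passing the WITH values as keyword arguments means their order in the statement does not have to match the order in the function signature.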
Background
Consider the SQL program below:
There are cases where we need to invoke this SQL program with similar but different origin_data and predict_result, or invoke this SQL snippet multiple times in a SQL program.
Proposal
To use the Python function:
SELECT * FROM my_raw_table TO RUN sqlflow_run WITH arg1=3, arg2=5
Use CREATE PROCEDURE or CREATE FUNCTION to define the SQL snippet as a callable procedure or function, like: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
Use a template call such as ${CALL TEMPLATE MyPipeLine arg1 arg2 ...}, and expand the template call into a SQL snippet when executing the current SQL program. The template can also be saved when editing the SQL program on Dataworks.
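For the template option, the expansion step could look roughly like the sketch below. Everything beyond the `${CALL TEMPLATE Name args}` form is an assumption made for illustration: the `TEMPLATES` registry, the `{0}`/`{1}` positional placeholders in the template body, and the sample SQL are hypothetical and do not come from SQLFlow or Dataworks.

```python
import re

# Hypothetical registry of saved templates; {0}, {1} are positional arguments.
TEMPLATES = {
    "MyPipeLine": "CREATE TABLE {1} AS SELECT * FROM {0};",
}

# Matches ${CALL TEMPLATE Name arg1 arg2} with identifier-like arguments.
CALL_PATTERN = re.compile(r"\$\{CALL TEMPLATE (\w+)((?: +\w+)*)\}")


def expand_templates(program, templates):
    """Replace every template call in the program with its expanded SQL snippet."""
    def replace(match):
        name = match.group(1)
        args = match.group(2).split()
        return templates[name].format(*args)

    return CALL_PATTERN.sub(replace, program)


program = (
    "${CALL TEMPLATE MyPipeLine origin_data predict_result}\n"
    "${CALL TEMPLATE MyPipeLine origin_data_2 predict_result_2}"
)
expanded = expand_templates(program, TEMPLATES)
```

Because the expansion is pure text substitution done before execution, this option would need no change to the SQLFlow parser, but it also gives no scoping or type checking for the arguments.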