sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

Decomposing refactorization #1207

Open wangkuiyi opened 4 years ago

wangkuiyi commented 4 years ago

Let us merge the SQLFlow server and the REPL into a single binary, sqlflow.

This proposal follows from decomposing SQLFlow's translation process into three steps:

  1. parsing: from SQL program to IR.
  2. generating Couler program: from IR to Couler program.
  3. generating Argo YAML: from Couler program to Argo YAML file.
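
To make the three-stage decomposition concrete, here is a toy Python sketch in which each stage is a separate function and the whole translation is their composition. It assumes a drastically simplified IR (one dict per statement), uses a plain list of step descriptions in place of a real Couler program, and emits an Argo Workflow manifest as a dict. The function names, the `sqlflow/sqlflow` image, and the `sqlflow-step` command are hypothetical, not SQLFlow's actual code.

```python
import json
import re

# Toy sketch of the three translation stages. All names here (parse,
# to_couler_steps, to_argo, the image and the "sqlflow-step" command) are
# hypothetical and do not come from SQLFlow's code base.

# Toy rule: treat statements containing TO TRAIN / TO PREDICT / TO EXPLAIN
# as using extended syntax.
EXTENDED = re.compile(r"\bTO\s+(TRAIN|PREDICT|EXPLAIN)\b", re.IGNORECASE)


def parse(program):
    """Stage 1: SQL program -> IR, here one dict per statement.

    Splitting on ';' is naive (it breaks on semicolons inside string
    literals); a real implementation would use a SQL parser.
    """
    stmts = [s.strip() for s in program.split(";") if s.strip()]
    return [{"sql": s, "extended": bool(EXTENDED.search(s))} for s in stmts]


def to_couler_steps(ir):
    """Stage 2: IR -> list of steps, standing in for a Couler program."""
    return [
        {
            "name": f"step-{i}",
            "image": "sqlflow/sqlflow",  # assumed image name
            # Hypothetical per-step entry point that runs one statement.
            "command": [
                "sqlflow-step",
                "--extended" if stmt["extended"] else "--standard",
                stmt["sql"],
            ],
        }
        for i, stmt in enumerate(ir)
    ]


def to_argo(steps):
    """Stage 3: steps -> Argo Workflow manifest, as a dict."""
    main = {
        "name": "main",
        # Each inner list is a group of parallel steps; one step per group
        # runs the statements sequentially, in program order.
        "steps": [[{"name": s["name"], "template": s["name"]}] for s in steps],
    }
    containers = [
        {"name": s["name"], "container": {"image": s["image"], "command": s["command"]}}
        for s in steps
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "sqlflow-"},
        "spec": {"entrypoint": "main", "templates": [main] + containers},
    }


program = """
SELECT * FROM iris.train TO TRAIN DNNClassifier LABEL class INTO my_model;
SELECT COUNT(*) FROM iris.test;
"""
workflow = to_argo(to_couler_steps(parse(program)))
print(json.dumps(workflow, indent=2))
```

The manifest is printed as JSON only to stay within the standard library; a real third stage would write YAML.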

Some steps of the generated workflow handle SQL statements that use SQLFlow's extended syntax. Each such step consists of two sub-steps:

  1. parsing a SQL statement into IR.
  2. generating a Python program from the IR.
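
The two sub-steps for an extended-syntax statement can be sketched in the same toy style. The following handles only the `TO TRAIN ... [WITH ...] LABEL ... INTO ...` form with a regular expression (SQLFlow itself uses a real parser). The IR field names are assumptions, and the generated program imports a hypothetical `hypothetical_submitter.train` function.

```python
import ast
import re

# Toy sketch of the two sub-steps for one extended-syntax statement:
# (1) parse it into an IR dict, (2) generate a Python program from the IR.
# Only the "TO TRAIN ... [WITH ...] LABEL ... INTO ..." form is handled; the
# IR field names and the generated import are hypothetical.
TRAIN_RE = re.compile(
    r"^(?P<select>.+?)\s+TO\s+TRAIN\s+(?P<estimator>[\w.]+)"
    r"(?:\s+WITH\s+(?P<attrs>.+?))?"
    r"\s+LABEL\s+(?P<label>\w+)"
    r"\s+INTO\s+(?P<into>[\w.]+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_attrs(text):
    """Parse 'k1=v1, k2=v2' into a dict; literal values become Python values.

    Splitting on ',' is naive: list values such as [10, 20] would break it.
    """
    attrs = {}
    for pair in (text or "").split(","):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        try:
            attrs[key.strip()] = ast.literal_eval(value.strip())
        except (ValueError, SyntaxError):
            attrs[key.strip()] = value.strip()  # keep non-literals as strings
    return attrs


def parse_statement(sql):
    """Sub-step 1: extended SQL statement -> IR."""
    m = TRAIN_RE.match(sql.strip())
    if m is None:
        raise ValueError("only TO TRAIN statements are handled in this sketch")
    return {
        "type": "train",
        "select": m["select"].strip(),
        "estimator": m["estimator"],
        "attributes": parse_attrs(m["attrs"]),
        "label": m["label"],
        "into": m["into"],
    }


def generate_python(ir):
    """Sub-step 2: IR -> source code of a Python program."""
    return "\n".join([
        "# Generated from a TO TRAIN statement by a toy code generator.",
        "from hypothetical_submitter import train  # hypothetical module",
        "train(",
        f"    select={ir['select']!r},",
        f"    estimator={ir['estimator']!r},",
        f"    attributes={ir['attributes']!r},",
        f"    label={ir['label']!r},",
        f"    save={ir['into']!r},",
        ")",
        "",
    ])


stmt = ("SELECT * FROM iris.train TO TRAIN DNNClassifier "
        "WITH model.n_classes=3, train.epoch=10 LABEL class INTO sqlflow_models.my_dnn;")
ir = parse_statement(stmt)
print(generate_python(ir))
```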

All of the bullets above describe some kind of translation, so a single command-line tool could perform all of them.

typhoonzero commented 4 years ago

The name sqlflow may conflict with the command-line client installed by the current pysqlflow package: https://github.com/sql-machine-learning/pysqlflow/blob/develop/setup.py#L73


@tonyyang-svail: I can think of two workarounds:

  1. Go binary: sqlflowcmd. Python client sqlflow.
  2. Go binary: sqlflow. Python client pysqlflow.

typhoonzero commented 4 years ago
sqlflow -codegen [tensorflow|xgboost] -db maxcompute < [a.json|a.sql] > a.py

->

sqlflow -codegen [tensorflow|xgboost] -db maxcompute -engine [pai_tf|elasticdl|alps|tensorflow|xgboost] < [a.json|a.sql] > a.py
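
A minimal sketch of the proposed flag surface, using Python's argparse purely for illustration (the sqlflow binary itself would be written in Go). The flag names and choices come from the command above; the help strings and everything else are assumptions.

```python
import argparse

# Hypothetical sketch of the flags proposed above; the real sqlflow binary
# is written in Go, and its actual flags may differ.
CODEGENS = ["tensorflow", "xgboost"]
ENGINES = ["pai_tf", "elasticdl", "alps", "tensorflow", "xgboost"]


def build_parser():
    parser = argparse.ArgumentParser(
        prog="sqlflow",
        description="Read a SQL statement or its IR (JSON) from stdin and "
                    "write a Python program to stdout.",
    )
    # argparse accepts single-dash long options such as -codegen.
    parser.add_argument("-codegen", choices=CODEGENS,
                        help="code generator that emits the Python program")
    parser.add_argument("-db", help="database backend, e.g. maxcompute")
    parser.add_argument("-engine", choices=ENGINES,
                        help="engine that runs the generated program")
    return parser


args = build_parser().parse_args(
    ["-codegen", "tensorflow", "-db", "maxcompute", "-engine", "pai_tf"])
print(args.codegen, args.db, args.engine)  # tensorflow maxcompute pai_tf
```

Restricting -codegen and -engine to fixed choices makes a typo fail at argument parsing instead of later, during code generation.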

Yancey1989 commented 4 years ago

Converts an IR of only one statement with extended syntax into a Python submitter program.

Maybe we don't need to convert the IR into a submitter program. Instead, the submitter program could accept the IR as input: python -m sqlflow_submitter.tensorflow.train < ir.json. Then we could implement a Couler step function like:

couler.sqlflow.run(cmd='echo "SELECT ... TO TRAIN ..." | sqlflow -parse | python -m sqlflow_submitter.tensorflow.train')
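
Under this alternative, the submitter is an ordinary module that reads the IR as JSON from standard input rather than being generated from it. The sketch below assumes hypothetical IR field names and replaces actual training with a placeholder string; `main()` takes an optional file-like object so it can be exercised without a real pipe.

```python
import io
import json
import sys

# Sketch of a submitter module that reads the IR as JSON from stdin instead
# of being generated from it. The IR field names and the train() body are
# hypothetical placeholders, not sqlflow_submitter's real code.
REQUIRED_FIELDS = ("select", "estimator", "label", "into")


def load_ir(stream):
    """Read one IR object from a file-like stream and check required fields."""
    ir = json.load(stream)
    missing = [k for k in REQUIRED_FIELDS if k not in ir]
    if missing:
        raise ValueError(f"IR is missing fields: {missing}")
    return ir


def train(ir):
    # Placeholder: a real submitter would run ir["select"], build the model
    # named by ir["estimator"] with ir.get("attributes", {}), fit it on the
    # ir["label"] column and save it as ir["into"].
    return f"training {ir['estimator']} on ({ir['select']}) -> {ir['into']}"


def main(stream=None):
    result = train(load_ir(stream if stream is not None else sys.stdin))
    print(result)
    return result


# Demo with an in-memory stream instead of a shell pipe.
example = io.StringIO(json.dumps({
    "select": "SELECT * FROM iris.train",
    "estimator": "DNNClassifier",
    "label": "class",
    "into": "sqlflow_models.my_dnn",
}))
result = main(example)
```

Saved as sqlflow_submitter/tensorflow/train.py with `main()` called under `if __name__ == "__main__":`, it could be run as `python -m sqlflow_submitter.tensorflow.train < ir.json`, which is what the Couler step above pipes into. No generated Python file has to be stored between steps.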

weiguoz commented 4 years ago

sqlflow -codegen [tensorflow|xgboost] -db maxcompute < [a.json|a.sql] > a.py

I don't think we need to specify [tensorflow|xgboost], because this information can be derived from a.sql (for example, from the model name in the TO TRAIN clause).
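
A sketch of how the code generator could be inferred from the statement instead of being passed as a flag. It assumes, based on SQLFlow model names such as xgboost.gbtree, that XGBoost models are exactly those whose names start with `xgboost.` and that everything else goes to the TensorFlow generator. Both helper names are hypothetical.

```python
import re

# Hypothetical helpers: infer the code generator from the model named in the
# TO TRAIN clause instead of taking it from a -codegen flag. The rule that
# XGBoost models are named "xgboost.*" (e.g. xgboost.gbtree) is an assumption.
ESTIMATOR_RE = re.compile(r"\bTO\s+TRAIN\s+([\w.]+)", re.IGNORECASE)


def estimator_of(sql):
    """Return the model name that follows TO TRAIN."""
    m = ESTIMATOR_RE.search(sql)
    if m is None:
        raise ValueError("no TO TRAIN clause found")
    return m.group(1)


def infer_codegen(estimator):
    """Map a model name to a code generator."""
    return "xgboost" if estimator.lower().startswith("xgboost.") else "tensorflow"


sql = ('SELECT * FROM iris.train TO TRAIN xgboost.gbtree '
       'WITH objective="multi:softprob" LABEL class INTO my_xgb;')
print(infer_codegen(estimator_of(sql)))  # xgboost
```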

tonyyang-svail commented 4 years ago

Let us merge the SQLFlow server and the REPL into a single binary, sqlflow.

@wangkuiyi We still need a gRPC server though.