sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

Refactor sql package structure #1585

Open typhoonzero opened 4 years ago

typhoonzero commented 4 years ago

c.f. https://github.com/sql-machine-learning/sqlflow/pull/1553#issuecomment-569858480

Currently, the sql package contains almost all of the core code for parsing, generating Python code, and executing it. We need to split these features into separate packages so the code is easier to understand:

  1. parser package is already moved under pkg folder
  2. feature_derivation package is already moved under pkg folder
  3. tools like pipe, verifier are already moved under pkg folder
  4. TODO: move ir to pkg folder
  5. TODO: move testdata to pkg folder

We currently have two job execution modes: workflow mode and local mode.

We may need a command named step instead of repl, since that name is more meaningful. For that we'll have:

  1. cmd/step calls pkg/step to run a step or, in the future, to generate the Python code for a step
  2. pkg/step contains:
    1. run the current step and get its output (this will only be used by repl if we generate Python code for each step rather than using the step command to run a single SQL statement)
    2. pkg/step/codegen: generate the Python code for a step
  3. pkg/workflow contains:
    1. submit a workflow and monitor its status
    2. pkg/workflow/codegen: generate Couler/Fluid Python code
    3. pkg/workflow/argo: submit, get status, and get logs for Argo
    4. pkg/workflow/tekton: submit, get status, and get logs for Tekton
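
The argo and tekton sub-packages listed above share the same three operations (submit, get status, get logs), so they could sit behind one interface. The following is only a minimal sketch of that idea: the names Backend, Status, fakeBackend, and SubmitAndFetch are hypothetical and do not come from the SQLFlow code base, and the in-memory backend stands in for a real Argo or Tekton client.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical workflow state; the issue does not define one.
type Status string

// Succeeded is the only state the in-memory stand-in ever reports.
const Succeeded Status = "Succeeded"

// Backend is an assumed common interface for pkg/workflow/argo and
// pkg/workflow/tekton: submit, get status, get logs.
type Backend interface {
	Submit(spec string) (id string, err error)
	Status(id string) (Status, error)
	Logs(id string) (string, error)
}

// fakeBackend is an in-memory stand-in used only for illustration.
// It does not talk to Argo or Tekton.
type fakeBackend struct {
	jobs map[string]string // workflow ID -> submitted spec
}

func newFakeBackend() *fakeBackend {
	return &fakeBackend{jobs: map[string]string{}}
}

func (b *fakeBackend) Submit(spec string) (string, error) {
	if spec == "" {
		return "", errors.New("empty workflow spec")
	}
	id := fmt.Sprintf("wf-%d", len(b.jobs)+1)
	b.jobs[id] = spec
	return id, nil
}

func (b *fakeBackend) Status(id string) (Status, error) {
	if _, ok := b.jobs[id]; !ok {
		return "", fmt.Errorf("unknown workflow %q", id)
	}
	return Succeeded, nil
}

func (b *fakeBackend) Logs(id string) (string, error) {
	spec, ok := b.jobs[id]
	if !ok {
		return "", fmt.Errorf("unknown workflow %q", id)
	}
	return "ran: " + spec, nil
}

// SubmitAndFetch submits a workflow, then reads its status and logs
// through the Backend interface, independent of the concrete engine.
func SubmitAndFetch(b Backend, spec string) (Status, string, error) {
	id, err := b.Submit(spec)
	if err != nil {
		return "", "", err
	}
	st, err := b.Status(id)
	if err != nil {
		return "", "", err
	}
	logs, err := b.Logs(id)
	if err != nil {
		return "", "", err
	}
	return st, logs, nil
}

func main() {
	st, logs, err := SubmitAndFetch(newFakeBackend(), "SELECT 1;")
	fmt.Println(st, logs, err)
}
```

With a shape like this, the code in pkg/workflow that submits and monitors a workflow would depend only on the interface, and each engine would live in its own sub-package.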

wangkuiyi commented 4 years ago

Related https://github.com/sql-machine-learning/sqlflow/issues/1434

typhoonzero commented 4 years ago

Related https://github.com/sql-machine-learning/sqlflow/issues/1583

Yancey1989 commented 4 years ago

It seems that we also need a pkg/submitter package:

`-pkg/submitter
    |-python.go    # cmd: python xxx.py
    |-pai.go       # cmd: pai -Djobname=sqlflow_job ...
    |-alisa.go     # goalisa: alisa.createTask('pai -Djobname=sqlflow_job ...')
    |- TODO: alps/elasticdl ....

typhoonzero commented 4 years ago

@Yancey1989 I recommend putting all of the current "submitters" into pkg/step and renaming the interface to Executor, so that we can call the executors like step.Executor.ExecuteTrain(...) etc. That naming is more meaningful.
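
To make the suggestion concrete, here is a rough sketch of what an Executor interface with python and pai implementations might look like. Only the names Executor and ExecuteTrain come from this thread; the method signature, the NewExecutor factory, and the struct names are assumptions made for illustration. The executors only build the command line mentioned above and do not run it, and the pai arguments that are elided in the thread are deliberately left out.

```go
package main

import (
	"fmt"
)

// Executor is the interface name proposed in the thread. The signature
// of ExecuteTrain is an assumption; the thread only shows "(...)".
type Executor interface {
	ExecuteTrain(program string) ([]string, error)
}

// pythonExecutor builds "python <program>" (hypothetical type name).
type pythonExecutor struct{}

func (pythonExecutor) ExecuteTrain(program string) ([]string, error) {
	if program == "" {
		return nil, fmt.Errorf("python executor: empty program path")
	}
	return []string{"python", program}, nil
}

// paiExecutor builds the "pai -Djobname=..." prefix (hypothetical type name).
// The remaining pai arguments are elided in the thread and not guessed here.
type paiExecutor struct{ jobName string }

func (e paiExecutor) ExecuteTrain(program string) ([]string, error) {
	return []string{"pai", "-Djobname=" + e.jobName}, nil
}

// NewExecutor is a hypothetical factory that picks an executor by name.
func NewExecutor(kind string) (Executor, error) {
	switch kind {
	case "python":
		return pythonExecutor{}, nil
	case "pai":
		return paiExecutor{jobName: "sqlflow_job"}, nil
	default:
		return nil, fmt.Errorf("unknown executor %q", kind)
	}
}

func main() {
	for _, kind := range []string{"python", "pai"} {
		e, err := NewExecutor(kind)
		if err != nil {
			fmt.Println(err)
			continue
		}
		argv, err := e.ExecuteTrain("train.py")
		fmt.Println(argv, err)
	}
}
```

Callers in pkg/step would then hold an Executor value and call ExecuteTrain on it without knowing which platform runs the job.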

typhoonzero commented 4 years ago

Related: https://github.com/sql-machine-learning/sqlflow/issues/1434