sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

Refactoring Discussion 2: the Python modules/packages and the Go pkgs after refactoring #2287

Open shendiaomo opened 4 years ago

shendiaomo commented 4 years ago

Background

In #2168 we've figured out the direction of migrating the SQLFlow core from Go to Python. In #2214 and #2235, we've proven that attribute checking can be implemented in Python in a precise and concise way. In #2225, we've made it clear how to reuse the existing Go DB code (especially goalisa) to avoid unnecessary rewriting.
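
For context, a minimal sketch of what precise-and-concise attribute checking in Python might look like. The dictionary layout and checker lambdas are illustrative assumptions, not the actual code from #2214 or #2235.

    # Hypothetical sketch: declare each attribute with a type and a validity
    # predicate, then validate user-supplied values in one pass.
    attribute_spec = {
        "train.batch_size": (int, lambda v: v > 0,
                             "batch_size must be a positive integer"),
        "model.hidden_units": (list, lambda v: all(isinstance(u, int) for u in v),
                               "hidden_units must be a list of integers"),
    }

    def check_attributes(attrs):
        for name, value in attrs.items():
            if name not in attribute_spec:
                raise ValueError("unknown attribute: %s" % name)
            typ, ok, msg = attribute_spec[name]
            if not isinstance(value, typ) or not ok(value):
                raise ValueError("%s: %s" % (name, msg))

    check_attributes({"train.batch_size": 32})  # passes silently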

Plan

Python Version

At the moment, most of the modules should stay compatible with both Python 2 and Python 3.
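
For instance, modules meant to run under both interpreters would typically start with the standard __future__ imports, so the same syntax behaves identically on Python 2 and Python 3 (an illustrative convention, not a project mandate):

    # Conventional compatibility header for dual-version modules.
    from __future__ import absolute_import, division, print_function

    # With the imports above, print() is a function and / is true division
    # under both interpreters.
    print(3 / 2)  # prints 1.5 on Python 2 and Python 3 alike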

Style

We follow the Google Style Guide.

Preparation

We have to define a command-line flag --python_refactory and use that flag to make sure the existing code keeps working until we finish the refactoring.
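
One way such a gate could look on the Python side; the argparse wiring below is a hypothetical sketch (the real flag may well live in the Go server instead), and only the flag name --python_refactory comes from the plan above:

    import argparse

    # Hypothetical sketch: keep the legacy path as the default and opt in to
    # the refactored Python implementation behind the flag.
    parser = argparse.ArgumentParser()
    parser.add_argument("--python_refactory", action="store_true",
                        help="use the refactored Python implementation")
    args, _ = parser.parse_known_args()

    if args.python_refactory:
        pass  # dispatch to the new Python modules
    else:
        pass  # fall back to the existing, pre-refactoring code path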

Modules kept in Go

Modules to be moved into Python

Supplementary notes on how the new architecture handles several scenarios

The PAI platform

  1. sqlflow/platform/pai.py copies the whole package, together with an entry.py, to PAI using odpscmd or alisa.
  2. entry.py calls sqlflow.execute, just as sqlflow/pkg/execute does; see the sketch after this list.
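
A minimal sketch of what the submitted entry.py might look like. The ir.pkl file name and the pickle serialization are assumptions for illustration; only the sqlflow.execute call mirrors the plan above.

    # entry.py -- hypothetical sketch of the PAI-side entry point.
    import pickle

    import sqlflow

    def main():
        # Assumption: the parsed IR was shipped alongside the package as a
        # pickled file; the real serialization format may differ.
        with open("ir.pkl", "rb") as f:
            ir = pickle.load(f)
        # Mirror what sqlflow/pkg/execute does on the server side.
        sqlflow.execute(engine="pai", ir=ir)

    if __name__ == "__main__":
        main()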

The Workflow

Yancey1989 commented 4 years ago

After talking with @shendiaomo, we plan to refactor the workflow codebase as follows:

  1. Move the couler codegen into the Python API so that we only need ONE code generator.

    1. Keep the Run and Fetch gRPC interfaces.
    2. Run generates an Argo workflow and returns a workflow ID through the following SQLFlow Python API; with this we no longer need the couler codegen:
       sqlflow.execute(engine="couler", ir=[{parsed result of sql1}, {parsed result of sql2}...])
    3. sqlflow.execute(engine="couler", ir=...) would call the couler/fluid API to generate a workflow YAML and submit it to Kubernetes.
  2. We no longer need the step Go binary; each step can execute a Python API call like the one below (see the dispatch sketch after this list):

    sqlflow.execute(engine="pai", ir={parsed result of sql})
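
A rough sketch of how sqlflow.execute could dispatch on the engine argument. The helper functions are placeholder stubs standing in for the couler/fluid and PAI calls; their names are assumptions, not the actual API.

    # Hypothetical sketch of the top-level dispatcher discussed above.

    def _generate_workflow_yaml(ir):
        raise NotImplementedError("would call the couler/fluid API here")

    def _submit_to_kubernetes(yaml_text):
        raise NotImplementedError("would submit the Argo workflow here")

    def _run_pai_step(ir):
        raise NotImplementedError("would run one step on the PAI platform here")

    def execute(engine, ir):
        if engine == "couler":
            # Translate the whole IR list into one Argo workflow, submit it,
            # and return the resulting workflow ID.
            return _submit_to_kubernetes(_generate_workflow_yaml(ir))
        elif engine == "pai":
            # A single step: run one parsed statement on the PAI platform.
            return _run_pai_step(ir)
        raise ValueError("unknown engine: %s" % engine)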