Open shendiaomo opened 4 years ago
After talking with @shendiaomo, we can refactor the workflow codebase:
- Move couler codegen to the Python API so that we only need ONE code generator.
- [...] `Run` and `Fetch` gRPC interface.
- `Run` generates an Argo workflow and returns a workflowID through the following SQLFlow Python API; here we don't need the couler codegen.
  `sqlflow.execute(engine="couler", ir=[{parsed result of sql1}, {parsed result of sql2}...])`
- `sqlflow.execute(engine="couler", ir=...)` would call the couler/fluid API to generate a workflow YAML and submit it to Kubernetes.
- We don't need the `step` go binary file; each step can execute a Python API call like:
  `sqlflow.execute(engine="pai", ir={parsed result of sql})`
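To make the single entry point idea concrete, here is a minimal sketch of how `execute` could dispatch on the `engine` argument. Everything except the `execute(engine=..., ir=...)` call shape is an assumption: the registry, the handler names, and the returned dictionaries are hypothetical placeholders, not the real SQLFlow code.

```python
# Hypothetical sketch: the registry and both handlers are illustrative only.
_ENGINES = {}


def register_engine(name):
    """Register a handler function under an engine name."""
    def decorator(func):
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("couler")
def _run_workflow(ir):
    # Assumption: the workflow engine receives a list of parsed statements,
    # one per workflow step, and returns an ID for the submitted workflow.
    steps = ["step-%d" % i for i, _ in enumerate(ir)]
    return {"workflow_id": "wf-placeholder", "steps": steps}


@register_engine("pai")
def _run_single_statement(ir):
    # Assumption: a single-statement engine receives one parsed statement.
    return {"executed": ir}


def execute(engine, ir):
    """Dispatch the parsed statement(s) to the handler of the chosen engine."""
    if engine not in _ENGINES:
        raise ValueError("unknown engine: %r" % engine)
    return _ENGINES[engine](ir)


result = execute(engine="couler", ir=[{"sql": "SELECT 1"}, {"sql": "SELECT 2"}])
```

With a registry like this, adding a platform means registering one more handler, and the gRPC server and each workflow step can share the same call.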
Background
In #2168 we've figured out the direction of migrating SQLFlow core from golang to python. In #2214 and #2235, we've proven that attribute checking can be implemented in python in a precise and concise way. In #2225, we've made it clear how to reuse existing golang DB code (especially `goalisa`) to avoid unnecessary rewriting.
Plan
Python Version
At the moment, we should be compatible with both python2 and python3 in most of the modules.
Style
We follow the Google Style Guide
Preparation
- We have to define a command-line flag `--python_refactory` and use that flag to make sure the existing code is still working before we finish the refactoring.
- [...] or even [...] if the change is big enough.
- [...] `pkg`s, `func`s and `struct`s by removing the suffix `ForPython`.
Modules kept in golang
- [...] `godriver`s and `pkg/database` into a python module as described in #2225
- `SQLStatement` interface such as `TrainStmt` and so on have to be redefined to be concise and python-compatible; there are several ways to do this:
  - [...] `SQLStatementForPython` struct in `pkg/ir` and wrap it into a python module using `pygo` or `cgo`
  - [...], use json to serialize in go and deserialize in python
- [...] `feature derivation`-relative code because the `feature derivation` module will be implemented in python
Modules to be moved into python
Feature Derivation
- the wrapper of `godriver`s
- `contracts`
- [...] (`features.py` and `columns` package), calling the `godriver` wrapper to get database table field metadata
- Modify the `ir_generator.go` to forward the column functions in a SQL statement to python function calls. For an oversimplified example: [...] will map to the python code [...]
@brightcoder01 @typhoonzero Please review this design.
Codegen
- [...] `SQLStatementForPython` struct)
- [...] `func` like [...] `codegen` package after the refactoring
Attribute checker and diagnostics
- [...] `feature derivation`, [...]
Python API
- [...] `contracts.py` in #2235
- the visitor pattern and submitters
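As a rough illustration of what attribute checking in python could look like, here is a sketch of a contract-style decorator. It is not the `contracts.py` from #2235; the decorator name `check_attrs`, the attribute names and the error messages are all assumptions.

```python
import functools


def check_attrs(**expected_types):
    """Hypothetical contract decorator: validate keyword attributes by type."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**attrs):
            for name, value in attrs.items():
                if name not in expected_types:
                    raise ValueError("unsupported attribute: %s" % name)
                if not isinstance(value, expected_types[name]):
                    raise TypeError("attribute %s expects %s, got %r" % (
                        name, expected_types[name].__name__, value))
            return func(**attrs)
        return wrapper
    return decorator


@check_attrs(n_classes=int, learning_rate=float)
def build_model(**attrs):
    # Stand-in for a real model constructor.
    return attrs
```

A call such as `build_model(n_classes="3")` fails early with a readable message, which is the kind of diagnostic the attribute checker is meant to produce before any job is submitted.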
Other go packages
- Remove `sqlfs`, `model` and `verifier` because they'll have their python counterparts (probably as functions or classes in a module)
- Remove `pkg/sql/codegen` as described above
- Move `database` into a separate repo to generate the python module
- Move `tablewriter` into `step`
- Rename `pkg/sql` to `pkg/executor` because the `executor.go` would spawn the python process to execute the statement (not `runner` because it may be mistaken as the implementation of `TO RUN`)
- Move `pipe` and `log` to a new directory named `utils`
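The python side of that spawned process could be as small as the sketch below, assuming the go executor passes the statement as JSON (one of the serialization options mentioned earlier). The field names `type` and `table` and the `run_statement` placeholder are hypothetical.

```python
import io
import json


def run_statement(stmt):
    # Placeholder for the real work (training, prediction, ...).
    return "would run %s on table %s" % (stmt["type"], stmt["table"])


def main(stream):
    """Read one JSON-serialized statement from a stream and execute it."""
    stmt = json.load(stream)
    for key in ("type", "table"):
        if key not in stmt:
            raise ValueError("missing field in statement: %s" % key)
    return run_statement(stmt)


# In production the stream would be the stdin of the spawned process;
# here an in-memory stream stands in for it.
payload = io.StringIO(json.dumps({"type": "train", "table": "iris.train"}))
message = main(payload)
```

Keeping the contract between go and python down to one JSON document on a stream means the go side needs no python bindings to launch a statement.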
Supposed go packages
New python modules that have to be implemented from scratch
Python API
Priority
Let's sum up.
- `godriver`
- `feature derivation`
- `contracts` and `diagnostics`
- `feature derivation` and `sqlflow_submitter`
- [...] (`features` and `columns`)
- `contracts`, `diagnostics` and `godriver`
- `SQLStatement` and the modification of `ir.go` and `ir_generator.go`
- `sqlflow` package
- `platforms` and `sqlflow.py`
- `feature derivation`
- `pkg/sql/codegen` and defining `pkg/executor`
- `sqlflow_submitter`
- `contracts` and `diagnostics`
- `ir.go` and `ir_generator.go`
- `feature derivation` needs [...]
- Removing codegen and defining pkg/executor
- `platform` and `sqlflow.py`
- Removing submitter/executors
- `platform` and `sqlflow.py`
Supplementary notes of the new architecture on several scenarios
The PAI platform
- `sqlflow/platform/pai.py` copies the whole package and an `entry.py` to PAI using `odpscmd` or `alisa`
- `entry.py` calls `sqlflow.execute` as what the `sqlflow/pkg/execute` does.
The Workflow
- `sqlflow_server` generates the workflow as in the original architecture; the `sqlflow` python package is used by the `step` binary.
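For the PAI scenario, the packaging step could be sketched as follows. This only builds an archive containing the package and a generated `entry.py`; the upload through `odpscmd` or `alisa` is left out on purpose. The function name `build_job_archive`, the archive layout, the `engine` default and the content of the generated `entry.py` are all assumptions.

```python
import os
import tarfile
import tempfile

# Assumption: entry.py only has to hand the serialized statement to execute().
ENTRY_TEMPLATE = (
    "import json\n"
    "import sqlflow\n"
    "sqlflow.execute(engine={engine!r}, ir=json.loads({ir_json!r}))\n"
)


def build_job_archive(package_dir, ir_json, out_path, engine="pai"):
    """Pack the python package plus a generated entry.py into one tarball."""
    entry_code = ENTRY_TEMPLATE.format(engine=engine, ir_json=ir_json)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(entry_code)
        entry_path = f.name
    try:
        with tarfile.open(out_path, "w:gz") as tar:
            # Keep the package under its own directory name inside the archive.
            pkg_name = os.path.basename(os.path.normpath(package_dir))
            tar.add(package_dir, arcname=pkg_name)
            tar.add(entry_path, arcname="entry.py")
    finally:
        os.remove(entry_path)
    return out_path
```

Generating `entry.py` per job keeps the uploaded package itself unchanged between submissions; only the embedded statement differs.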