sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

How to inject data transformation code into model zoo Docker image #2126

Open wangkuiyi opened 4 years ago

wangkuiyi commented 4 years ago

The Challenge

As @brightcoder01 reminded me, the SQLFlow server translates a SQL program into a workflow of steps, where each step runs as a Docker container.

Some of these steps run ML training, which requires two pieces of Python code: the model definition, and the data transformation code that the SQLFlow server generates from the COLUMN clause in the SQL program.

The question here is: how do we combine the generated data transformation code and the model definition code in the step container?

A Solution

Following @typhoonzero's suggestion, we require that the base image of every step container be sqlflow/sqlflow. This image includes the cmd/step binary, which takes a SQL statement with a COLUMN clause and translates it into the data transformation code. Thus the translator and the model definition code are in the same container.
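To make the idea concrete, here is a minimal Python sketch of what "translator and model definition in the same container" could look like. Everything in it is an assumption for illustration: the toy `SCALE(field, factor)` syntax is not the real COLUMN grammar, `generate_transform_source` only stands in for what cmd/step does, and `model_predict` stands in for model definition code already present in the image.

```python
import re


def generate_transform_source(column_clause: str) -> str:
    """Toy translator: emit Python source for a `transform(row)` function.

    Accepts only items shaped like SCALE(field, number). This is a
    hypothetical syntax, NOT the real SQLFlow COLUMN grammar.
    """
    lines = ["def transform(row):", "    out = {}"]
    for op, field, arg in re.findall(r"(\w+)\((\w+),\s*([\d.]+)\)", column_clause):
        if op.upper() != "SCALE":
            raise ValueError(f"unsupported toy op: {op}")
        lines.append(f"    out[{field!r}] = row[{field!r}] * {float(arg)!r}")
    lines.append("    return out")
    return "\n".join(lines) + "\n"


def model_predict(features: dict) -> float:
    """Hypothetical stand-in for model definition code shipped in the image."""
    return sum(features.values())


def run_step(column_clause: str, row: dict) -> float:
    """Generate the transformation code, load it, and feed the model."""
    source = generate_transform_source(column_clause)
    namespace = {}
    exec(source, namespace)  # load the generated code into this process
    return model_predict(namespace["transform"](row))


print(run_step("SCALE(age, 0.5), SCALE(income, 0.25)", {"age": 40, "income": 8}))
```

Because the code is generated inside the step container at run time, nothing needs to be injected into the model zoo image when it is built; the image only has to derive from a base that contains the translator.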

wangkuiyi commented 4 years ago

I have a question:

Suppose that we are going to have a SQLFlow Python API like

sqlflow.train(..., columns={"column": lambda x: sqlflow.vocabularize(x)}, ...)

We want to make sure that this API works with the solution. This does not seem to be a problem: as @Yancey1989 pointed out, the lambda above is not itself the generated data transformation code. It is part of the high-level API and still has to be translated.
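One way to picture this, purely as an assumption about how such an API might behave: the lambda is never applied to real data. It is called once on a symbolic column placeholder, and the value it returns describes the transformation, which a translator can then turn into a clause or into generated code. In the sketch below, `Column`, `Vocabularize`, `vocabularize`, `translate`, and the `VOCABULARIZE(...)` output text are all hypothetical names, not existing SQLFlow APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Symbolic placeholder for a table column (hypothetical)."""
    name: str


@dataclass(frozen=True)
class Vocabularize:
    """Description of a vocabulary transformation, not the transformation itself."""
    column: Column


def vocabularize(x: Column) -> Vocabularize:
    # Records the intent only; no data is transformed here.
    return Vocabularize(x)


def translate(columns: dict) -> list:
    """Call each lambda on a placeholder and render the resulting description."""
    clauses = []
    for name, spec in columns.items():
        node = spec(Column(name))
        if isinstance(node, Vocabularize):
            clauses.append(f"VOCABULARIZE({node.column.name})")
        else:
            raise TypeError(f"cannot translate spec for column {name!r}")
    return clauses


print(translate({"city": lambda x: vocabularize(x)}))
```

Under this reading, the high-level API produces the same kind of input that the translator in the step container already consumes, so the solution above is unaffected.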