wangkuiyi opened 4 years ago
I have a question:
Suppose that we are going to have a SQLFlow Python API like
`sqlflow.train(..., columns={"column": lambda x: sqlflow.vocabularize(x)}, ...)`
We want to make sure that this API works with the solution. As @Yancey1989 pointed out, this seems not to be a problem: the lambda above is not itself the generated data transformation code, but part of the high-level API, and is to be translated.
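To make the translation idea concrete, here is a minimal sketch of how such a high-level API could work. All names (`vocabularize`, `train`, the descriptor dict) are assumptions for illustration, not SQLFlow's actual implementation; the point is that the lambda only builds a description of the transformation, which the server later turns into real preprocessing code.

```python
# Hypothetical sketch: the lambda produces a transformation *descriptor*,
# not the generated transformation code itself.

def vocabularize(column_name):
    # Placeholder for the real API: record that this column should get a
    # vocabulary-lookup transformation.
    return {"op": "vocabularize", "column": column_name}

def train(columns=None, **kwargs):
    # Translate each high-level column spec into a transformation descriptor
    # by calling the user's lambda with the column name.
    return {name: fn(name) for name, fn in (columns or {}).items()}

plan = train(columns={"city": lambda x: vocabularize(x)})
# plan now describes the transformation; code generation happens later.
```

The design choice sketched here is that the user-facing lambda is pure configuration, so it can be serialized and translated server-side.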
The Challenge
As @brightcoder01 reminded us, the SQLFlow server translates a SQL program into a workflow of steps, where each step runs as a Docker container.
Some of these steps run ML training, which requires the model definition (Python source code) and the data transformation Python code that the SQLFlow server generates from the COLUMN clause in the SQL program.
The question here is: how do we combine the generated data transformation code and the model definition code in the step container?
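One mechanical way to combine the two (a sketch under my own assumptions, not SQLFlow's actual mechanism) is for the step to receive the generated transformation source as text, write it to a file, and import it as a module next to the model definition code already baked into the image:

```python
# Hypothetical sketch: load server-generated transformation code at runtime
# inside the step container. The generated source below is a toy example.
import importlib.util
import os
import tempfile

GENERATED_TRANSFORM = '''
def transform(row):
    # e.g. generated from a COLUMN clause: bucket the "city" column
    row["city"] = hash(row["city"]) % 1000
    return row
'''

def load_generated(source):
    # Write the generated source to a temp file and import it as a module,
    # so the training code can call it alongside the model definition.
    path = os.path.join(tempfile.mkdtemp(), "transform_gen.py")
    with open(path, "w") as f:
        f.write(source)
    spec = importlib.util.spec_from_file_location("transform_gen", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

mod = load_generated(GENERATED_TRANSFORM)
row = mod.transform({"city": "Beijing"})
```

This is only one option; the solution below instead bakes the translator itself into the step image.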
A Solution
From @typhoonzero: we require that the base image of the step containers be `sqlflow/sqlflow`, which includes the `cmd/step` binary. That binary takes a SQL statement with a COLUMN clause and generates the data transformation code from it. Thus the translator and the model definition code are all in the same container.
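Conceptually, the translation that `cmd/step` performs inside the container looks like the sketch below. The parsing and the emitted code are toy assumptions of mine (the real `cmd/step` is a Go binary with a full SQL parser); the sketch only shows the shape of the step: SQL with a COLUMN clause in, Python transformation code out.

```python
# Hypothetical sketch of the translation step: extract the COLUMN clause
# from a SQLFlow statement and emit Python transformation code for it.
import re

def translate_column_clause(sql):
    # Toy parser: grab the first identifier after COLUMN.
    m = re.search(r"COLUMN\s+(\w+)", sql, re.IGNORECASE)
    if not m:
        return ""
    col = m.group(1)
    # Toy code generation: emit a line that applies a (assumed) vocabularize
    # transformation to the named column.
    return f'features["{col}"] = vocabularize(rows["{col}"])'

sql = "SELECT * FROM train_table TO TRAIN DNNClassifier COLUMN city LABEL y;"
code = translate_column_clause(sql)
```

Because the translator ships inside the same image as the model definitions, the generated code and the model code naturally meet in one container, which is the point of the solution.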