brightcoder01 opened 4 years ago
We should clean up the temp table at the end of each job, so we need at least three steps per job:

- create a temp table
- submit a job
- clean up the temp table

Another question: how do we make sure the cleanup step gets scheduled if the job fails?
I am not sure if we can formalize the execution of the `TO TRAIN` statement into a specific number *n* of steps.
However, I have an idea that might help us figure this out: let's draw a two-dimensional table -- one dimension lists the various deployments (Google Cloud, Amazon AWS, Alibaba internal, Travis CI, etc.), and the other lists the proposed steps. In each cell, we figure out the Docker image to be executed by that step in that deployment.
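To make this concrete, here is a sketch of how such a table might start; every image description below is a placeholder drawn from the tools mentioned in this thread, not a decision:

| Proposed step | Vanilla Kubernetes | Alibaba internal (MaxCompute) |
| --- | --- | --- |
| `SELECT FROM` into a temp table | an image with the DB client | an image with `goalisa` |
| submit the training job | an image with `kubectl` and the `elasticdl` client | an image with `goalisa` |
| clean up the temp table | same image as the first step | same image as the first step |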
We might add more dimensions, such as various DBMSes, AI platforms, model definition APIs (TensorFlow/Keras, PyTorch, XGBoost), and data readers (MaxCompute multi-tunnel reader, local filesystem reader, etc.).
We also need to take the model zoo design into consideration, because at least one of the steps executing the `TO TRAIN` statement needs to train a model defined in a zoo.
The user can choose among different base images to adapt their model definition, e.g. tf-2.0, tf-1.15, pytorch-1.0, etc. SQLFlow provides a list of base images for users to choose from, and guarantees that each base image works on every execution engine, including local runs, ElasticDL, PAI, etc.
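As a rough sketch, that guarantee could be expressed as a fixed registry that SQLFlow validates user choices against. The image tags and the helper below are assumptions for illustration, not SQLFlow's actual API:

```python
# Hypothetical registry of SQLFlow-provided base images; the tags are
# illustrative, not an official list.
SUPPORTED_BASE_IMAGES = {
    "sqlflow/modelzoo-base:tf-2.0",
    "sqlflow/modelzoo-base:tf-1.15",
    "sqlflow/modelzoo-base:pytorch-1.0",
}

# Execution engines every base image is expected to work on.
EXECUTION_ENGINES = ["local", "elasticdl", "pai"]

def validate_base_image(image: str) -> None:
    """Reject model definitions built on images SQLFlow cannot guarantee."""
    if image not in SUPPORTED_BASE_IMAGES:
        raise ValueError(
            f"unsupported base image {image!r}; choose one of "
            f"{sorted(SUPPORTED_BASE_IMAGES)}"
        )
```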
**Question: Can we make the behavior unified? Write the result of `SELECT FROM` into temporary storage (a table) for all DBMSes?**
For the first step of all the statements, `SELECT` data from the source table into a temp table, the behaviors currently differ across DBMSes:
- MySQL/Hive: We don't create the temporary table.
- MaxCompute: We create the temporary table.
Addition: this does not depend on the DBMS, but on the AI engine.
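If we did unify the behavior, the first step could generate a `CREATE TABLE ... AS` statement for every dialect. A minimal sketch under that assumption (the helper name is hypothetical; only the MaxCompute `LIFECYCLE` clause is dialect specific here):

```python
def create_temp_table_sql(select_stmt: str, temp_table: str, dialect: str) -> str:
    """Wrap the user's SELECT into a CREATE TABLE ... AS statement."""
    if dialect in ("mysql", "hive"):
        return f"CREATE TABLE {temp_table} AS {select_stmt}"
    if dialect == "maxcompute":
        # LIFECYCLE 1: MaxCompute drops the table after one day, which
        # also covers cleanup when a later step fails.
        return f"CREATE TABLE {temp_table} LIFECYCLE 1 AS {select_stmt}"
    raise ValueError(f"unsupported dialect: {dialect}")
```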
- Should we differentiate the step image by platform?
- Should model/function images from the model zoo be platform agnostic?
Discussion notes for the two questions above:
1) Yes. Different step images for different deployment platforms (one image per platform): vanilla Kubernetes, Alibaba internal, Google Cloud, and so on.
2) No. Different model zoo services for different deployment platforms. For the same Keras model definition, such as `DNNClassifier`, we will build different model images from different base images for the various deployment platforms. For example, the base image for vanilla Kubernetes contains elasticdl, and the base image for Alibaba internal contains goalisa.
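A hedged sketch of that build flow, assuming the model's Dockerfile accepts a `BASE_IMAGE` build argument; all image names here are placeholders:

```python
import subprocess

# Hypothetical mapping from deployment platform to the base image that
# carries its submitter toolchain, as discussed above.
PLATFORM_BASE_IMAGES = {
    "kubernetes": "sqlflow/modelzoo-base:elasticdl",
    "alibaba": "sqlflow/modelzoo-base:goalisa",
}

def build_model_image(model_dir: str, model_name: str, platform: str) -> str:
    """Build one model image per platform from the same model definition."""
    base = PLATFORM_BASE_IMAGES[platform]
    tag = f"modelzoo/{model_name}:{platform}"
    subprocess.run(
        ["docker", "build", "--build-arg", f"BASE_IMAGE={base}",
         "-t", tag, model_dir],
        check=True,
    )
    return tag
```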
> We should clean up the temp table at the end of each job, so we need at least three steps per job:
>
> - create a temp table
> - submit a job
> - clean up the temp table
>
> Another question: how do we make sure the cleanup step gets scheduled if the job fails?
We can set the lifecycle for the temp table, so it expires and is dropped automatically even if the cleanup step never runs.
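A minimal couler sketch of the three steps, with the lifecycle acting as the safety net; the step image and the entrypoints are hypothetical:

```python
import couler.argo as couler

# Placeholder image carrying the DB client; not an official image.
STEP_IMAGE = "sqlflow/step:latest"

# Step 1: create the temp table. On MaxCompute, LIFECYCLE 1 makes the
# table expire after a day, so it is dropped even if later steps fail.
couler.run_container(
    image=STEP_IMAGE,
    command=["run_sql"],  # hypothetical entrypoint wrapping the DB client
    args=["CREATE TABLE tmp_tbl LIFECYCLE 1 AS SELECT * FROM src"],
)

# Step 2: submit the training job against the temp table.
couler.run_container(
    image=STEP_IMAGE,
    command=["submit_train"],  # hypothetical submitter entrypoint
    args=["tmp_tbl"],
)

# Step 3: best-effort cleanup. Steps run sequentially, so this one never
# runs if step 2 fails; the lifecycle above covers that case.
couler.run_container(
    image=STEP_IMAGE,
    command=["run_sql"],
    args=["DROP TABLE IF EXISTS tmp_tbl"],
)
```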
A SQLFlow statement can be compiled into one or more couler steps. We list what each step does and which image it relies on below:
TO TRAIN:
- `DESCRIBE` for the table schema, aggregation SQL

TO PREDICT:

TO EVALUATE:

TO RUN:

Discussion points:
- `goalisa` for MaxCompute, `kubectl` and the `elasticdl` client for Kubernetes
- writing the result of `SELECT FROM` into a temp table
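To make the first discussion point concrete, here is a hedged sketch of dispatching the submit step by platform, using the tools named above; the images, commands, and argument shapes are all assumptions:

```python
import couler.argo as couler

# Hypothetical dispatch table for the "submit training job" step; the
# images are placeholders built around the tools named above.
SUBMITTERS = {
    # Alibaba internal / MaxCompute: submit through goalisa.
    "maxcompute": {"image": "sqlflow/step:goalisa",
                   "command": ["goalisa", "submit"]},
    # Vanilla Kubernetes: submit through kubectl or the elasticdl client.
    "kubernetes": {"image": "sqlflow/step:elasticdl",
                   "command": ["elasticdl", "train"]},
}

def submit_step(platform: str, model_image: str) -> None:
    """Emit the couler step that submits the training job."""
    s = SUBMITTERS[platform]
    # The argument shape is illustrative; real submitters take their own
    # flags for the model image, data source, and so on.
    couler.run_container(image=s["image"], command=s["command"],
                         args=[model_image])
```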