sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

Katib on SQLFlow #1326

Open samplise opened 4 years ago

samplise commented 4 years ago

@tonyyang-svail I have some questions about XGBoost model training and prediction. In "sqlflow/python/sqlflow_submitter/xgboost/train.py", the trained model is saved locally via bst.save_model("my_model"), and in "sqlflow/python/sqlflow_submitter/xgboost/pred.py", it is loaded locally via bst.load_model("my_model").

Is this the only way to pass the XGBoost model from training to prediction? Does SQLFlow support writing the model to, and reading it from, external storage (e.g., MySQL)?

tonyyang-svail commented 4 years ago

We save the trained model to a table. How this is done depends on the database:

  1. For MySQL and MaxCompute, we upload the tarred model directory as a table.
  2. For Hive, we upload the tarred model directory to HDFS as an external table.

See https://github.com/sql-machine-learning/sqlflow/tree/develop/pkg/sqlfs for details. This package is called from https://github.com/sql-machine-learning/sqlflow/blob/fd638707a3f6d9d3d3cfe7a9935a70a2c35a49bd/pkg/sql/model.go#L100
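The idea of storing a tarred model directory as a table can be sketched roughly as follows. This is not the sqlfs implementation (which is written in Go); it is a minimal Python illustration that uses an in-memory SQLite database as a stand-in for MySQL. The function names, the table layout (`id`, `block`), and the chunk size are all assumptions made for the example.

```python
import io
import os
import sqlite3
import tarfile

# Assumed chunk size for illustration; the value used by sqlfs may differ.
CHUNK_SIZE = 32 * 1024


def save_model_dir(conn, table, model_dir):
    """Tar the files in model_dir in memory and store them as ordered blob rows.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, name), arcname=name)
    data = buf.getvalue()

    # Split the archive into fixed-size chunks, one row per chunk.
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, block BLOB)")
    conn.executemany(
        f"INSERT INTO {table} (id, block) VALUES (?, ?)",
        list(enumerate(chunks)),
    )
    conn.commit()


def load_model_dir(conn, table, dest_dir):
    """Reassemble the archive from the table rows and unpack it into dest_dir."""
    rows = conn.execute(f"SELECT block FROM {table} ORDER BY id").fetchall()
    data = b"".join(row[0] for row in rows)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        tar.extractall(dest_dir)
```

With something like this, the training step would call `save_model_dir(conn, "my_model_table", "model_dir")` after `bst.save_model(...)`, and the prediction step would call `load_model_dir(...)` before `bst.load_model(...)`; the table and directory names here are placeholders.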

samplise commented 4 years ago

@Yancey1989, I have some questions about the details of the command line echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse | python -m sqlflow_submitter.xgboost.train.

As I understand it, echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse generates a file like ir.proto_text. But sqlflow_submitter.xgboost.train is defined as def train(datasource, select, model_params, train_params, feature_field_meta, label_field_meta, validation_select). How can this function take ir.proto_text as its input argument?

Furthermore, can we make sqlflow_submitter.xgboost.train take extra arguments there? For example, echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse | python -m sqlflow_submitter.xgboost.train --num_round 35 --max_depth 4?

Yancey1989 commented 4 years ago

Hi @samplise

echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse | python -m sqlflow_submitter.xgboost.train --num_round 35 --max_depth 4?

This is a TODO feature. In the current implementation, you can use repl -e "SELECT * FROM ... TO TRAIN xgboost.gbtree" to launch an XGBoost training job.
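Since the flag-passing form is not implemented, the following is only a hypothetical sketch of how flags such as --num_round and --max_depth could be mapped onto the model_params and train_params arguments of train. The function name parse_extra_args and the dictionary keys are assumptions. The one grounded part is that, in XGBoost itself, max_depth is a booster parameter and the number of rounds is passed to xgboost.train as num_boost_round.

```python
import argparse


def parse_extra_args(argv):
    """Split hypothetical CLI flags into booster params and training params."""
    parser = argparse.ArgumentParser(prog="sqlflow_submitter.xgboost.train")
    # Defaults mirror XGBoost's own: 10 boosting rounds, tree depth 6.
    parser.add_argument("--num_round", type=int, default=10)
    parser.add_argument("--max_depth", type=int, default=6)
    args = parser.parse_args(argv)

    # Booster-level settings go into the params dict given to xgboost.train.
    model_params = {"max_depth": args.max_depth}
    # Training-loop settings are keyword arguments of xgboost.train.
    train_params = {"num_boost_round": args.num_round}
    return model_params, train_params
```

For example, `parse_extra_args(["--num_round", "35", "--max_depth", "4"])` would yield the two dictionaries that a module entry point could merge with whatever it reads from the parsed statement.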

samplise commented 4 years ago

Where can I find more detailed usage documentation for the repl command?