samplise opened this issue 4 years ago
We save the trained model to a table. There are several ways:

- save the `tar`ed model directory as a table
- save the `tar`ed model directory to HDFS as an external table

You can refer to https://github.com/sql-machine-learning/sqlflow/tree/develop/pkg/sqlfs for details. This package is called at https://github.com/sql-machine-learning/sqlflow/blob/fd638707a3f6d9d3d3cfe7a9935a70a2c35a49bd/pkg/sql/model.go#L100
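For illustration, here is a minimal Python sketch of the first idea: `tar` the model directory and store it as ordered, base64-encoded rows in a table. It only mirrors in Python what `pkg/sqlfs` does in Go; the real table schema, block size, and encoding are whatever sqlfs defines, and `pymysql` plus the column names below are assumptions.

```python
import base64
import io
import tarfile

import pymysql  # assumption: MySQL as the backing store

CHUNK_SIZE = 32 * 1024  # assumption: fixed-size blocks; sqlfs's actual size may differ


def save_model_dir_to_table(conn, table, model_dir):
    """Tar a trained-model directory and store it as ordered rows in a table.

    Sketch of the sqlfs idea; table/column names are illustrative only.
    """
    # Build a gzipped tarball of the model directory in memory.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(model_dir, arcname=".")
    blob = buf.getvalue()

    # Split the tarball into ordered, base64-encoded chunks, one row each.
    with conn.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS {table}")
        cur.execute(f"CREATE TABLE {table} (id INT, block TEXT)")
        for i in range(0, len(blob), CHUNK_SIZE):
            chunk = base64.b64encode(blob[i:i + CHUNK_SIZE]).decode("ascii")
            cur.execute(f"INSERT INTO {table} (id, block) VALUES (%s, %s)",
                        (i // CHUNK_SIZE, chunk))
    conn.commit()
```

Reading the model back is the inverse: select the rows ordered by `id`, concatenate and base64-decode the blocks, and untar into a directory.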
@Yancey1989 , I have some questions about the details of the command line `echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse | python -m sqlflow_submitter.xgboost.train`.

According to my understanding, `echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse` will generate a file like `ir.proto_text`. But the definition of `sqlflow_submitter.xgboost.train` is `def train(datasource, select, model_params, train_params, feature_field_meta, label_field_meta, validation_select)`. How can this function take `ir.proto_text` as the input argument?

Furthermore, can we make `sqlflow_submitter.xgboost.train` take extra arguments there? For example: `echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse | python -m sqlflow_submitter.xgboost.train --num_round 35 --max_depth 4`?
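To make the question concrete, here is roughly how such a `python -m` entrypoint would have to look: it would read the proto text from stdin (delivered by the pipe, not as a file argument), parse it, and unpack the fields into `train()`'s keyword arguments, optionally merging extra CLI flags. This is a sketch only; `ir_pb2`, the message name, and the field names are assumptions, not the real identifiers in the SQLFlow codebase.

```python
import argparse
import sys

from google.protobuf import text_format

import ir_pb2  # assumption: a module compiled from SQLFlow's IR .proto file
from sqlflow_submitter.xgboost.train import train


def main():
    # Extra CLI flags (the --num_round / --max_depth idea from above).
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_round", type=int, default=None)
    parser.add_argument("--max_depth", type=int, default=None)
    args = parser.parse_args()

    # The pipe delivers the proto text on stdin rather than as a file path.
    msg = ir_pb2.TrainStmt()  # assumed message name
    text_format.Parse(sys.stdin.read(), msg)

    model_params = dict(msg.model_params)  # assumed map<string, ...> fields
    train_params = dict(msg.train_params)
    if args.max_depth is not None:
        model_params["max_depth"] = args.max_depth
    if args.num_round is not None:
        train_params["num_boost_round"] = args.num_round

    # Unpack the parsed IR into train()'s keyword arguments.
    train(
        datasource=msg.datasource,
        select=msg.select,
        model_params=model_params,
        train_params=train_params,
        feature_field_meta=list(msg.feature_field_meta),
        label_field_meta=msg.label_field_meta,
        validation_select=msg.validation_select,
    )


if __name__ == "__main__":
    main()
```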
Hi @samplise

> `echo "SELECT ... TO TRAIN regressors:v0.2/MyDNNRegressor ..." | sqlflow -parse | python -m sqlflow_submitter.xgboost.train --num_round 35 --max_depth 4`?

This is a TODO feature. For the current implementation, we can use `repl -e "SELECT * FROM ... TO TRAIN xgboost.gbtree"` to launch an XGBoost training job.
> This is a TODO feature. For the current implementation, we can use `repl -e "SELECT * FROM ... TO TRAIN xgboost.gbtree"` to launch an XGBoost training job.

Where can I find more detailed usage of the `repl` command?
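In the meantime, my understanding of the `repl` path is that hyperparameters travel inside the extended SQL's `WITH` clause rather than as CLI flags, and inside the submitter they end up as plain dicts handed to `xgboost.train`. A rough sketch of that last hop (the dict names follow `train()`'s signature above; the exact split between the two dicts is my assumption):

```python
import xgboost as xgb


def launch_training(dtrain, model_params, train_params):
    # Assumed split: model_params carries booster hyperparameters such as
    # max_depth; train_params carries loop-level settings such as
    # num_boost_round.
    return xgb.train(
        params=model_params,  # e.g. {"max_depth": 4}
        dtrain=dtrain,        # an xgb.DMatrix built from the SELECT result
        num_boost_round=int(train_params.get("num_boost_round", 10)),
    )
```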
@tonyyang-svail I have some questions on XGBoost model training and prediction. In `sqlflow/python/sqlflow_submitter/xgboost/train.py`, the trained model is saved locally via `bst.save_model("my_model")`, and in `sqlflow/python/sqlflow_submitter/xgboost/pred.py` it is also loaded locally by `bst.load_model("my_model")`. Is this the only way to pass the XGBoost model around? Does SQLFlow support writing/reading the model to/from external storage (e.g., MySQL)?
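Concretely, what I have in mind is something like the sketch below: serialize the booster with the same `save_model`/`load_model` calls that train.py and pred.py use, but route the bytes through a database instead of the local filesystem. A plain MySQL `LONGBLOB` table and `pymysql` are my assumptions here; SQLFlow's actual path goes through `pkg/sqlfs`, whose on-table format differs.

```python
import os
import tempfile

import pymysql  # assumption: MySQL as the external store
import xgboost as xgb


def save_booster_to_mysql(conn, table, bst):
    """Serialize a Booster to bytes and store them in a one-row table."""
    # save_model only writes to a path, so go through a temp file.
    with tempfile.NamedTemporaryFile(suffix=".model", delete=False) as f:
        path = f.name
    try:
        bst.save_model(path)  # the same call train.py uses
        with open(path, "rb") as f:
            blob = f.read()
    finally:
        os.remove(path)
    with conn.cursor() as cur:
        cur.execute(f"CREATE TABLE IF NOT EXISTS {table} (model LONGBLOB)")
        cur.execute(f"DELETE FROM {table}")
        cur.execute(f"INSERT INTO {table} (model) VALUES (%s)", (blob,))
    conn.commit()


def load_booster_from_mysql(conn, table):
    """Fetch the stored bytes and rebuild the Booster, mirroring pred.py."""
    with conn.cursor() as cur:
        cur.execute(f"SELECT model FROM {table}")
        blob = cur.fetchone()[0]
    with tempfile.NamedTemporaryFile(suffix=".model", delete=False) as f:
        f.write(blob)
        path = f.name
    try:
        bst = xgb.Booster()
        bst.load_model(path)  # the same call pred.py uses
    finally:
        os.remove(path)
    return bst
```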