Closed typhoonzero closed 4 years ago
```python
import tensorflow as tf
from tensorflow.keras import layers

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
```
> `dataset_fn` is automatically generated by ElasticDL codegen.

What would be a good default for `eval_metrics_fn`? I think this is highly dependent on the type of model that's defined, e.g. regression vs. classification.
Well, I'll consider this. It's true that `eval_metrics_fn` is needed for every model.

A `dataset_fn` is not needed if you are using SQLFlow directly, since it is automatically generated by ElasticDL codegen.
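For a binary classifier like the one above, a sensible default metric would be accuracy. As a hedged sketch (plain Python to show the computation only, not ElasticDL's actual `eval_metrics_fn` API), the default could reduce to:

```python
def binary_accuracy(predictions, labels, threshold=0.5):
    # Fraction of sigmoid outputs that, after thresholding, match the label.
    correct = sum(
        (p > threshold) == bool(l) for p, l in zip(predictions, labels)
    )
    return correct / len(labels)
```

A regression model would need a different default (e.g. mean squared error), which is exactly why a single default is hard to pick.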
Is it possible to test a model using MaxCompute as the dataset, so that the `dataset_fn` is automatically generated by ElasticDL? Then, when we need to test whether a model works, we don't need to write a `dataset_fn` again.
I think it should be auto-generated by SQLFlow instead, since ElasticDL's model definition must contain information like the feature column names and the label column name, which the user would otherwise have to write as part of `dataset_fn`. This kind of information is easier to obtain through SQLFlow's extended query.
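For context, SQLFlow's extended syntax already carries the feature and label column names (illustrative query; the table and column names here are made up):

```sql
SELECT * FROM iris.train
TO TRAIN DNNClassifier
WITH model.hidden_units = [10, 20], model.n_classes = 3
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO sqlflow_models.my_dnn_model;
```

The `COLUMN` and `LABEL` clauses are what SQLFlow's codegen can translate into a `dataset_fn` automatically.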
@terrytangyuan The problem is that if we want to involve many model developers to contribute models, the `dataset_fn` should not be part of the model. It may be a function used to test that the model works, but it does not belong in the model's definition file.
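Even as a standalone test helper, the core of a `dataset_fn` is just record parsing. A minimal pure-Python sketch (hypothetical CSV field layout; the real odps_iris_dnn_model version builds a `tf.data` pipeline instead):

```python
def parse_record(record):
    # Hypothetical layout: four numeric features followed by an integer label.
    *features, label = record.split(",")
    return [float(f) for f in features], int(label)
```

Keeping this kind of parsing outside the model definition file is what decoupling `dataset_fn` from the model means in practice.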
@typhoonzero and I synced offline. To summarize, we will:

- Generate `dataset_fn()` internally in ElasticDL, so that when an ODPS data source is used it is created automatically without having to implement it in the model definition file.
- `--data_reader_params` or `--envs`.
- `tf.keras.layers.Input`, which needs to know the input shape; this is not easy to support since `dataset_fn` and `model` are decoupled now.

I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.
> The order of the `output` and `labels` arguments should be the same as Keras's loss functions.
@typhoonzero This should be fixed by #1490. Please test to see if it works now.
@typhoonzero This can be closed now, right?
Background
Unify model zoo implementation of SQLFlow and ElasticDL: https://github.com/sql-machine-learning/models/issues/22
WIP PR: https://github.com/sql-machine-learning/models/pull/27
Custom Model Requirements for Unifying ModelZoo
- Accept a `feature_columns` argument when initializing a model.
- Provide a default `eval_metrics_fn`, so that this function is not "required" when writing a custom model definition.
- The `loss(output, labels)` function cannot be reused in Keras `model.compile`; it should be compatible with Keras loss functions, like `keras.losses.mean_squared_error(y_true, y_pred)`: https://keras.io/losses/
- A `dataset_fn` is still needed when reading data from MaxCompute: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/odps_iris_dnn_model/odps_iris_dnn_model.py
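The argument-order clash in the loss requirement above can be bridged mechanically: an ElasticDL-style `loss(output, labels)` differs from Keras's `loss(y_true, y_pred)` only by swapped arguments. A hedged sketch of such an adapter (hypothetical helper, not part of ElasticDL; `mse` here is a plain-Python stand-in for a real loss):

```python
def as_keras_loss(elasticdl_loss):
    """Wrap an ElasticDL-style loss(output, labels) so it matches
    Keras's loss(y_true, y_pred) argument order (hypothetical adapter)."""
    def keras_loss(y_true, y_pred):
        return elasticdl_loss(y_pred, y_true)
    return keras_loss

def mse(output, labels):
    # ElasticDL-style mean squared error: predictions first, labels second.
    return sum((o - l) ** 2 for o, l in zip(output, labels)) / len(output)

keras_style_mse = as_keras_loss(mse)
```

For a symmetric loss like MSE the swap is invisible, but for asymmetric ones (e.g. cross-entropy) getting the order wrong silently corrupts training, which is why the requirement calls for matching Keras's convention directly.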