sql-machine-learning / models

Premade models for SQLFlow
Apache License 2.0

Unify model zoo implementation of SQLFlow and ElasticDL #22

Open typhoonzero opened 5 years ago

typhoonzero commented 5 years ago

Related design: https://github.com/sql-machine-learning/sqlflow/pull/949

We should implement the model zoo using the same scheme in both SQLFlow and ElasticDL so that every model can be used from either system. Below are things to do:

  1. Unify the names used to define the loss/optimizer/metrics; e.g. SQLFlow uses default_optimizer while ElasticDL uses optimizer.
  2. Decide whether to support a custom train loop in ElasticDL; SQLFlow can already support one: https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/deep_embedding_cluster.py#L194
  3. Decide whether to support the Keras functional API in SQLFlow; ElasticDL can already support this kind of model: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py
  4. SQLFlow uses codegen to generate a dataset_fn function; decide how to use the dataset_fn defined in the model zoo: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L117
  5. Unify PredictionOutputsProcessor: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L153, https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/dnnclassifier.py#L40
  6. Support custom metrics in SQLFlow (the eval_metrics_fn function in the model definition).
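
As a concrete illustration of items 1 and 6, a unified model-zoo module could expose one agreed set of names that both systems look up. This is only a sketch of the idea; the function names and placeholder return values below are assumptions (the final convention is exactly what this issue is deciding), and the placeholders let the sketch run without TensorFlow.

```python
# Sketch of a unified model-zoo module layout. Both SQLFlow and
# ElasticDL would look up the same names instead of default_optimizer
# vs. optimizer. Names and return values are placeholders.

def optimizer():
    """Return the optimizer used for training."""
    # A real model definition would return something like
    # tf.keras.optimizers.Adam(learning_rate=0.001).
    return "adam"

def loss():
    """Return the training loss."""
    return "sparse_categorical_crossentropy"

def eval_metrics_fn():
    """Return the metrics reported during evaluation (see item 6)."""
    return {"accuracy": "accuracy"}
```
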
terrytangyuan commented 5 years ago
  1. Where does the default_optimizer naming convention come from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for the optimizer/loss/etc.
  2. We need to provide abstractions that are safe enough to support a custom training loop. But we definitely want to support other ML tasks such as unsupervised learning, RL, GANs, etc. in the future. How is sqlflow_train_loop() in the example you referenced being used?
  3. I'd say yes.
  4. The ElasticDL codegen already generates the necessary dataset_fn for a given database table automatically. If one is already defined in the model zoo, ElasticDL can override it with the codegen-generated dataset_fn instead.
  5. How is prepare_prediction_column() being used? PredictionOutputsProcessor is a superset of prepare_prediction_column(), so it should not be too hard to support SQLFlow's use case.
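
The override behavior described in item 4 could be as simple as a precedence rule in the submitter. The helper below is hypothetical (its name and signature are assumptions, not actual ElasticDL code); it only illustrates "codegen output wins, model zoo is the fallback":

```python
def resolve_dataset_fn(model_module, codegen_dataset_fn=None):
    """Pick the dataset_fn to use: prefer the codegen-generated one,
    otherwise fall back to whatever the model zoo module defines.
    Hypothetical helper; names are assumptions."""
    if codegen_dataset_fn is not None:
        return codegen_dataset_fn
    return getattr(model_module, "dataset_fn", None)
```
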
typhoonzero commented 5 years ago

@terrytangyuan

Where does the default_optimizer naming convention come from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for the optimizer/loss/etc.

Changing either SQLFlow or ElasticDL so that both use the same name should be fine.

We need to provide abstractions that are safe enough to support a custom training loop. But we definitely want to support other ML tasks such as unsupervised learning, RL, GANs, etc. in the future. How is sqlflow_train_loop() in the example you referenced being used?

sqlflow_train_loop() is called if the function exists in the model definition class; SQLFlow checks for it while doing codegen.
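
In other words, the generated training code can branch on the presence of the hook. A minimal sketch (the surrounding function name is illustrative; the real generated code lives in SQLFlow's codegen templates):

```python
def run_training(model, dataset):
    """Dispatch to the model's custom loop when it defines one,
    otherwise use the standard fit() path. Sketch only."""
    if hasattr(model, "sqlflow_train_loop"):
        # Custom loop, e.g. the pre-train/cluster phases in
        # deep_embedding_cluster.py.
        return model.sqlflow_train_loop(dataset)
    return model.fit(dataset)
```
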

How is prepare_prediction_column() being used? PredictionOutputsProcessor is a superset of prepare_prediction_column(), so it should not be too hard to support SQLFlow's use case.

You're right.

typhoonzero commented 4 years ago

ps: we also need to support eval_metrics_fn

typhoonzero commented 4 years ago

Changing the default function names that define the optimizer and loss from default_optimizer and default_loss to optimizer and loss causes an error when predicting:

Traceback (most recent call last):
  File "<stdin>", line 90, in <module>
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/sqlflow_submitter/tensorflow/predict.py", line 112, in pred
    classifier.predict_on_batch(one_batch[0])
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1024, in predict_on_batch
    x, extract_tensors_from_dataset=True)
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2385, in _standardize_user_data
    metrics=self._compile_metrics,
AttributeError: 'DNNClassifier' object has no attribute '_compile_metrics'

Changing the names back makes the test pass. It seems we cannot use the names optimizer and loss in a Keras model definition.
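
A plausible reading of the traceback is attribute shadowing: after compile(), tf.keras.Model stores internal state under names such as optimizer, so a user-defined member with the same name can clobber, or be clobbered by, Keras bookkeeping. A TensorFlow-free illustration of that hazard (Base and Model below are stand-ins, not Keras classes, and this is an analogy rather than the exact failure path in the traceback):

```python
class Base:
    """Stand-in for tf.keras.Model: compile() stores internal state
    under the attribute name `optimizer`."""
    def compile(self):
        self.optimizer = "state-set-by-compile"

class Model(Base):
    def optimizer(self):  # same name as the attribute compile() sets
        return "adam"

m = Model()
assert callable(m.optimizer)  # before compile(): the method is visible
m.compile()
# After compile(): the instance attribute shadows the method.
assert m.optimizer == "state-set-by-compile"
```
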

To support both the Keras functional API and the subclass API, I propose the unified scheme in PR https://github.com/sql-machine-learning/models/pull/25: the SQL statement SELECT * FROM iris.train TO TRAIN sqlflow_models.dnnclassifier ... uses the Python module name dnnclassifier as the model's name and calls sqlflow_models.dnnclassifier.get_model to create the model instance. This way we can satisfy:

  1. SQLFlow can use both Keras subclass models and functional API models.
  2. ElasticDL models can run directly in SQLFlow.
  3. The model zoo implementation keeps the same user experience.
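
Under this proposal, each model module would expose a construction entry point roughly like the following. This is a sketch only: the real signature would come from PR #25, and the feature-column handling here is invented (a placeholder body so it runs without TensorFlow):

```python
def get_model(feature_columns=None, **kwargs):
    """Proposed module-level entry point: build and return the model
    instance, whether it is defined via the subclass API or the
    functional API. Placeholder body for illustration."""
    return {"feature_columns": feature_columns, "options": kwargs}
```
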
terrytangyuan commented 4 years ago

@typhoonzero Why is get_model() needed? In ElasticDL, we can support both subclass and functional Keras models without this additional method.

typhoonzero commented 4 years ago

Thanks for the reminder; I'll check out how ElasticDL works today.

typhoonzero commented 4 years ago

Update: I will remove get_model(). Exposing the subclass name works, and a function name can also work if we check whether the name after TO TRAIN is a class or a function.

ps: we can get the Python module object from the class: https://stackoverflow.com/questions/7027848/getting-corresponding-module-from-function
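
Both checks are straightforward with the standard inspect module. DNNClassifier and dnnclassifier below are stand-ins for a subclass model and a functional-API builder, not the real sqlflow_models definitions:

```python
import inspect

class DNNClassifier:      # stand-in for a Keras subclass model
    pass

def dnnclassifier():      # stand-in for a functional-API builder
    pass

# Distinguish what the name after TO TRAIN refers to:
assert inspect.isclass(DNNClassifier)
assert inspect.isfunction(dnnclassifier)

# Recover the defining module from the class itself (the trick from
# the linked Stack Overflow answer):
module = inspect.getmodule(DNNClassifier)
```
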