solliancenet / MCW-MLOps

MIT License
2 stars 8 forks source link

Draft workshop reviews #1

Open DawnmarieDesJardins opened 5 years ago

DawnmarieDesJardins commented 5 years ago

Please leave your review comments for our authors here.

ezwiefel commented 5 years ago

Same suggestion about breaking up monolithic step-by-step file as found in the ADB/AML Workshop

ezwiefel commented 5 years ago

Consider using the RunDetails widget in the Deep Learning with Text notebook.

from azureml.widgets import RunDetails
RunDetails(run).show()
ezwiefel commented 5 years ago

Instead of registering the model using the Model.register step and then adding run_id as a tag, use run.register_model instead - that will automatically link the run_id to the model.

model_run = run.register_model(model_name=model_name, 
                               model_path="outputs/model/model.h5",
                               tags={"type": "classification"})

In AML version 1.0.45, however, there appears to be a bug preventing you from setting the model description during run.register_model, but I'd rather see description as a tag instead of run_id as a tag.

ezwiefel commented 5 years ago

For Exercise 3 - Task 4 - step 3 - Add an Azure Resource Manager service connection. You should note here that for many customers, they won't have access in their tenant to create a Service Principal... this has hurt me in several customer workshops.

You might even document the need for a service principal in the prereqs. (You can click on "Use the full version of the service connection dialog" to specify a pre-existing SP.)

ezwiefel commented 5 years ago

I'd like to see more detailed explanation of the build pipeline. What are the script being run? Why are the steps being taken? If I started my own project, how would I get started here?

As is, I don't feel like I've learned anything about MLOps Build pipelines by following these directions... I've just deployed someone else's YAML file.

shirolkar commented 5 years ago

Consider using the RunDetails widget in the Deep Learning with Text notebook.

from azureml.widgets import RunDetails
RunDetails(run).show()

The recommendation is accepted and completed.

shirolkar commented 5 years ago

Instead of registering the model using the Model.register step and then adding run_id as a tag, use run.register_model instead - that will automatically link the run_id to the model.

model_run = run.register_model(model_name=model_name, 
                               model_path="outputs/model/model.h5",
                               tags={"type": "classification"})

In AML version 1.0.45, however, there appears to be a bug preventing you from setting the model description during run.register_model, but I'd rather see description as a tag instead of run_id as a tag.

The recommendation is accepted and completed.

ezwiefel commented 5 years ago

For all but the first notebook being run, I'd avoid the Workspace.create step. Instead I'd use Workspace.from_config()

If the workspace doesn't exist outside the first notebook, you want an error flagging that!

ezwiefel commented 5 years ago

In deploy.py you are deleting an existing Webservice and then redeploying, but this is an anti-pattern in production, because it leaves your service unable to respond to requests and it changes the security keys.

Instead of deleting and recreating a Webservice, you should update the Webservice with a new model.

While I understand this isn't a production scenario, I worry that some attendees might use this repo as a base for deploying their own models.

shirolkar commented 5 years ago

I'd like to see more detailed explanation of the build pipeline. What are the script being run? Why are the steps being taken? If I started my own project, how would I get started here?

As is, I don't feel like I've learned anything about MLOps Build pipelines by following these directions... I've just deployed someone else's YAML file.

Added more explanation in Ex 4/Task 1/Step 4: "Review YAML file" to explain the 4 main steps in the build pipeline. Added more explanation in Ex 5/Task 7/Step 4: "Add Deploy & Test Webservice Task" to explain the main task in the release pipeline.

ezwiefel commented 5 years ago

Instead of registering the model using the Model.register step and then adding run_id as a tag, use run.register_model instead - that will automatically link the run_id to the model.

model_run = run.register_model(model_name=model_name, 
                               model_path="outputs/model/model.h5",
                               tags={"type": "classification"})

In AML version 1.0.45, however, there appears to be a bug preventing you from setting the model description during run.register_model, but I'd rather see description as a tag instead of run_id as a tag.

This should also be done in the mlops_quickstart repo in the Train.py step.

ezwiefel commented 5 years ago

In the quickstart repo, for the ModelDataCollector object, the first parameter should be the model name and not 'model_telemetry'. This will have implications with other dependent AML offerings, which will expect to see model name.

inputs_dc = ModelDataCollector(model_name, identifier="inputs")
prediction_dc = ModelDataCollector(model_name, identifier="predictions", feature_names=["prediction"])
shirolkar commented 5 years ago

In the quickstart repo, for the ModelDataCollector object, the first parameter should be the model name and not 'model_telemetry'. This will have implications with other dependent AML offerings, which will expect to see model name.

inputs_dc = ModelDataCollector(model_name, identifier="inputs")
prediction_dc = ModelDataCollector(model_name, identifier="predictions", feature_names=["prediction"])

Erik - Looks like you meant to provide this feedback for some other repo? You refer to the quickstart repo. This is for the MLOps MCW. Thanks

shirolkar commented 5 years ago

In deploy.py you are deleting an existing Webservice and then redeploying, but this is an anti-pattern in production, because it leaves your service unable to respond to requests and it changes the security keys.

Instead of deleting and recreating a Webservice, you should update the Webservice with a new model.

While I understand this isn't a production scenario, I worry that some attendees might use this repo as a base for deploying their own models.

We had attempted to update the AKS webservice instead of delete and redeploy. Unfortunately, updating an existing AKS webservice sometimes results in error like “Service deployment polling reached non-successful terminal state, current service state: Healthy”. Thus, to provide a consistent and predictable experience in training, we have opted to keep the current behavior. We will keep an eye on this issue to see if there is a solution that is stable and reliable for future updates.