microsoft / MLOpsPython

MLOps using Azure ML Services and Azure DevOps
MIT License
1.19k stars · 1.09k forks

Allow standalone run of train.py #73

Closed algattik closed 4 years ago

algattik commented 4 years ago

For developer productivity it's essential that the script containing the ML code (train.py) can be quickly run standalone ('inner-loop' development).

Traceback (most recent call last):
  File "code/training/train.py", line 61, in <module>
    exp = run.experiment
AttributeError: '_OfflineRun' object has no attribute 'experiment'
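For context on the traceback: when the script is launched outside a submitted run, azureml's `Run.get_context()` returns an `_OfflineRun` object that lacks attributes such as `experiment`, hence the crash. A minimal sketch of a guard follows; the stub class stands in for the real SDK object so the snippet is self-contained, and its fields are illustrative assumptions:

```python
# Stub mimicking the object azureml.core.Run.get_context() returns when
# no submitted run is attached (fields are assumptions for illustration).
class _OfflineRun:
    # Offline run ids begin with "OfflineRun"; there is no .experiment
    # attribute, which is what triggers the AttributeError above.
    id = "OfflineRun_demo"

run = _OfflineRun()  # in train.py this would be: run = Run.get_context()

# Guarded access avoids the crash when running standalone:
exp = getattr(run, "experiment", None)
if exp is None:
    print("standalone run; skipping experiment-specific setup")
```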
Fraser-Paine commented 4 years ago

Maybe not exactly what you're looking for, but I've found that if a script works in the notebook linked below, it will generally work in your MLOps pipeline, assuming your environment is configured correctly: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-on-local/train-on-local.ipynb

algattik commented 4 years ago

This adds a lot of wall-clock overhead and isn't really practical for inner-loop development. Ideally, during prototyping with a small data subset, the training script should add no more than a few seconds of overhead on top of the actual code.

Fraser-Paine commented 4 years ago

The overhead of the notebook itself is fairly minimal; the remaining overhead of the ML services workspace is necessary to make sure the script will work in the pipeline. I've used this method to develop a fairly complex train.py, and there's at most a couple of seconds' difference between running the train.py directly and running it through the notebook. This was worthwhile in my case, as I was debugging interactions between the train.py and the workspace itself.

Alternatively, if there are no interactions aside from logging, why can't you take your inner loop and just run it like any other Python script, with references to experiments and runs commented out? (If you want to get fancy and reuse the exact same train.py, you could have interactions with the workspace check a 'running_local' flag, so you can turn the behavior on and off with an argument that defaults to running normally in the pipeline. Though this will add some bloat to your code.) When you actually run it in the pipeline, just uncomment the code or switch the flag to add the workspace interactions back in, and it's done.
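The 'running_local' flag pattern above might look like the following sketch. A stub class replaces `azureml.core.Run` so the snippet runs anywhere; the helper names and the metric are made up for illustration:

```python
class _OfflineRun:
    """Stub for the object azureml returns outside a submitted run."""
    id = "OfflineRun_demo"  # offline run ids begin with "OfflineRun"

    def log(self, name, value):
        raise RuntimeError("no workspace attached")

def get_context():
    # Hypothetical stand-in for azureml.core.Run.get_context()
    return _OfflineRun()

run = get_context()
# The flag can be derived from the run id instead of passed as an argument:
running_local = run.id.startswith("OfflineRun")

def log_metric(name, value):
    # Only talk to the workspace when running in the pipeline
    if running_local:
        print(f"[local] {name}: {value}")
    else:
        run.log(name, value)

log_metric("mse", 0.42)  # safe both locally and in the pipeline
```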

Again, this won't work if you're using something like AutoML in your train.py script, which also needs access to the workspace. Otherwise, your inner-loop logic shouldn't depend on the Azure ML workspace at all; e.g. the sklearn regression example only uses azureml for logging and storing experiment results.

If you're asking for the flag behavior I described to be added to the API, I don't see that being likely in the short term. There are quite a few bigger issues with the API than the speed of inner-loop testing, which is more of a personal development pattern than something libraries should implement anyway.

algattik commented 4 years ago

Great points. We should integrate that into the template as it's not obvious otherwise :)