Closed michelole closed 2 years ago
Michel you are absolutely right. There are nevertheless slight differences in the code logic for retrieve_workspace for instance. When we run code from the operation folder, we assume the code will be run in a devops pipeline, which means we will use Service Principals to connect to the AML workspace. Whereas, when running the code from src, the scripts will either be run locally or on a experiment run.
There was a similar discussion here. The problem is not so straightforward to solve but we are eager to get your inputs :smiley: Do you have any suggestions ?
I agree as well. The main reason that's preventing us from putting everything together is that only the src/
folder is uploaded to the compute cluster for execution during training/batch inference, and hence we need some AML utilities there. Maybe we can look at simplifying src/utils.py
at maximum? And/or change the function names to make them more descriptive of the differences?
Since this also seems to be duplicated among templates (and potentially among their implementations), I'd favor moving this upstream.
Maybe we should then keep the discussion on the linked thread? Tentatively closing then.
If there's a decision not to upstream it, we could explore removing the duplicate at least in this repo by:
The private package option would be a good alternative to reduce code duplication. Nevertherless, there is work in progress for the next release of the aml sdk (here an example regarding pipelines: azureml-previewss) which will probably require some adjustments to the solution. Hence, I would suggest keeping it simple and accessible (vs a package) even though it means having some duplicated code.
Some of the
src/utils.py
code seems to be duplicated inside theoperations
folder:retrieve_workspace
get_dataset
This is a bit confusing when deploying the template.