Open RoozbehBandpey opened 2 years ago
@RoozbehBandpey, hi, thank you for taking the time to report this issue.
I checked the requested features; unfortunately, there are some issues with supporting them:
- `setupScripts`: some changes will be introduced in the v2 API that have not been decided yet, so the service team wants us to hold off on supporting this feature.
- `schedules`: I didn't find a definition of this in https://github.com/Azure/azure-rest-api-specs/blob/main/specification/machinelearningservices/resource-manager/Microsoft.MachineLearningServices/stable/2021-07-01/machineLearningServices.json. I'll ask the service team whether something is missing.
@ms-henglu Thanks! Did you get any response from the team? This would be an important feature, since most teams want to manage their compute costs in ML.
IIUC, the API to wrap for the `schedules` section should be https://docs.microsoft.com/en-us/rest/api/azureml/compute/update-schedules.
Hi @chamilad,
Sorry for the late reply. `schedules` only exists in api-version 2021-03-01-preview (azurerm currently uses 2021-07-01) and has not been added to a stable api-version yet, so we can't support this feature.
Thanks @ms-henglu! I'll keep a lookout for updates.
Is there work in progress on this issue? If I understand `machine_learning_compute_instance_resource.go` correctly, the api-version is now 2022-05-01, so this should now be possible. This issue is a showstopper for us. We need to recreate the compute instances frequently in order to update them, so all manual changes are regularly lost.
@ms-henglu any update on this?
This is still an issue, can we please get an update on this?
Can the upstream/microsoft tag be removed from this issue? These properties have been supported by the REST API for several months now.
Can someone give us an update on this feature?
I don't think you realise how much these features can help: saving costs (and electricity for Microsoft), and custom configuration for compute instances (environment variables for people who have to deal with `https_proxy` and `no_proxy`).
In my CI/CD, I can create 10-20 compute instances a day. Yes, I can use the UI for the schedules and idle shutdown (it's a waste of time), but I also have to write documentation for all users who need to edit `/etc/environment` to add the `https_proxy` and `no_proxy` environment variables. We need them to `pip install` any package or to install any Debian package, and even to download the VS Code Server when we want to connect remotely to the compute instance. Data scientists are not DevOps engineers; most of them are not used to editing a protected file and could delete important configuration or files with `sudo`.
I think I found a workaround in Terraform. We could use `azapi_resource` to manage the compute instance resource.
https://learn.microsoft.com/en-us/azure/templates/microsoft.machinelearningservices/workspaces/computes?pivots=deployment-language-terraform
https://registry.terraform.io/providers/Azure/azapi/latest/docs/resources/azapi_resource
It seems overcomplicated to do it like this, but it is feasible. I'm not sure whether we need to redefine all properties or whether we can define only the ones we need, such as `setupScripts`.
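For anyone who wants to try the `azapi_resource` route, here is a rough, untested sketch of what it might look like for a creation script that sets the proxy variables. The resource names, region, VM size, proxy values, and the API version in `type` are all placeholders I made up; the property names under `setupScripts` follow my reading of the template reference linked above, so please check them against the API version you pick. It also assumes the script is allowed to write to `/etc/environment`.

```hcl
locals {
  # Hypothetical proxy settings; replace with your own values.
  proxy_script = <<-EOT
    #!/bin/bash
    echo 'https_proxy=http://proxy.example.com:8080' >> /etc/environment
    echo 'no_proxy=localhost,127.0.0.1' >> /etc/environment
  EOT
}

resource "azapi_resource" "compute_instance" {
  # Assumed API version: use one that the template reference lists
  # with setupScripts documented.
  type     = "Microsoft.MachineLearningServices/workspaces/computes@2022-10-01"
  name     = "example-ci"
  location = "westeurope"

  # Hypothetical reference to a workspace managed elsewhere in the config.
  parent_id = azurerm_machine_learning_workspace.example.id

  # azapi 1.x takes a JSON string here; newer releases of the provider
  # accept a plain HCL object instead of jsonencode().
  body = jsonencode({
    properties = {
      computeType = "ComputeInstance"
      properties = {
        vmSize = "STANDARD_DS3_V2"
        setupScripts = {
          scripts = {
            creationScript = {
              scriptSource = "inline"
              # Inline scripts are passed base64-encoded.
              scriptData = base64encode(local.proxy_script)
              timeout    = "5m"
            }
          }
        }
      }
    }
  })
}
```

Only the properties shown are set here; I have not verified which other properties the service requires, or whether changing the script forces the instance to be recreated.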
Also, this does not cover idle shutdown. Here is a response from the Microsoft Support Team:
Regarding the query on setup scripts in Terraform: I do see that we have the startup and setup scripts in Terraform now:
https://learn.microsoft.com/en-us/azure/templates/microsoft.machinelearningservices/workspaces/computes?pivots=deployment-language-terraform
Here is the sample script to set up a proxy:
https://github.com/Azure/azureml-examples/blob/main/setup/setup-ci/jupyter-proxy.sh
Regarding idle shutdown, our team confirmed that they have a work item for it, but it's not going to be in this release cycle.
Community Note
Description
Azure ML offers scheduling and setup scripts for compute instance creation. Our current workaround is to apply these changes with post-provisioning scripts. ARM templates can be found here: https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.machinelearningservices/machine-learning-compute-create-computeinstance
It would be great to be able to do this with Terraform.
New or Affected Resource(s)
azurerm_machine_learning_compute_instance
Potential Terraform Configuration