hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Support for Azure Machine Learning Inference cluster (requires new resource) #11252

Closed: gro1m closed this issue 3 years ago

gro1m commented 3 years ago


Description

Azure offers inference clusters on Azure Kubernetes Service (AKS) for serving ML models in production.

Azure's general description of the machine learning workflow can be found here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-studio. It would be great if all of these resources were provided via Terraform (though that is probably a huge effort).

New or Affected Resource(s)

Potential Terraform Configuration

resource "azurerm_machine_learning_aks_inference_cluster" "ml_aks_inference" {
  workspace_name   = "aml-workspace"
  web_service_name = "my-aml-webservice"
  location         = "westeurope"

  # Environments:
  # https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
  # https://docs.microsoft.com/en-us/azure/machine-learning/resource-curated-environments
  environment_name    = "AzureML-Scikit-learn-0.20.3"
  environment_version = "3"
  driver_program      = "predict.py"

  model_configuration = {
    name              = "my_model.pkl"
    path              = "models/my_model.pkl"
    framework         = "ScikitLearn"
    framework_version = "0.20.3"
  }

  scoring_timeout_ms   = 1
  app_insights_enabled = false
  auth_enabled         = false
  aad_auth_enabled     = false

  compute_name       = "ml_aks_inference_cluster"
  # Reference to an AKS cluster resource or data source; "example" is a hypothetical name.
  kubernetes_service = azurerm_kubernetes_cluster.example.id
}

Question: How much of the AKS cluster configuration could be reused here, i.e. from the azurerm_kubernetes_cluster and azurerm_kubernetes_cluster_node_pool resources and data sources? It does not seem sensible to redefine AKS yet again inside this resource, but I am not sure how much would be needed here beyond the specific naming and the model-specific configuration.

References

gro1m commented 3 years ago

@ArcturusZhang I have started implementing the above resource and noticed that you are heavily involved in moving the Azure SDK for Go forward, but I did not find you in this repository. I would just like to make sure that I am focusing on the right code, as the terminology for ML resources can sometimes be a bit confusing (e.g. "Compute Cluster" <-> "AmlCompute", et cetera). I plan to use https://github.com/Azure/azure-sdk-for-go/blob/e19a30aca35ffb0f3baca123eb374633b1462e9a/services/machinelearningservices/mgmt/2020-04-01/machinelearningservices/models.go#L282 to implement the inference cluster. If the Kubernetes cluster already exists, I plan to use https://github.com/Azure-Samples/azure-sdk-for-go-samples/blob/master/compute/container_cluster.go#L90. Am I on the right track, or do I need to adjust something? Thank you!

EDIT: The methods above are not the right ones (as I understand it, they are only for checking state by inspecting the return values). I will work my way back from the CreateOrUpdate API for the compute cluster. Is it correct to set up the compute cluster first and then connect the AKS service, or does one first create the AKS service and then attach a compute cluster to end up with the inference cluster?
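For orientation, a workspace compute CreateOrUpdate call is an HTTP PUT against an Azure Resource Manager path under Microsoft.MachineLearningServices/workspaces/{workspace}/computes/{name}. The sketch below only builds that URL; it assumes the 2020-04-01 api-version referenced above, and the helper name computeResourceURL and all argument values are hypothetical. It makes no network call.

```go
package main

import (
	"fmt"
	"net/url"
)

// computeResourceURL (hypothetical helper) builds the ARM URL targeted by a
// CreateOrUpdate (HTTP PUT) for a Machine Learning workspace compute.
// Path segments are escaped; values are otherwise not validated.
func computeResourceURL(subscriptionID, resourceGroup, workspace, computeName, apiVersion string) string {
	path := fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.MachineLearningServices/workspaces/%s/computes/%s",
		url.PathEscape(subscriptionID),
		url.PathEscape(resourceGroup),
		url.PathEscape(workspace),
		url.PathEscape(computeName),
	)
	q := url.Values{}
	q.Set("api-version", apiVersion)
	return "https://management.azure.com" + path + "?" + q.Encode()
}

func main() {
	// Hypothetical identifiers for illustration only.
	fmt.Println(computeResourceURL(
		"00000000-0000-0000-0000-000000000000",
		"my-rg",
		"aml-workspace",
		"ml_aks_inference_cluster",
		"2020-04-01",
	))
}
```

Keeping the URL construction separate from the request body makes it easy to reuse the same path for the GET (read) and DELETE calls a Terraform resource also needs.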

ArcturusZhang commented 3 years ago

Hi @gro1m, thanks for your contribution!

To be honest, I do not fully understand the terminology of machine learning resources. As far as I know, AmlCompute refers to the Azure Machine Learning Compute resource created by this API. As for the AKS type of compute instance, its model definition has a ResourceID property, which might accept the resource ID of an existing AKS cluster. This would need to be confirmed in its own documentation; sorry, I do not have much knowledge about this.
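If ResourceID does accept an existing cluster, attaching would amount to sending a body that names the AKS compute type and points at the cluster's ID, so the AKS definition itself stays in azurerm_kubernetes_cluster. The sketch below assumes the REST JSON names computeType, resourceId and location (the counterparts of the SDK's ComputeType, ResourceID and Location fields); the thread does not confirm that attaching works this way, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// attachAKSBody is a hypothetical, minimal request body for attaching an
// existing AKS cluster as a workspace compute. Verify the field names
// against the API version you actually target.
type attachAKSBody struct {
	Location   string        `json:"location"`
	Properties aksProperties `json:"properties"`
}

type aksProperties struct {
	ComputeType string `json:"computeType"`
	ResourceID  string `json:"resourceId"`
}

// isManagedClusterID loosely checks that id has the shape
// /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.ContainerService/managedClusters/{name}.
func isManagedClusterID(id string) bool {
	parts := strings.Split(strings.Trim(id, "/"), "/")
	if len(parts) != 8 {
		return false
	}
	return strings.EqualFold(parts[0], "subscriptions") && parts[1] != "" &&
		strings.EqualFold(parts[2], "resourceGroups") && parts[3] != "" &&
		strings.EqualFold(parts[4], "providers") &&
		strings.EqualFold(parts[5], "Microsoft.ContainerService") &&
		strings.EqualFold(parts[6], "managedClusters") && parts[7] != ""
}

// buildAttachBody rejects IDs that do not look like AKS clusters, then
// serializes the attach body with computeType fixed to "AKS".
func buildAttachBody(location, aksID string) ([]byte, error) {
	if !isManagedClusterID(aksID) {
		return nil, fmt.Errorf("not an AKS managed cluster ID: %q", aksID)
	}
	return json.Marshal(attachAKSBody{
		Location:   location,
		Properties: aksProperties{ComputeType: "AKS", ResourceID: aksID},
	})
}

func main() {
	// Hypothetical cluster ID for illustration only.
	id := "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.ContainerService/managedClusters/my-aks"
	body, err := buildAttachBody("westeurope", id)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Validating the ID up front mirrors how a Terraform schema could accept kubernetes_service as a plain resource ID reference instead of redefining the cluster.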

github-actions[bot] commented 3 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.