hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Support for idle shutdown option in `azurerm_machine_learning_compute_instance` #20973

Open mclacore opened 1 year ago

mclacore commented 1 year ago

Description

An Azure Machine Learning compute instance allows setting an idle shutdown to save on costs.

Adding this to the Terraform resource would be greatly appreciated.

I realize this is a preview feature for Machine Learning Workspace, but it seems to ship enabled by default.

New or Affected Resource(s)/Data Source(s)

azurerm_machine_learning_compute_instance

Potential Terraform Configuration

resource "azurerm_machine_learning_compute_instance" "example" {
  name                          = "example"
  location                      = azurerm_resource_group.example.location
  machine_learning_workspace_id = azurerm_machine_learning_workspace.example.id
  virtual_machine_size          = "STANDARD_DS2_V2"
  authorization_type            = "personal"
  idle_shutdown_enable          = true
  idle_shutdown_duration        = "PT1H"
  ssh {
    public_key = var.ssh_key
  }
  subnet_resource_id = azurerm_subnet.example.id
  description        = "foo"
  tags = {
    foo = "bar"
  }
}

References

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-manage-compute-instance?tabs=python#enable-idle-shutdown-preview

gesnaud commented 9 months ago

Hi community!

I hope we get this feature one day!

pavanmuni321 commented 9 months ago

Still waiting for this property to be added! In the meantime, I worked around it by using Terraform to deploy an ARM template that has this attribute.

Uranium2 commented 5 months ago

Any updates on this useful feature?

@pavanmuni321 could you share how you got the ARM template of an existing Compute Instance? I do not want to use the ARM template for all the AML resources.

ltutar commented 5 months ago

I also would like to have this option.

pavanmuni321 commented 4 months ago

@Uranium2 Here is the ARM template I used. Please note that I tested other API versions, and only "2023-10-01" seems to work for me.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "parameterBlock": {
            "type": "Object"
        }
    },
    "variables": {},
    "resources": [
        {
            "type": "Microsoft.MachineLearningServices/workspaces/computes",
            "apiVersion": "2023-10-01",
            "name": "[format('{0}/{1}', parameters('parameterBlock').workspaceName, parameters('parameterBlock').name)]",
            "location": "[parameters('parameterBlock').location]",
            "identity": {
                "type": "[parameters('parameterBlock').managedIdentityType]",
                "userAssignedIdentities": {
                    "[parameters('parameterBlock').userAssignedIdentityId]": {}
                }
            },
            "properties": {
                "computeLocation": "[parameters('parameterBlock').location]",
                "description": "[parameters('parameterBlock').description]",
                "disableLocalAuth": true,
                "computeType": "ComputeInstance",
                "properties": {
                    "computeInstanceAuthorizationType": "personal",
                    "enableNodePublicIp": false,
                    "idleTimeBeforeShutdown": "[parameters('parameterBlock').idleTimeBeforeShutdown]",
                    "personalComputeInstanceSettings": {
                        "assignedUser": {
                            "objectId": "[parameters('parameterBlock').objectId]",
                            "tenantId": "[parameters('parameterBlock').tenantId]"
                        }
                    },
                    "subnet": {
                        "id": "[parameters('parameterBlock').subnet_id]"
                    },
                    "vmSize": "[parameters('parameterBlock').vmSize]"
                }
            }
        }
    ]
}

The template is deployed with the following Terraform configuration:

# Create Compute Instances in Azure Machine Learning Workspace (using ARM template)
resource "azurerm_resource_group_template_deployment" "ml_compute_instance" {
  name                = "rgt-compute-instance"
  resource_group_name = var.resourceGroupName

  parameters_content = jsonencode({
    parameterBlock = {
      value = {
        name                   = var.computeName
        location               = var.location
        tenantId               = data.azurerm_client_config.current.tenant_id
        objectId               = var.userObjectId
        workspaceName          = var.WorkspaceName
        description            = "Compute Instance Assigned to Single-User"
        vmSize                 = var.cpu_compute_size
        idleTimeBeforeShutdown = var.instance_shutdown_time
        subnet_id              = var.subnetId
        managedIdentityType    = var.managed_identity_type
        userAssignedIdentityId = azurerm_user_assigned_identity.ml_managed_identity.id
      }
    }
  })

  template_content = file("${path.module}/aml-compute.json")
  deployment_mode  = "Incremental"
}
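
The `idleTimeBeforeShutdown` parameter is passed as a duration string; the values used in this thread (`PT1H`, `PT15M`) are ISO 8601 durations. A minimal sketch of how `var.instance_shutdown_time` might be declared so that malformed values are caught at plan time. The default value and the accepted pattern (hours and/or minutes only) are assumptions for illustration, not limits confirmed by the Azure API:

```hcl
variable "instance_shutdown_time" {
  description = "Idle time before the compute instance shuts down, as an ISO 8601 duration (for example PT15M or PT1H)."
  type        = string
  default     = "PT1H" # assumed default; choose what suits your workspace

  validation {
    # Assumed pattern: hours, minutes, or hours followed by minutes (PT1H, PT30M, PT1H30M).
    condition     = can(regex("^PT(\\d+H(\\d+M)?|\\d+M)$", var.instance_shutdown_time))
    error_message = "The instance_shutdown_time value must be an ISO 8601 duration such as PT15M or PT1H."
  }
}
```

With this in place, a value like `"1h"` should fail validation during `terraform plan` instead of being rejected later by the ARM deployment.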

ltutar commented 4 months ago

@Uranium2 In case it helps, I am using the following to create the compute instance with an idle time before shutdown.

resource "azapi_resource" "ci003" {
  name      = "adm-ltutar"
  parent_id = azurerm_machine_learning_workspace.ml_workspace.id
  type      = "Microsoft.MachineLearningServices/workspaces/computes@2023-08-01-preview"
  location  = var.global_settings.location
  identity {
    type = "SystemAssigned"
  }
  body = jsonencode({
    properties = {
      computeType      = "ComputeInstance"
      description      = "This compute instance is created for Levent Tutar"
      disableLocalAuth = true
      properties = {
        vmSize                 = "STANDARD_DS3_v2"
        idleTimeBeforeShutdown = "PT15M"
        enableNodePublicIp     = false
        personalComputeInstanceSettings = {
          assignedUser = {
            objectId = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxx"
            tenantId = var.global_settings.tenant_id
          }
        }
      }
    }
  })
}

with the following global settings:

global_settings = {
  tenant_id       = "xxx-xxx-xxx-xxx-xxxx"
  subscription_id = "xxx-xxx-xxx-xxx-xxx"

  # Deployment variables
  location_short = "weu"
  location       = "westeurope"
  serial_number  = "001"
  app_short      = "ml"
  environment    = "dev" #short like dev,acc,prd
}
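
If several users each need their own instance with the same idle shutdown, the resource above could probably be wrapped in `for_each`. A rough sketch, assuming a hypothetical `var.compute_users` map (instance name to assigned user's object ID) and an azapi provider version that still accepts a JSON-encoded `body`, as the example above does:

```hcl
# Hypothetical input: compute instance name => object ID of the assigned user.
variable "compute_users" {
  description = "Map of compute instance name to the assigned user's object ID."
  type        = map(string)
  default     = {}
}

resource "azapi_resource" "compute_instance" {
  for_each = var.compute_users

  name      = each.key
  parent_id = azurerm_machine_learning_workspace.ml_workspace.id
  type      = "Microsoft.MachineLearningServices/workspaces/computes@2023-08-01-preview"
  location  = var.global_settings.location

  identity {
    type = "SystemAssigned"
  }

  body = jsonencode({
    properties = {
      computeType      = "ComputeInstance"
      disableLocalAuth = true
      properties = {
        vmSize                 = "STANDARD_DS3_v2"
        idleTimeBeforeShutdown = "PT15M" # same idle timeout for every instance
        enableNodePublicIp     = false
        personalComputeInstanceSettings = {
          assignedUser = {
            objectId = each.value
            tenantId = var.global_settings.tenant_id
          }
        }
      }
    }
  })
}
```

Each map entry becomes its own resource instance, so adding or removing a user only touches that user's compute instance.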

deepaknani007 commented 3 months ago

Any updates on adding this feature?

MattChapplePA commented 1 month ago

+1 on this. It would be great to deploy compute instances per workspace with Terraform and have them shut down when idle, rather than leaving them running and having to go in manually to set the idle time.