hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.6k stars 4.64k forks source link

azurerm_linux_function_app adding `/` character to front of docker image_name #20441

Open andre-qumulo opened 1 year ago

andre-qumulo commented 1 year ago

Is there an existing issue for this?

Community Note

Terraform Version

1.3.8

AzureRM Provider Version

3.43.0

Affected Resource(s)/Data Source(s)

azurerm_linux_function_app

Terraform Configuration Files

data "azurerm_service_plan" "test_service_plan" {
  name                = "linux-service-plan-${var.deployment_environment}-1"
  resource_group_name = "test"
}

/*
 _____                 _   _
|  ___|   _ _ __   ___| |_(_) ___  _ __
| |_ | | | | '_ \ / __| __| |/ _ \| '_ \
|  _|| |_| | | | | (__| |_| | (_) | | | |
|_|   \__,_|_| |_|\___|\__|_|\___/|_| |_|
 FIGLET: Function
*/

resource "azurerm_storage_account" "test_app_sa" {
  name                     = "responderservicesa${local.rg_suffix}"
  resource_group_name = "test"
  location            = "westus2"
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_linux_function_app" "test_app_function" {
  name                = "test-app-${local.rg_suffix}"
  resource_group_name = "test"
  location            = "westus2"

  storage_account_name       = azurerm_storage_account.test_app_sa.name
  storage_account_access_key = azurerm_storage_account.test_app_sa.primary_access_key
  service_plan_id            = data.azurerm_service_plan.test_service_plan.id
  builtin_logging_enabled    = false

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.test_app.id]
  }
  app_settings = {
    WEBSITES_ENABLE_APP_SERVICE_STORAGE = false
    FUNCTION_APP_EDIT_MODE              = "readOnly"
    RUST_BACKTRACE                      = 1
  }

  site_config {
    always_on                                     = true
    vnet_route_all_enabled                        = true
    container_registry_use_managed_identity       = true
    container_registry_managed_identity_client_id = azurerm_user_assigned_identity.test_app.client_id
    application_stack {
      docker {
        registry_url = data.azurerm_container_registry.services_acr.login_server
        image_name   = "test-repo"
        image_tag    = "latest"
      }
    }
    app_service_logs {
      disk_quota_mb         = 35
      retention_period_days = 90
    }

  }
  depends_on = [
    azurerm_role_assignment.test_app_docker_pull
  ]
}

#  ___    _            _   _ _
# |_ _|__| | ___ _ __ | |_(_) |_ _   _
#  | |/ _` |/ _ \ '_ \| __| | __| | | |
#  | | (_| |  __/ | | | |_| | |_| |_| |
# |___\__,_|\___|_| |_|\__|_|\__|\__, |
#                                |___/
#  FIGLET: Identity

resource "azurerm_user_assigned_identity" "test_app" {
  resource_group_name = "test"
  location            = "westus2"

  name = "test_app-identity-${local.rg_suffix}"
}

resource "azurerm_role_assignment" "test_app_docker_pull" {
  scope                = data.azurerm_container_registry.services_acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_user_assigned_identity.test_app.principal_id
}

Debug Output/Panic Output

The kudu output is:

2023-02-10T23:57:24.371Z ERROR - DockerApiException: Docker API responded with status code=BadRequest, response={"message":"invalid reference format"}

Expected Behaviour

The deployment configuration in azure's image field shouldn't have a "/" in the front

Actual Behaviour

The deployment configuration in azure's image field has a "/" in the front

Steps to Reproduce

You can remove the leading "/", restart the app, and it will load. If you re-run "apply" it will add the leading "/" to it and fail to pull again.

Important Factoids

No response

References

No response

xiaxyi commented 1 year ago

Thanks @andre-qumulo for raising this issue. May I confirm:

  1. whether the app is created or not?
  2. If it's created but cannot be started, can you go to the state file and share the value of linuxFxVersion and the value of docker part with me?
andre-qumulo commented 1 year ago
  1. The app is created
  2. the linuxFxVersion is `"DOCKER|/test-repo:latest"
andre-qumulo commented 1 year ago

Part of the challenge is that it's inconsistent. Sometimes the linuxFxVersion looks like "DOCKER|login-server-url/test-repo:latest" and sometimes it's "DOCKER|/test-repo:latest"

I ran with TF_LOG=TRACE until I could reproduce it: I can see the PUT to PUT /subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Web/sites/test-app-agagne?api-version=2021-03-01 HTTP/1.1 has "linuxFxVersion":"DOCKER|/test-repo:latest"

mseiler90 commented 1 year ago

I see this occurring as well. Both the image_name randomly having a / and the issue of linuxFxVersion randomly not having the registry URL

andre-qumulo commented 1 year ago

I am wondering if this is because we're getting the login server url as an ACR data instead of resource so the url isn't known?

I also wonder if part of the problem is that the function is trying to redeploy every time as a result of azure adding an "https://" to the registry name.

I recently changed my terraform to have:

  application_stack {
      docker {
        registry_url = "https://${data.azurerm_container_registry.services_acr.login_server}"
        image_name   = "test-repo"
        image_tag    = "latest"
      }
    }

And haven't seen terraform try to change that section since. I don't think this is a guaranteed workaround but might reduce the frequency of failures.

mseiler90 commented 1 year ago

Looks like someone else came to this same conclusion in this issue https://github.com/hashicorp/terraform-provider-azurerm/issues/20393

andre-qumulo commented 1 year ago

This definitely feels like a bug with the azurerm provider. I think if we don't provide the "https://" then azure will add it, the provider sees that and thinks its a difference. I suspect there's a race condition with using a data "azurerm_container_registry" where terraform doesn't get the login server and thus doesn't update it correctly.

I am seeing in the trace that the GET request for the linux function app also has a linuxFxVersion that's missing the login server url

xiaxyi commented 1 year ago

@andre-qumulo can you use login_server instead of id?

 docker {
        registry_url = data.azurerm_container_registry.test.login_server 
        image_name   = "dotnetsamples"
        image_tag    = "aspnetapp"
      }
andre-qumulo commented 1 year ago

@andre-qumulo can you use login_server instead of id?

 docker {
        registry_url = data.azurerm_container_registry.test.login_server 
        image_name   = "dotnetsamples"
        image_tag    = "aspnetapp"
      }

@xiaxyi This is a problem with the example I gave you but not the problem I'm calling out. I already did the analysis (see the conversation above your reply). Can you point me to the code where this is set and maybe I'll take care of it myself?

xiaxyi commented 1 year ago

@andre-qumulo we are currently considering change the way of how to compose linuxFxVersion, let me also add your issue into the issue list, will update once there is a progress.

andre-qumulo commented 1 year ago

Understood,

Thank you! If you do point me at the code I can make a small pr to fix this issue. Might be quick.

Andre

On Wed, Feb 22, 2023, 5:54 PM Xiaxin @.***> wrote:

@andre-qumulo https://github.com/andre-qumulo we are currently considering change the way of how to compose linuxFxVersion, let me also add your issue into the issue list, will update once there is a progress.

— Reply to this email directly, view it on GitHub https://github.com/hashicorp/terraform-provider-azurerm/issues/20441#issuecomment-1441123588, or unsubscribe https://github.com/notifications/unsubscribe-auth/A4OQ326SCYVWMDYVUMLH6EDWY27M7ANCNFSM6AAAAAAU2TFTIU . You are receiving this because you were mentioned.Message ID: @.***>

xiaxyi commented 1 year ago

@andre-qumulo Thanks a lot for offering the help! There are multiple places relates to the docker settings, not only of the way to compose linuxFxVersion but we need to update the app_setting block as well. you may check this function

Besides, the change to docker stack will also include linux web app.

mickare commented 1 year ago

Same issue here with hashicorp/azurerm == 3.45.0

Configuring the Docker as suggested by @andre-qumulo , but it always results in:

"linuxFxVersion": "DOCKER|/image/name:mytag"

With:

docker {
    registry_url = "myrepo.azurecr.io"
    image_name   = "image/name"
    image_tag    = "mytag"
}
mickare commented 1 year ago

Looks like it's fixed with 3.47.0.

spr0ut commented 1 year ago

@mickare Still present with 3.61.0.

infosabs commented 1 year ago

Is there any update on this bug or an ETA when it will be fixed/released?

Octarines commented 1 year ago

We are facing this issue with version 3.57.0. I wanted to add that the workaround suggested by @andre-qumulo has worked in fixing this for us: image It appears that it is a bug in whatever code has been added in azurerm_linux_function_app to auto populate the https:// prefix in cases where users have omitted it. The fact that appears to happens for us exclusively on subsequent releases with new versions to existing resources (as well as having a poke around inside the state files) leads me to believe its an issue with some erroneous editing of the state file post deployment.

mrickettsk commented 1 year ago

As far as I can tell, I believe the issue lies here in the provider code:

if linuxAppStack.Docker != nil && len(linuxAppStack.Docker) == 1 {
            dockerConfig := linuxAppStack.Docker[0]
            appSettings = updateOrAppendAppSettings(appSettings, "DOCKER_REGISTRY_SERVER_URL", dockerConfig.RegistryURL, false)
            appSettings = updateOrAppendAppSettings(appSettings, "DOCKER_REGISTRY_SERVER_USERNAME", dockerConfig.RegistryUsername, false)
            appSettings = updateOrAppendAppSettings(appSettings, "DOCKER_REGISTRY_SERVER_PASSWORD", dockerConfig.RegistryPassword, false)
            var dockerUrl string
            for _, prefix := range urlSchemes {
                if strings.HasPrefix(dockerConfig.RegistryURL, prefix) {
                    dockerUrl = strings.TrimPrefix(dockerConfig.RegistryURL, prefix)
                    continue
                }
            }
            expanded.LinuxFxVersion = utils.String(fmt.Sprintf("DOCKER|%s/%s:%s", dockerUrl, dockerConfig.ImageName, dockerConfig.ImageTag))
        }

If the dockerConfig.RegistryURL does not have a prefix in the urlSchemes range, the dockerUrl variable will remain an empty string and the expanded.LinuxFxVersion will be set to "DOCKER|/:image-name:image-tag".

This is the observed behaviour - unless folks provide a https:// then the dockerUrl is blank.

Assuming this is the issue, a fix could be to set a default before the evaluation takes place so that if no evaluation is required, the string is set correctly:

dockerUrl := dockerConfig.RegistryURL // i.e. Set a default value
for _, prefix := range urlSchemes {
    if strings.HasPrefix(dockerConfig.RegistryURL, prefix) {
        dockerUrl = strings.TrimPrefix(dockerConfig.RegistryURL, prefix)
        break
    }
}
expanded.LinuxFxVersion = utils.String(fmt.Sprintf("DOCKER|%s/%s:%s", dockerUrl, dockerConfig.ImageName, dockerConfig.ImageTag))
heras1991 commented 11 months ago

I have resolved adding the docker_registry_url = "https://index.docker.io" to the application_stack block. Please, replace the docker_image_name with your docker image

  site_config {
    always_on = true
    ftps_state  = "FtpsOnly"
    application_stack {
      docker_image_name = "sonarqube:lts-community"
      docker_registry_url = "https://index.docker.io"
    }
  }
andre-qumulo commented 11 months ago

I have resolved adding the docker_registry_url = "https://index.docker.io" to the application_stack block. Please, replace the docker_image_name with your docker image

  site_config {
    always_on = true
    ftps_state  = "FtpsOnly"
    application_stack {
      docker_image_name = "sonarqube:lts-community"
      docker_registry_url = "https://index.docker.io"
    }
  }

Where is the commit for this fix?

marcindulak commented 8 months ago

I'm seeing this in a non-reproducible way for azurerm_linux_web_app. Sometimes the terraform state contains BAD "linux_fx_version": "DOCKER|/myregistry.azurecr.io/myrepo:latest", and sometimes GOOD "linux_fx_version": "DOCKER|myregistry.azurecr.io/myrepo:latest". In both cases the azurerm_linux_web_app has a dependency on azurerm_container_registry.this in the state.

It happened a few times over the past couple of weeks, then it did not happen again.

Using terraform 1.7.4 and azurerm 3.94.0

locals {
  ...
  primary_docker_name = "${azurerm_container_registry.this.login_server}/myrepo"
  primary_docker_tag  = "latest"
}

...
resource "azurerm_linux_web_app" "this" {

  site_config {

    application_stack {
      ...
      docker_image_name = "${local.primary_docker_name}:${local.primary_docker_tag}"
    }
}

Not using docker_registry_url at all.

roviracarlos commented 8 months ago

Same Issue here yesterday and today multiple App Services goes down because cannot get the image. Other App Services with the same config don't have the problem in the same deployment and nothing is showed on the plan or deployment, only you can see in the activity logs that Terraform change the image Url adding the / in the beggining and delete the Docker variables.

Using Terraform 1.7.4-1 and azurerm 3.95.0

crispinboylan-int commented 7 months ago

This has happened several times to me with app service.

I noticed whenever this has happened i've seen empty values in configuration settings for DOCKER_REGISTRY_SERVER_PASSWORD, DOCKER_REGISTRY_SERVER_USERNAME, DOCKER_REGISTRY_SERVER_URL (possibly related to https://github.com/hashicorp/terraform-provider-azurerm/pull/24424/ )

we don't use this and we dont specify the image in the terraform, we leave that for deploys via ADO. I'm not sure if these are occasionally being added empty by terraform when the error occurs or if when we created the resources initially we used a buggy provider version (its notable that only resources created since around november and before feb had these values empty).

I've removed them all from affected, and will report back if they appear again.

roviracarlos commented 7 months ago

In my case this appears only one time when enable that site config with the lates release of the azurerm 3.95.0 on different subscriptions, regions, days and cloud infra.

2024-03-13T13:29:03.7933364Z ~ site_config { 2024-03-13T13:29:03.7933537Z + ip_restriction_default_action = "Allow" 2024-03-13T13:29:03.7933748Z + scm_ip_restriction_default_action = "Allow"

2024-03-11T09:52:21.7790692Z ~ site_config { 2024-03-11T09:52:21.7790978Z + ip_restriction_default_action = "Allow" 2024-03-11T09:52:21.7791446Z + scm_ip_restriction_default_action = "Allow"

This is the only in the plan, but in background do more than this over some of the resources.

roviracarlos commented 7 months ago

We continue with the investigation and we found that the problem is the site_config[0].linuxFxVersion that is defined by the provider and incorrectly saved to the state file. I review the history of the state files from last year and multiple web apps have the wrong setting, so now that edit some values there, because the rest is managed by ADO with CI/CD Terraform set what they have in the state file without any notice.

crispinboylan-int commented 3 months ago

I found the issue. In our setup in the terraform we dont mention any images or docker parameters.

We then deploy via Azure Devops AzureRmWebAppDeployment task which sets up the docker parameters and everything works.

That is until we make a change to the app service resource (for example app settings, but anything will do it). This causes docker_image_name to get set in the state to e.g. ourregistry.azurecr.io/image:1.2 docker_custom_image_name gets updated in app settings to the same value and linux_fx_version gets set to /registry.azurecr.io/image:1.2

i found that setting DOCKER_REGISTRY_SERVER_URL to https://ourregistry.azurecr.io during the AzureRmWebAppDeployment task works around that, and when the next terraform infra job is run the state gets updated differently

docker_image_name becomes image:1.2 docker_registry_url gets set to the https://ourregistry.azurecr.io and linux_fx_version gets set correctly without the leading /

so it seems its definitely a bug in the provider.