microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps

[API] Impossible to verify custom domain during Container App or Workspace creation due to circular dependency #525

Open bbi-willemtoorenburgh opened 1 year ago

bbi-willemtoorenburgh commented 1 year ago

This issue is a:

Issue description

Initially reported in the Pulumi Azure Native provider, which contains a lot of context: https://github.com/pulumi/pulumi-azure-native/issues/2117

While trying to create a Container App with a custom DNS address and SSL certificate, the creation request is rejected due to a validation issue:

azure-native:app:ContainerApp (container-app):
    error: Code="InvalidCustomHostNameValidation" Message="A TXT record pointing from asuid.<dns> to <custom domain verification ID> was not found."

This usage pattern worked as of 26 days ago, which is why this issue is marked as a regression.

When creating a Container App or Workspace with a custom domain, the API is expecting the existence of the verification TXT record. However, it's impossible to get the verification ID for the record without first creating the Container App or Workspace. This creates a circular dependency that is impossible to resolve.

I suspect that when the Container Apps team rolled out the new Workspace-level custom DNS feature, a validation step for the TXT DNS record was added or moved to the wrong spot in the order of operations. As it stands, it seems impossible to create Container Apps or Workspaces with custom DNS addresses via the API.
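To make the circular dependency concrete, here is a minimal sketch that models the steps as a dependency graph and asks Python's standard `graphlib` for a valid order. The step names are hypothetical labels chosen for illustration, not Azure API operations, and the "deferred" graph reflects the behavior described under "Expected behavior", not a confirmed implementation.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the steps in its set (hypothetical step names).
# Current behavior: creation already requires the TXT record to exist.
create_time_validation = {
    "create_container_app": {"txt_record_exists"},
    "txt_record_exists": {"verification_id_known"},
    "verification_id_known": {"create_container_app"},
}

# Expected behavior: the TXT record is only checked after creation.
deferred_validation = {
    "create_container_app": set(),
    "verification_id_known": {"create_container_app"},
    "txt_record_exists": {"verification_id_known"},
    "bind_custom_domain": {"txt_record_exists"},
}


def deployment_order(dependencies):
    """Return a valid step order, or None when the steps form a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


print(deployment_order(create_time_validation))
print(deployment_order(deferred_validation))
```

The first graph has no valid order because every step waits on another, which is the situation an infrastructure-as-code tool ends up in.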

Steps to reproduce

Please see linked issue for reproduction steps.

Expected behavior

A Container App or Workspace is created with a custom DNS configuration, and the user can then retrieve the verification ID via subsequent API calls to create the validation TXT record.

Actual behavior

API throws a validation error citing the absence of the validation TXT record, which at this step is impossible to create.

SophCarp commented 1 year ago

Thank you for opening this issue. We'll report back the status once we've validated it on our end.

ahmelsayed commented 1 year ago

@bbi-willemtoorenburgh Once the environment has been created, you can read the verification ID from properties.customDomainConfiguration.customDomainVerificationId on the environment response.

It should be the same as the one you get on the app response, so you can set it in your DNS before creating the app.
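As a rough illustration of reading that property, the sketch below walks the property path named above on an already-parsed environment response. The sample payload is a hypothetical, trimmed body with a made-up placeholder ID, and the helper name is an assumption for this example only.

```python
from typing import Any, Mapping, Optional

# Property path quoted in the comment above.
VERIFICATION_ID_PATH = (
    "properties",
    "customDomainConfiguration",
    "customDomainVerificationId",
)


def verification_id_from_environment(response: Mapping[str, Any]) -> Optional[str]:
    """Return the verification ID from a parsed environment response, if present."""
    node: Any = response
    for key in VERIFICATION_ID_PATH:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else None


# Hypothetical, trimmed response body; the ID is a placeholder, not a real value.
sample_environment = {
    "name": "example-env",
    "properties": {
        "customDomainConfiguration": {
            "customDomainVerificationId": "EXAMPLE-VERIFICATION-ID",
        }
    },
}

print(verification_id_from_environment(sample_environment))
```

Returning None instead of raising keeps the caller in charge of what a missing ID means, for example when the environment has not finished provisioning.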

praneetloke commented 1 year ago

I believe the problem is that the Container Apps service is looking for the DNS verification records for the very Environment resource that is being created. It happens when you set the DNS suffix at the time of creation, which the ARM APIs allow (so it is likely also a problem when creating via Bicep and ARM Templates). This doesn't seem to be an issue when creating an Environment through the portal, though, since the portal doesn't show the DNS suffix setting at creation time. It shows the setting only after the environment has been created, which lets one create the required verification records before adding the DNS suffix.

Also this was not the behavior before the introduction of DNS suffix at the Environment-level, by the way. This error is also triggered when specifying the custom domain for a container app at the time of creation using its ingress property.

(@bbi-willemtoorenburgh and I were chatting about this in the Pulumi Community Slack and the linked issue.)

bbi-willemtoorenburgh commented 1 year ago

Praneet has the right of it! @ahmelsayed if that is now the only option, that's a pretty massive un-versioned breaking change in the API: it means customers can no longer include custom domain configurations in creation requests for Environment resources or individual Container App resources, and a separate API request is now necessary to do such an operation. This likely breaks pretty much all infrastructure-as-code workflows.

howang-ms commented 1 year ago

@bbi-willemtoorenburgh, @praneetloke, the custom domain verification ID is the same across the subscription scope, which means it is the same for all container apps and environments under the same subscription. I don't fully understand the breaking change you mentioned, as it shouldn't be the case. The custom domain binding can be done during environment or container app creation if you have configured the TXT record correctly with the verification ID. Do you have an issue when the TXT record is configured correctly?

praneetloke commented 1 year ago

> Do you have an issue when the TXT record is configured correctly?

I don't have an issue configuring the DNS suffix after the TXT record is configured correctly. The issue is that one cannot create a fresh container app environment and specify the DNS suffix without having added the TXT verification record since the verification ID isn't available until after the environment has been created.

> the custom domain verification ID is the same across the subscription scope, which means it is the same for all container apps and environments under the same subscription.

In that case, I suppose one could create the TXT record ahead of creating the container app environment, using the verification ID for that subscription. But how does one get this verification ID in an automated way for a fresh subscription where no container app environment has been created yet?

bbi-willemtoorenburgh commented 1 year ago

@howang-ms I did my best to scrutinize container apps' API spec over in https://github.com/Azure/azure-rest-api-specs/tree/main/specification/app/resource-manager and I couldn't find any defined API calls which would allow a user or implementer to retrieve a subscription's container apps verification ID without having first created an environment.

This change has created a hidden sharp edge where users or implementers will run into this issue (especially in an infrastructure-as-code scenario) and spend much time debugging it before learning that they must somehow get the verification ID before creating their app or environment. They must then retain that information in perpetuity in order to not run into the issue again. I cannot stress enough how user-unfriendly this change is, if it was indeed an intentional one.

jchannon commented 1 year ago

@praneetloke I've just hit this issue while deploying a container app to our prod container app environment. As far as I can see, we have the custom domain settings configured in the container environment, but when I try to create an instance of a container app via Pulumi using the settings in YAML, we get the TXT error. Is there a way past this?

  app:ingress-custom-domain-host-name: "foo"
  app:ingress-custom-domain-certificate-name: "bar"

corwestermaniddink commented 1 year ago

> When creating a Container App or Workspace with a custom domain, the API is expecting the existence of the verification TXT record. However, it's impossible to get the verification ID for the record without first creating the Container App or Workspace. This creates a circular dependency that is impossible to resolve.

Is that TXT record correctly set at ALL nameservers of your hostname provider?

bbi-willemtoorenburgh commented 1 year ago

@corwestermaniddink the problem is not that the DNS record hadn't finished propagating, it's that we can't possibly know what the value of the TXT record should be. We'd have to manually create an Environment in advance and check what the verification token is for the subscription, then manually put that in as a configuration value in our Pulumi infrastructure-as-code. Azure is our DNS host, and we've had no issues with the validation once the record has been created. Indeed, it was working perfectly with our setup of create Environment -> create Container App with HTTPS and custom domain enabled -> create verification TXT record which references the Container App for the verification token, before the CApps team introduced this change.

BenjaminZ commented 1 year ago

> the problem is not that the DNS record hadn't finished propagating, it's that we can't possibly know what the value of the TXT record should be. We'd have to manually create an Environment in advance and check what the verification token is for the subscription, then manually put that in as a configuration value in our Pulumi infrastructure-as-code.

Exactly the same problem here. When setting up a new subscription, we need to create a dummy environment just to get the verification code for the DNS record, and then delete that environment.

praneetloke commented 1 year ago

The API Management service has an ARM API that lets us get the verification ID, which is the same for the entire subscription. So a similar API at the subscription level could resolve the circular dependency here. However, even with that API added, one would still run into a problem adding a DNS suffix to a new environment, because the CNAME validation would fail (you can't add a CNAME without knowing the default URL for the environment you are about to create). That would not be possible to solve unless the DNS suffix setting is separated from the environment resource and made into its own resource in the ARM API spec.

ghabre commented 1 year ago

I was checking the instructions here: https://learn.microsoft.com/en-us/azure/container-apps/environment-custom-dns-suffix#add-a-custom-dns-suffix-and-certificate

Basically, on the DNS server I added a TXT record for asuid.{subdomain}, the same way I added the CNAME record but with the asuid. prefix and type TXT. As the content of the TXT record, I used the hash that appeared in the TXT-related error message, which I believe is a fingerprint.
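A minimal sketch of the record described above, assuming the `asuid.` prefix is simply prepended to the custom hostname, as the error message in the issue description suggests. The function name and the example hostname are hypothetical; whether your DNS provider expects the fully qualified name or a name relative to the zone is something to check on its side.

```python
from typing import Tuple


def verification_txt_record(hostname: str, verification_id: str) -> Tuple[str, str]:
    """Return (record name, record value) for the ownership TXT record."""
    # Normalize the hostname: trim whitespace, drop a trailing dot, lowercase.
    host = hostname.strip().rstrip(".").lower()
    if not host:
        raise ValueError("hostname must not be empty")
    # The TXT record lives at asuid.<hostname> and holds the verification ID.
    return f"asuid.{host}", verification_id


name, value = verification_txt_record("App.Example.com.", "EXAMPLE-VERIFICATION-ID")
print(name, value)
```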

Mortana89 commented 8 months ago

> The API Management service has an ARM API that lets us get the verification ID, which is the same for the entire subscription. So a similar API at the subscription level could resolve the circular dependency here. However, even with that API added, one would still run into a problem adding a DNS suffix to a new environment, because the CNAME validation would fail (you can't add a CNAME without knowing the default URL for the environment you are about to create). That would not be possible to solve unless the DNS suffix setting is separated from the environment resource and made into its own resource in the ARM API spec.

Don't want to hijack this discussion, but I've been searching a lot for this issue within APIM, where I need the customVerificationId in order to set up the DNS from within Bicep. You are talking about an 'own resource'; could you please point me in the right direction to see what you are referring to?

praneetloke commented 8 months ago

@Mortana89

> You are talking about an 'own resource', could you please point me in the right direction to see what you are talking about?

That was just a suggestion. It doesn't exist today for Container Apps.

> but I've been searching a lot for this issue, within APIM, where I need the customVerificationId in order to set up the DNS from within Bicep.

I'm actually not clear on what you are looking for. Are you trying to set up APIM to point to your Container Apps environment? If that's the case, why do you want a custom DNS suffix for your environment? Simply set the custom domain to point to your APIM instead.

Mortana89 commented 8 months ago

No, I'm trying to set up a custom domain for APIM, just like it's possible for app service, with bicep. But to do this I need this verification ID which I can't find in the APIM bicep stuff!

praneetloke commented 8 months ago

@Mortana89 the REST API endpoint for retrieving the ownership ID in order to add a custom domain to APIM is https://learn.microsoft.com/en-us/rest/api/apimanagement/api-management-service/get-domain-ownership-identifier?view=rest-apimanagement-2022-08-01&tabs=HTTP. Note that the API version is 2022-08-01. I don't know how you would execute that request in Bicep, though. Perhaps by using a deploymentScript to run the equivalent PowerShell or az CLI command instead?

Interestingly, @bbi-willemtoorenburgh, while looking for the APIM endpoint I found that Azure seems to have added, in August, a GET endpoint in the Container Apps namespace that retrieves the custom domain verification ID for a subscription. It doesn't depend on a container app environment, so maybe the circular dependency is gone now? I am not working with Container Apps at the moment, so I don't have access to try this out with an existing environment, but if you do, could you please give this a shot? The API version seems to be 2023-08-01-preview. https://learn.microsoft.com/en-us/rest/api/containerapps/get-custom-domain-verification-id/get-custom-domain-verification-id?view=rest-containerapps-2023-08-01-preview&tabs=HTTP
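For anyone who wants to try it, here is a hedged sketch that only builds the request URL and sends nothing. The path segment `providers/Microsoft.App/getCustomDomainVerificationId` is an assumption based on the operation name in the linked reference page, and both the path and the HTTP verb should be confirmed there before use. The subscription ID shown is a placeholder.

```python
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"


def verification_id_url(subscription_id: str,
                        api_version: str = "2023-08-01-preview") -> str:
    """Build the subscription-level verification ID URL (path is an assumption)."""
    # ASSUMPTION: operation path inferred from the linked reference page;
    # verify the path and the HTTP verb there before relying on this.
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        "/providers/Microsoft.App/getCustomDomainVerificationId"
    )
    return f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': api_version})}"


# Placeholder subscription ID, not a real one.
print(verification_id_url("00000000-0000-0000-0000-000000000000"))
```

If the endpoint behaves as described, the returned ID could be written to the TXT record before the environment or app is created, which would break the cycle for the TXT check (though not for the CNAME check mentioned earlier in the thread).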

bbi-willemtoorenburgh commented 8 months ago

@praneetloke fascinating! Thanks for highlighting that; it indeed seems like it'd be an appropriate workaround, though I firmly remain of the mind that this issue should stay open as it shouldn't be the burden of the users to have to hunt down this endpoint themselves.

I am also not working on Container Apps anymore, as problems like this one have motivated us to migrate off Container Apps to just running a container host ourselves on basic VMs. The service and, sadly, the Container Apps team have proven to be too unpredictable and untrustworthy for our needs.