rmbolger / Posh-ACME

PowerShell module and ACME client to create certificates from Let's Encrypt (or other ACME CA)
https://poshac.me/docs/latest/
MIT License

Azure IMDS authentication doesn't work on Arc-enabled servers #562

Closed: timmgreen closed this 5 days ago

timmgreen commented 3 weeks ago

Hello,

I'm working through the tutorial using the Azure plugin with AZUseIMDS.

Apologies if I'm missing something that is documented.

I'm getting this error: "The api-version not specified. Supported are 2021-02-01 2020-06-01 2019-11-01 2019-08-15"

image

rmbolger commented 3 weeks ago

Sorry @timmgreen, this is not on you. Apparently Azure stopped supporting the explicit API version I had originally written the plugin against. I don't think the API has changed drastically, so I will probably be able to just update the version to the latest supported one. But I'll need to revisit the docs to make sure and test stuff.

rmbolger commented 3 weeks ago

Ok, weird. I started doing a bit of testing in my Azure tenant tonight. Brand new linux VM with a managed service identity that has the DNS TXT Contributor role from the plugin guide assigned on the resource group that contains my test DNS zone. And everything seems to be working for me using the following $pArgs definition:

$pArgs = @{AZSubscriptionId='<my sub id>'; AZUseIMDS=$true}

That's not to say the plugin couldn't use a refresh on the API versions. But there may be something else going on here that is not as obvious as the error message would imply. Unless different tenants have different API version support maybe? Out of curiosity, do your DNS zones live under a different subscription than the VM? If so, the AZSubscriptionId should be the one where the DNS zones live.

Can you try running the following, which will just attempt to write a test record to your zone with debug logging enabled, and then post the output here? I'm hoping to get a better sense of where exactly the error is being thrown.

# replace example.com with your actual zone name
$DebugPreference = 'Continue'
Publish-Challenge example.com (Get-PAAccount) faketoken Azure $pArgs -Verbose

If it happens to magically work, you can run the same command except Unpublish-Challenge to get rid of the test record.

timmgreen commented 3 weeks ago

That failed as well. The one thing I did differently is that I granted 'DNS TXT Contributor' at the zone level instead of the resource group. My resource group contains many zones and I wanted least privilege. This is also an Arc-managed server that is on-premises, but it does have a managed identity (MI) in Azure.

I notice that you said 'replace with zone name'. I am trying to issue a cert for a host in a subdomain, so would 'example.com' be replaced with for example 'server.subdomain.example.com' or just 'subdomain.example.com'? Maybe I misunderstood.

timmgreen commented 3 weeks ago

I found this: https://www.thomasmaurer.ch/2022/10/use-the-azure-arc-managed-identity-with-azure-powershell/

I can use 'Connect-AzAccount -Identity' and get a token, but from that article it appears that to get an access token via REST for an Arc-enabled server you have to specify the api-version.

timmgreen commented 3 weeks ago

The sample script from that article does get me a token

image

rmbolger commented 3 weeks ago

> I notice that you said 'replace with zone name'. I am trying to issue a cert for a host in a subdomain, so would 'example.com' be replaced with for example 'server.subdomain.example.com' or just 'subdomain.example.com'? Maybe I misunderstood.

Any subdomain or the root domain would work. It will ultimately try to create a TXT record for _acme-challenge.<whatever>. But that record won't actually be used for anything.

The Arc-enabled server might be relevant. I know there were oddities someone previously ran into with IMDS and Azure Automation accounts that needed a code change to accommodate. So there may yet be another variation on how IMDS works from Arc-enabled servers. I'll take a look at that article and see if anything jumps out.

rmbolger commented 3 weeks ago

Hah, yep. Arc-enabled servers ultimately end up using the code path that was originally added for Azure Automation with the IDENTITY_ENDPOINT environment variable. The difference is that the documentation for IMDS with Azure Automation doesn't mention needing an api-version parameter in the URL so I didn't add it. https://learn.microsoft.com/en-us/azure/automation/enable-managed-identity-for-automation#get-access-token-for-system-assigned-managed-identity-using-http-get

Adding it is easy, but I don't want to potentially break existing Azure Automation IMDS users, so I'll need some time to get that setup and test it before I can actually put the fix in a release. I committed the fix to the main branch though if you want to grab the updated version of the plugin and just use that for yourself for the time being. https://github.com/rmbolger/Posh-ACME/blob/main/Posh-ACME/Plugins/Azure.ps1

timmgreen commented 3 weeks ago

I pulled that and got the below

image

I found the line and updated the api version to the latest supported version '2021-02-01'. Then I got this

image

I'm noticing that in those error messages it appears to be redirecting to 'https://management.core.windows.net' and in the sample script that actually got me a token it was using 'https://management.azure.com/'. That could be a core difference in how Arc-enabled and Azure native machines are authenticated with MSI.

rmbolger commented 3 weeks ago

Good eye. I missed that difference in the article. That management.azure.com URL appears to be the "Resource Manager" endpoint for public cloud as opposed to the "Management" URL the plugin is currently using. Apparently I need to review all these docs and make sure I'm using the correct resource everywhere.

Short term if you want to try tweaking your own copy of the plugin, you can modify line 476 and replace .ManagementUrl with .ResourceManagerUrl and we might be in business.
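
For reference, the two findings so far (an explicit api-version and the Resource Manager URL as the token resource) can be combined into a rough sketch of the token request URI. The function name New-ImdsTokenUri and its parameters are hypothetical and not part of the plugin, and 2021-02-01 is used only because it is the newest version listed in the error message above.

```powershell
# Hypothetical helper, not part of Posh-ACME: builds a managed identity token URI.
function New-ImdsTokenUri {
    param(
        # Base token endpoint, e.g. the value of $env:IDENTITY_ENDPOINT
        [Parameter(Mandatory)][string]$Endpoint,
        # Resource the token is requested for (Resource Manager URL for public cloud)
        [string]$Resource = 'https://management.azure.com/',
        # Assumption: newest api-version listed in the error message in this thread
        [string]$ApiVersion = '2021-02-01'
    )

    # The resource URL has to be escaped before it goes into the query string
    $escapedResource = [uri]::EscapeDataString($Resource)
    '{0}?resource={1}&api-version={2}' -f $Endpoint, $escapedResource, $ApiVersion
}
```

On a machine where the variable is set, this would be called as New-ImdsTokenUri -Endpoint $env:IDENTITY_ENDPOINT. Passing the endpoint in rather than reading the environment inside the function keeps the sketch usable for both the IDENTITY_ENDPOINT path and the default VM metadata endpoint.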

timmgreen commented 3 weeks ago

I was able to get it working. On line 497 (requesting the token), the Authorization header was missing. I added some code adapted from that sample script. This works for me on Arc-enabled servers, but I'm not sure whether it would break native Azure. I suppose you could check for the existence of the Azure Connected Machine agent to see if the machine is Arc-enabled, or just leave it to the user of the module to specify a switch like IsArcEnabled, since they would know whether it is or not.

            # use the default/VM metadata endpoint
            Write-Debug "Using default/VM metadata endpoint"
            $metadataUri = 'http://169.254.169.254/metadata/identity/oauth2/token'
        }
        #region new
        try {
            $secretFile = ""
            try {
                # Unauthenticated probe: Arc answers 401 and names the key file in WWW-Authenticate
                Invoke-WebRequest -Method GET -Uri $metadataUri -Headers @{Metadata='True'} -UseBasicParsing
            } catch {
                $wwwAuthHeader = $_.Exception.Response.Headers["WWW-Authenticate"]
                if ($wwwAuthHeader -match "Basic realm=.+") {
                    $secretFile = ($wwwAuthHeader -split "Basic realm=")[1]
                }
            }
            # Get-Content instead of the cat alias, which is not defined on non-Windows PowerShell
            $secret = Get-Content -Raw $secretFile
            $headers.Authorization = "Basic $secret"
        #endregion new    
            $tokResponse = Invoke-RestMethod $metadataUri -Body $body -Headers $headers @script:UseBasic -EA Stop
        } catch { throw }

timmgreen commented 3 weeks ago

The Azure Connected Machine agent runs as "C:\Program Files\AzureConnectedMachineAgent\himds.exe" and also includes a CLI tool, azcmagent, which has commands like 'azcmagent show' or 'azcmagent version'.

rmbolger commented 3 weeks ago

The prompt for Basic auth using a key value from a local file is definitely unique to Arc as far as I know. I found the official docs for the process as well which has a verbatim copy of the example from the blog post in it. https://learn.microsoft.com/en-us/azure/azure-arc/servers/managed-identity-authentication#acquiring-an-access-token-using-rest-api

I spun up an Arc-enabled machine to test with and now have a working version on it. Still need to go back and verify IMDS from an Azure VM still works as well as Azure Automation. Really hoping there's an api-version value that works for both Arc and Azure VMs and that Azure Automation will just ignore the value if it doesn't need it.
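
As a minimal sketch of the Arc-specific step described above, the key file path can be pulled out of the challenge header with a single regex. Get-ArcKeyFilePath is a hypothetical name, and the header value in the usage note is an illustrative example, not output captured from this thread.

```powershell
# Hypothetical helper: extracts the key file path from an Arc 401 challenge
# of the form "Basic realm=<path to key file>".
function Get-ArcKeyFilePath {
    param(
        [Parameter(Mandatory)][string]$WwwAuthenticate
    )

    if ($WwwAuthenticate -match '^Basic realm=(.+)$') {
        # Everything after "realm=" is treated as the path
        return $Matches[1].Trim()
    }

    # Not a Basic challenge, so there is no key file to read
    return $null
}
```

For example, Get-ArcKeyFilePath 'Basic realm=C:\ProgramData\AzureConnectedMachineAgent\Tokens\example.key' would return the path portion, whose contents are then sent back as "Basic <key>" in the Authorization header on the retry.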

timmgreen commented 3 weeks ago

I tried the new Azure.ps1, but it wasn't working for me, so I had to make some edits. This is what works for me.

        try {
            $tokResponse = Invoke-RestMethod $metadataUri -Body $body -Headers $headers @script:UseBasic -EA Stop
        } catch {
            # Arc-enabled servers will send a 401 response prompting to retry with Basic auth using the contents
            # of a local file specified in the WWW-Authenticate header.
            if ('Unauthorized' -eq $_.Exception.Response.StatusCode -and ($authHeader = $_.Exception.Response.Headers.WWWAuthenticate.Parameter)) {
                # parse the file name and get the contents
                Write-Debug "WWW-Authenticate header: $($_.Exception.Response.Headers.ToString())"
                $keyFile = $authHeader.Split('=')[1]
                $key = Get-Content $keyFile -Raw
                # re-try with the requested Basic auth
                try {
                    Write-Debug "Retrying with Basic auth using key with length $($key.Length)"
                    $headers.Authorization = "Basic $key"
                    $tokResponse = Invoke-RestMethod $metadataUri -Body $body -Headers $headers @script:UseBasic -EA Stop
                } catch {
                    throw
                }
            } else {
                throw
            }
        }

$_.Exception.Response.StatusCode returns 'Unauthorized' for me instead of the actual 401 code.

$($_.Exception.Response.Headers.ToString()) returns image

$_.Exception.Response.Headers.WWWAuthenticate.Parameter returns image

rmbolger commented 3 weeks ago

The StatusCode property is an enum and the comparison should work with either the enum value name or actual integer value. I generally try to use the integer value to avoid any potential localization issues. But I don't think that was the issue here.
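
A quick way to see this, as a sketch that only uses the built-in System.Net.HttpStatusCode enum (the variable names are just for illustration):

```powershell
$status = [System.Net.HttpStatusCode]::Unauthorized

# PowerShell converts the right-hand operand to the type of the left-hand one,
# so all three comparisons below evaluate to $true.
$matchesInt       = $status -eq 401              # integer converted to the enum
$matchesName      = $status -eq 'Unauthorized'   # name converted to the enum
$matchesNameFirst = 'Unauthorized' -eq $status   # enum converted to its name

# The underlying integer value is available with a cast
$asInt = [int]$status                            # 401
```

Printing $status on its own shows the name rather than the number, which is why it displays as 'Unauthorized' even though a comparison against 401 also succeeds.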

I have a feeling this is due to the .NET type of the response being different between PowerShell 5.1 and 7+, which changes how the header value has to be read. I had only been testing against 5.1, but the old code definitely breaks on 7. I recall needing to jump through hoops elsewhere in the code to get header values consistently between editions, so I'll clearly have to do that here as well.
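
A rough sketch of what handling both editions could look like, assuming the response object comes from $_.Exception.Response in the catch block. Get-WwwAuthenticateHeader is a hypothetical name and this is not the code that ended up in the plugin.

```powershell
# Hypothetical helper: returns the WWW-Authenticate value as a plain string
# regardless of which response type the PowerShell edition hands back.
function Get-WwwAuthenticateHeader {
    param(
        [Parameter(Mandatory)]$Response
    )

    # Compare by type name so this also loads where System.Net.Http is not yet loaded
    if ($Response.GetType().FullName -eq 'System.Net.Http.HttpResponseMessage') {
        # PowerShell 6+: typed collection of AuthenticationHeaderValue objects
        $values = @($Response.Headers.WwwAuthenticate | ForEach-Object { $_.ToString() })
        return ($values -join ', ')
    }

    # Windows PowerShell 5.1: HttpWebResponse with a string-indexed WebHeaderCollection
    return $Response.Headers['WWW-Authenticate']
}
```

Normalizing to one string up front means the code that parses the key file path only has to deal with a single format.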

I hadn't commented after the last commit because I had a feeling I wasn't done yet but ran out of time that night.

rmbolger commented 3 weeks ago

Ok, give this latest commit a try. My test Arc box now works with both PowerShell 5.1 and 7. I still need to go back and re-test the Azure native VM and Azure Automation before I can put this in a release though.

rmbolger commented 2 weeks ago

Azure VM and Azure Automation both work via IMDS with the latest changes. Annoyingly, I realized the credential and certificate auth flows are using a now deprecated v1 token endpoint. So trying to decide whether to push a bugfix release just with the IMDS changes or take more time and migrate to the v2 token endpoint which will likely involve refactoring that whole auth function just to clean things up a bit.

Leaning towards the bugfix release for now and working on the auth refactor later. I didn't find any obvious end-of-life dates for the v1 token endpoint.

rmbolger commented 5 days ago

The fix is now live in 4.25.1