pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0

CA Cert issues when running a pulumi refresh #3192

Closed mikhailshilkov closed 4 months ago

mikhailshilkov commented 6 months ago

What happened?

A customer is having an issue with one of their stacks. A pulumi refresh fails with the following error:

Diagnostics:
  azure-native:storage:Blob (dev-func-blob-dev-blob):
    error: Preview failed: retrieving blob properties "dev-func-blob-dev-blob" (container "dev-blob-container-dev-ctr" / account 
"devst"): blobs.Client#GetProperties: Failure sending request: StatusCode=0 -- Original Error: Head "https://devst.blob.core.windows.net/dev-blob-container-dev-ctr/dev-func-blob-dev-blob": tls: failed to verify certificate: x509: certificate is valid for vault.azure.net, *.vault.azure.net, *.vaultcore.azure.net, *.z1.vault.azure.net, *.z2.vault.azure.net, *.z3.vault.azure.net, *.z4.vault.azure.net, *.z5.vault.azure.net, *.z6.vault.azure.net, *.z7.vault.azure.net, *.z8.vault.azure.net, *.z9.vault.azure.net, *.z10.vault.azure.net, *.z11.vault.azure.net, *.z12.vault.azure.net, *.z13.vault.azure.net, *.z14.vault.azure.net, *.z15.vault.azure.net, *.z16.vault.azure.net, *.z17.vault.azure.net, *.z18.vault.azure.net, *.z19.vault.azure.net, *.z20.vault.azure.net, *.z21.vault.azure.net, *.z22.vault.azure.net, *.z23.vault.azure.net, *.z24.vault.azure.net, *.z25.vault.azure.net, *.z26.vault.azure.net, *.z27.vault.azure.net, *.z28.vault.azure.net, *.z29.vault.azure.net, *.z30.vault.azure.net, *.z31.vault.azure.net, *.z32.vault.azure.net, *.z33.vault.azure.net, *.z34.vault.azure.net, *.z35.vault.azure.net, *.z36.vault.azure.net, *.z37.vault.azure.net, *.z38.vault.azure.net, *.z39.vault.azure.net, *.z40.vault.azure.net, *.z41.vault.azure.net, *.z42.vault.azure.net, *.z43.vault.azure.net, *.z44.vault.azure.net, *.z45.vault.azure.net, *.z46.vault.azure.net, *.z47.vault.azure.net, *.z48.vault.azure.net, *.z49.vault.azure.net, *.z50.vault.azure.net, *.z51.vault.azure.net, *.z52.vault.azure.net, *.z53.vault.azure.net, *.z54.vault.azure.net, 
*.z55.vault.azure.net, *.z56.vault.azure.net, *.z57.vault.azure.net, *.z58.vault.azure.net, *.z59.vault.azure.net, *.z60.vault.azure.net, *.z61.vault.azure.net, *.z62.vault.azure.net, *.z63.vault.azure.net, *.z64.vault.azure.net, *.z65.vault.azure.net, *.z66.vault.azure.net, *.z67.vault.azure.net, *.z68.vault.azure.net, *.z69.vault.azure.net, *.z70.vault.azure.net, *.z71.vault.azure.net, *.z72.vault.azure.net, *.z73.vault.azure.net, *.z74.vault.azure.net, *.z75.vault.azure.net, *.z76.vault.azure.net, *.z77.vault.azure.net, *.z78.vault.azure.net, *.z79.vault.azure.net, *.z80.vault.azure.net, *.z81.vault.azure.net, *.z82.vault.azure.net, *.z83.vault.azure.net, *.z84.vault.azure.net, *.z85.vault.azure.net, *.z86.vault.azure.net, *.z87.vault.azure.net, *.z88.vault.azure.net, *.z89.vault.azure.net, *.z90.vault.azure.net, *.z91.vault.azure.net, *.z92.vault.azure.net, *.z93.vault.azure.net, *.z94.vault.azure.net, *.z95.vault.azure.net, *.z96.vault.azure.net, *.z97.vault.azure.net, *.z98.vault.azure.net, *.z99.vault.azure.net, not devst.blob.core.windows.net

  pulumi:pulumi:Stack (Infrastructure-dev):
    error: preview failed

This error occurs only on a pulumi refresh. The refresh hangs for several minutes before reporting the error. I have confirmed that pulumi up and pulumi destroy can create and destroy the stack with no issues.

In the detailed log, I see the following:

I0329 13:05:41.496889   21528 provider_plugin.go:1837] provider received rpc error `Unknown`: `retrieving blob properties "dev-func-blob-dev-blob" 
(container "dev-blob-container-dev-ctr" / account "devst"): blobs.Client#GetProperties: Failure sending request: StatusCode=0 -- Original Error: 
Head "https://devst.blob.core.windows.net/dev-blob-container-dev-ctr/dev-func-blob-dev-blob": tls: failed to verify certificate: x509: 
certificate is valid for *.table.core.windows.net, *.sjc20prdstr26a.store.core.windows.net, *.table.storage.azure.net, *.z1.table.storage.azure.net, 
*.z2.table.storage.azure.net, *.z3.table.storage.azure.net, *.z4.table.storage.azure.net, *.z5.table.storage.azure.net, *.z6.table.storage.azure.net, 
*.z7.table.storage.azure.net, *.z8.table.storage.azure.net, *.z9.table.storage.azure.net, *.z10.table.storage.azure.net, *.z11.table.storage.azure.net, 
*.z12.table.storage.azure.net, *.z13.table.storage.azure.net, *.z14.table.storage.azure.net, *.z15.table.storage.azure.net, *.z16.table.storage.azure.net, 
*.z17.table.storage.azure.net, *.z18.table.storage.azure.net, *.z19.table.storage.azure.net, *.z20.table.storage.azure.net, *.z21.table.storage.azure.net, 
*.z22.table.storage.azure.net, *.z23.table.storage.azure.net, *.z24.table.storage.azure.net, *.z25.table.storage.azure.net, *.z26.table.storage.azure.net, 
*.z27.table.storage.azure.net, *.z28.table.storage.azure.net, *.z29.table.storage.azure.net, *.z30.table.storage.azure.net, *.z31.table.storage.azure.net, 
*.z32.table.storage.azure.net, *.z33.table.storage.azure.net, *.z34.table.storage.azure.net, *.z35.table.storage.azure.net, *.z36.table.storage.azure.net, 
*.z37.table.storage.azure.net, *.z38.table.storage.azure.net, *.z39.table.storage.azure.net, *.z40.table.storage.azure.net, *.z41.table.storage.azure.net, 
*.z42.table.storage.azure.net, *.z43.table.storage.azure.net, *.z44.table.storage.azure.net, *.z45.table.storage.azure.net, *.z46.table.storage.azure.net, 
*.z47.table.storage.azure.net, *.z48.table.storage.azure.net, *.z49.table.storage.azure.net, *.z50.table.storage.azure.net, not 
devst.blob.core.windows.net`
I0329 13:05:41.496889   21528 provider_plugin.go:1841] rpc error kind `Unknown` may not be recoverable
I0329 13:05:41.497450   21528 provider_plugin.go:1095] Provider[azure-native, 0xc001bfc000].Read(/subscriptions/<guid>/resourceG
roups/dev-rg/providers/Microsoft.Storage/storageAccounts/devst/blobServices/default/containers/dev-blob-container-dev-ctr/blobs/dev
-func-blob-dev-blob,urn:pulumi:dev::Infrastructure::frazure:storage:Blob$azure-native:storage:Blob::dev-func-blob-dev-blob) failed: rpc 
error: code = Unknown desc = retrieving blob properties "dev-func-blob-dev-blob" (container "dev-blob-container-dev-ctr" / account 
"devst"): blobs.Client#GetProperties: Failure sending request: StatusCode=0 -- Original Error: Head 
"https://devst.blob.core.windows.net/dev-blob-container-dev-ctr/dev-func-blob-dev-blob": tls: failed to verify certificate: x509: 
certificate is valid for *.table.core.windows.net, *.sjc20prdstr26a.store.core.windows.net, *.table.storage.azure.net, *.z1.table.storage.azure.net, 
*.z2.table.storage.azure.net, *.z3.table.storage.azure.net, *.z4.table.storage.azure.net, *.z5.table.storage.azure.net, *.z6.table.storage.azure.net, 
*.z7.table.storage.azure.net, *.z8.table.storage.azure.net, *.z9.table.storage.azure.net, *.z10.table.storage.azure.net, *.z11.table.storage.azure.net, 
*.z12.table.storage.azure.net, *.z13.table.storage.azure.net, *.z14.table.storage.azure.net, *.z15.table.storage.azure.net, *.z16.table.storage.azure.net, 
*.z17.table.storage.azure.net, *.z18.table.storage.azure.net, *.z19.table.storage.azure.net, *.z20.table.storage.azure.net, *.z21.table.storage.azure.net, 
*.z22.table.storage.azure.net, *.z23.table.storage.azure.net, *.z24.table.storage.azure.net, *.z25.table.storage.azure.net, *.z26.table.storage.azure.net, 
*.z27.table.storage.azure.net, *.z28.table.storage.azure.net, *.z29.table.storage.azure.net, *.z30.table.storage.azure.net, *.z31.table.storage.azure.net, 
*.z32.table.storage.azure.net, *.z33.table.storage.azure.net, *.z34.table.storage.azure.net, *.z35.table.storage.azure.net, *.z36.table.storage.azure.net, 
*.z37.table.storage.azure.net, *.z38.table.storage.azure.net, *.z39.table.storage.azure.net, *.z40.table.storage.azure.net, *.z41.table.storage.azure.net, 
*.z42.table.storage.azure.net, *.z43.table.storage.azure.net, *.z44.table.storage.azure.net, *.z45.table.storage.azure.net, *.z46.table.storage.azure.net, 
*.z47.table.storage.azure.net, *.z48.table.storage.azure.net, *.z49.table.storage.azure.net, *.z50.table.storage.azure.net, not 
devst.blob.core.windows.net

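The log indicates that x509 certificate verification failed: the certificate served is valid for Table storage hostnames (and, in the diagnostics output above, for Key Vault hostnames), not for devst.blob.core.windows.net.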
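To make the mismatch concrete, here is a minimal sketch of the hostname check that fails. The helper names (sanMatches, hostCoveredBy) are hypothetical, the matching rule is a simplified version of TLS wildcard matching, and "*.blob.core.windows.net" is only an assumed example of a name that would cover the blob endpoint; it does not appear in the logs.

```typescript
// Simplified wildcard rule (an assumption for illustration): a leading "*."
// stands in for exactly one left-most label of the hostname.
function sanMatches(pattern: string, host: string): boolean {
  const p = pattern.toLowerCase();
  const h = host.toLowerCase();
  if (p.slice(0, 2) !== "*.") {
    return p === h; // no wildcard: names must be identical
  }
  const firstDot = h.indexOf(".");
  // The wildcard must replace one non-empty label, and the rest must be equal.
  return firstDot > 0 && h.slice(firstDot + 1) === p.slice(2);
}

// True if any certificate name covers the requested host.
function hostCoveredBy(sans: string[], host: string): boolean {
  return sans.some((san) => sanMatches(san, host));
}

// A few of the names copied from the detailed log above.
const servedSans = [
  "*.table.core.windows.net",
  "*.sjc20prdstr26a.store.core.windows.net",
  "*.table.storage.azure.net",
  "*.z1.table.storage.azure.net",
];

const requestedHost = "devst.blob.core.windows.net";

console.log(hostCoveredBy(servedSans, requestedHost)); // false
// Assumed name, not taken from the logs:
console.log(hostCoveredBy(["*.blob.core.windows.net"], requestedHost)); // true
```

None of the served names cover the blob hostname, which suggests the request reached an endpoint other than the blob service (for example through a proxy or DNS override), rather than a problem with the blob itself.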

Example

this.blob = new storage.Blob(
    `${name}-${envName}-blob`,
    {
        blobName: fileName,
        resourceGroupName: resourceGroup.name,
        accountName: storageAccountName,
        containerName: containerName,
        source: asset,
    },
    {
        parent: this,
        ...resourceOptions,
    }
);

Output of pulumi about

CLI
Version      3.106.0
Go Version   go1.22.0
Go Compiler  gc

Plugins
NAME          VERSION
azure         5.49.0
azure-native  2.32.0
azuread       5.40.0
command       0.7.2
docker        3.6.1
nodejs        unknown
random        4.13.2

Host
OS       Microsoft Windows 10 Enterprise
Version  10.0.19045 Build 19045
Arch     x86_64

This project is written in nodejs: executable='C:\Program Files\nodejs\node.exe' version='v18.14.2'

Additional context

No response


danielrbradley commented 6 months ago

Here are some initial thoughts on contributing factors:

  1. Is this only impacting the single Blob resource? Do any other resources also hang in a similar way on refresh?
  2. The refresh operation differs from a normal preview or deployment because it does not run the user's program. Is there anything special happening in the user's program with regard to authentication or modifying the environment?
    • For example, if using explicit programmatic provider configuration, a refresh will just use the values stored in the Pulumi state.
  3. When the Blob resource is created or updated, we call the exact same read() method that is used during the refresh. So we know the read operation works correctly in at least some contexts. This leads me to think there is most likely a difference in one of the following:
    • The environment executing the pulumi command.
    • The initialization of the provider configuration from the user program.
    • The order of execution within the custom resource, causing different internal state within the provider. That is, during a refresh there is no create or update before the read, so running one of these operations first could in theory affect the state of the client when it is then used for the read operation.
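One way to test the "environment executing the pulumi command" theory is to capture the same set of environment variables in the shell where pulumi up works and in the one where pulumi refresh fails, then compare them. The sketch below is only an illustration: the helper names (snapshotEnv, diffEnv) are hypothetical, the key list is an assumption (the standard proxy variables, not a list confirmed for this provider), and the example values are made up.

```typescript
type EnvSnapshot = Record<string, string | undefined>;

// Assumed keys: common proxy variables. Extend as needed for the environment.
const DEFAULT_KEYS = ["HTTPS_PROXY", "HTTP_PROXY", "NO_PROXY"];

// Copy only the selected keys out of an environment object.
function snapshotEnv(env: EnvSnapshot, keys: string[] = DEFAULT_KEYS): EnvSnapshot {
  const out: EnvSnapshot = {};
  for (const key of keys) {
    out[key] = env[key];
  }
  return out;
}

// Return the sorted list of keys whose values differ between two snapshots.
function diffEnv(a: EnvSnapshot, b: EnvSnapshot): string[] {
  const keys = Object.keys(a).concat(Object.keys(b));
  return keys
    .filter((k, i) => keys.indexOf(k) === i) // de-duplicate
    .filter((k) => a[k] !== b[k])
    .sort();
}

// Made-up values standing in for two captured environments.
const duringUp = snapshotEnv({});
const duringRefresh = snapshotEnv({ HTTPS_PROXY: "http://proxy.example:8080" });

console.log(diffEnv(duringUp, duringRefresh)); // [ 'HTTPS_PROXY' ]
```

In practice the first argument would be process.env in each shell, with the snapshots saved to a file for comparison. Any key that differs is a candidate explanation for why the read succeeds during an update but fails during a refresh.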

cc: @alayshia @mikhailshilkov

mikhailshilkov commented 4 months ago

I think the conclusion was that this was a problem with the customer's network configuration.