microsoft / containerregistry

Microsoft Artifact Registry description and related FAQ

Unable to pull mcr.azureedge.net/dotnet/sdk:8.0 - tls: failed to verify certificate #162

Closed frnkeddy closed 4 months ago

frnkeddy commented 4 months ago

When I execute: docker pull mcr.microsoft.com/dotnet/sdk:8.0 I get the following error

Error response from daemon: Head "https://mcr.microsoft.com/v2/dotnet/sdk/manifests/8.0": tls: failed to verify certificate: x509: certificate is valid for .azureedge.net, .media.microsoftstream.com, .origin.mediaservices.windows.net, .streaming.mediaservices.windows.net, not mcr.microsoft.com

I've attempted the same for all of the sdk:8.0 tags and some of the 7.0 and 6.0 tags, and I get the same result.
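For reference, the names on the certificate actually being served can be checked directly with openssl (a quick sketch; assumes openssl is available locally):

openssl s_client -connect mcr.microsoft.com:443 -servername mcr.microsoft.com </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

That shows which hostnames the presented certificate covers, which should make the mismatch in the error above visible outside of Docker.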

AndreHamilton-MSFT commented 4 months ago

@frnkeddy can you share what region you are in? Is this still ongoing? We are currently investigating some issues impacting Azure Jio India West and are working on mitigation.

frnkeddy commented 4 months ago

My systems are in the USA, western region (state of Utah)


AndreHamilton-MSFT commented 4 months ago

@frnkeddy Can you visit https://mcr.microsoft.com/ in your browser/curl and share the value of response header "X-Msedge-Ref". This will help us identify where this might be occurring
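For example, something like the following should surface the header (header name casing may vary):

curl -s -i https://mcr.microsoft.com/ | grep -i x-msedge-ref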

ksacry-ft commented 4 months ago

Same issue here, in Utah as well.

x-msedge-ref: Ref A: D80E11CB64704E54830D9CD438D06D9A Ref B: SLC31EDGE0207 Ref C: 2024-04-23T15:53:37Z

mattkruskamp commented 4 months ago

+1 to that issue. Also in Utah.

X-MSEdge-Ref: Ref A: C1D956FCEDB247EFA0991CEFDEC104D5 Ref B: SLC31EDGE0218 Ref C: 2024-04-23T16:12:53Z

Same error when trying to docker pull mcr.microsoft.com/azure-storage/azurite:latest and mcr.microsoft.com/mssql/server:2022-latest

AndreHamilton-MSFT commented 4 months ago

Great, this is helpful. Will follow up once I know more.

frnkeddy commented 4 months ago

I'm getting the same/similar curl response as the others using curl -i https://mcr.microsoft.com

X-MSEdge-Ref: Ref A: A9DCA542B0684B48A4E6838BE89C8983 Ref B: SLC31EDGE0117 Ref C: 2024-04-24T00:34:23Z

If it helps, I know the problem started between April 22, 2024 at 7:56:50 AM MDT and April 22, 2024 at 5:42:33 PM MDT, as reported by a CI/CD pipeline I use.

Additional curl results include (note the difference in the URLs):

curl -i https://mcr.microsoft.com/dotnet

X-MSEdge-Ref: Ref A: 9382DD583636410EA6E4F10B557C3CA6 Ref B: SLC31EDGE0211 Ref C: 2024-04-24T00:47:06Z

curl -i https://mcr.microsoft.com/dotnet/sdk

curl: (60) schannel: SNI or certificate check failed: SEC_E_WRONG_PRINCIPAL (0x80090322) - The target principal name is incorrect.

frnkeddy commented 4 months ago

One more bit of insight: This docker pull attempt failed part way through: docker pull mcr.microsoft.com/dotnet/aspnet:8.0-jammy-amd64

8.0-jammy-amd64: Pulling from dotnet/aspnet
e311a697a403: Retrying in 1 second
154eb062a695: Retrying in 1 second
81af5a508103: Retrying in 1 second
b646c0a58c82: Waiting
1254901aed19: Waiting
d16bfc8b4664: Waiting
error pulling image configuration: download failed after attempts=6: tls: failed to verify certificate: x509: certificate is valid for .azureedge.net, .media.microsoftstream.com, .origin.mediaservices.windows.net, .streaming.mediaservices.windows.net, not westcentralus.data.mcr.microsoft.com
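Note the failing host here is the regional data endpoint (westcentralus.data.mcr.microsoft.com) that blob downloads are redirected to, not mcr.microsoft.com itself. A rough way to inspect the certificate that endpoint presents (again assuming openssl is available) is:

openssl s_client -connect westcentralus.data.mcr.microsoft.com:443 -servername westcentralus.data.mcr.microsoft.com </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"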

initrd commented 4 months ago

We are seeing very slow pulls in the AP Southeast/Singapore and India regions. Some layers download quickly, whereas others are stuck at a few kB/s or take a very long time to download:

#4 [e2e-api stage-0 1/4] FROM mcr.microsoft.com/playwright:v1.40.1-jammy@sha256:1aba528f5db4f4c130653ed1de737ddc1d276197cc4503d3bb7903a93b7fb32e
...
#4 sha256:05f6649df41d0f3c197559f4473b47a6764d3f807d6c10145ab6bb01c722abcb 136.31MB / 555.83MB 3310.9s
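For anyone who wants to put a number on the layer speed outside of Docker, a rough check is to fetch a single blob with curl and look at the reported rate (a sketch only; assumes anonymous access to the MCR v2 API, and <layer-digest> must be replaced with a real digest from the pull output):

curl -sSL -o /dev/null -w "size=%{size_download}B time=%{time_total}s speed=%{speed_download}B/s\n" \
  "https://mcr.microsoft.com/v2/playwright/blobs/sha256:<layer-digest>"

The -L follows the redirect to the regional data endpoint, so this measures roughly the same path Docker uses for layer downloads.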
stanhu commented 4 months ago

From a host in Google's us-east1-d, I'm seeing this issue:

$ docker pull mcr.microsoft.com/dotnet/sdk:6.0.400-1-focal
6.0.400-1-focal: Pulling from dotnet/sdk
675920708c8b: Pulling fs layer
63c1e812e3e8: Pulling fs layer
efc4bd123130: Pulling fs layer
459ef695deeb: Waiting
c774e78dcdb2: Waiting
9cc80820d7f5: Waiting
c3d985ec3b5b: Waiting
b3fa791bf5d1: Waiting

It looks like this edge is hitting ATL?

$ curl -s -i "https://mcr.microsoft.com/" | grep Edge
X-MSEdge-Ref: Ref A: 7B12140D720C4039A7A5A181F8E13D7A Ref B: ATL331000108037 Ref C: 2024-04-24T04:40:56Z

Docker debug logs show:

# journalctl -f -u docker.service
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.035922053Z" level=debug msg="Calling HEAD /_ping"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.038468358Z" level=debug msg="Calling POST /v1.40/images/create?fromImage=mcr.microsoft.com%2Fdotnet%2Fsdk&tag=6.0.400-1-focal"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.040890542Z" level=debug msg="hostDir: /etc/docker/certs.d/mcr.microsoft.com"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.040997223Z" level=debug msg="Trying to pull mcr.microsoft.com/dotnet/sdk from https://mcr.microsoft.com v2"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.251159340Z" level=debug msg="Pulling ref from V2 registry: mcr.microsoft.com/dotnet/sdk:6.0.400-1-focal"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.251210871Z" level=debug msg="mcr.microsoft.com/dotnet/sdk:6.0.400-1-focal resolved to a manifestList object with 3 entries; looking for a unknown/amd64 match"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.251236571Z" level=debug msg="found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:0d329a3ebef503348f6c289ff72871c92a9cc4fbfa4f50663bd42ab56587b998"
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.367795041Z" level=debug msg="pulling blob \"sha256:675920708c8bf10fbd02693dc8f43ee7dbe0a99cdfd55e06e6f1a8b43fd08e3f\""
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.368064042Z" level=debug msg="pulling blob \"sha256:63c1e812e3e8c944ddbe6e9ed940d8cb71208d4f7a1d6555e8cd255a764b67a7\""
Apr 24 02:12:12 stanhu-test1 dockerd[1929]: time="2024-04-24T02:12:12.368224202Z" level=debug msg="pulling blob \"sha256:efc4bd1231305956bc5ff57e1eda1d3bbe5cdaedb98332020c6c20c6a1933c8a\""
Apr 24 02:15:35 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:35.866584258Z" level=error msg="Download failed, retrying: read tcp 10.10.240.17:41660->204.79.197.219:443: read: connection reset by peer"
Apr 24 02:15:35 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:35.866655378Z" level=error msg="Download failed, retrying: read tcp 10.10.240.17:41666->204.79.197.219:443: read: connection reset by peer"
Apr 24 02:15:36 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:36.750710558Z" level=error msg="Download failed, retrying: read tcp 10.10.240.17:41658->204.79.197.219:443: read: connection reset by peer"
Apr 24 02:15:40 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:40.866874643Z" level=debug msg="pulling blob \"sha256:efc4bd1231305956bc5ff57e1eda1d3bbe5cdaedb98332020c6c20c6a1933c8a\""
Apr 24 02:15:40 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:40.866958292Z" level=debug msg="attempting to resume download of \"sha256:efc4bd1231305956bc5ff57e1eda1d3bbe5cdaedb98332020c6c20c6a1933c8a\" from 64552 bytes"
Apr 24 02:15:40 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:40.866880192Z" level=debug msg="pulling blob \"sha256:675920708c8bf10fbd02693dc8f43ee7dbe0a99cdfd55e06e6f1a8b43fd08e3f\""
Apr 24 02:15:40 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:40.867387843Z" level=debug msg="attempting to resume download of \"sha256:675920708c8bf10fbd02693dc8f43ee7dbe0a99cdfd55e06e6f1a8b43fd08e3f\" from 64552 bytes"
Apr 24 02:15:41 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:41.751074262Z" level=debug msg="pulling blob \"sha256:63c1e812e3e8c944ddbe6e9ed940d8cb71208d4f7a1d6555e8cd255a764b67a7\""
Apr 24 02:15:41 stanhu-test1 dockerd[1929]: time="2024-04-24T02:15:41.751139092Z" level=debug msg="attempting to resume download of \"sha256:63c1e812e3e8c944ddbe6e9ed940d8cb71208d4f7a1d6555e8cd255a764b67a7\" from 64552 bytes"

However, with a Google Cloud VM in us-central1-f it works fine. The edge looks like CHI:

$ curl -s -i "https://mcr.microsoft.com/" | grep -i edge
x-msedge-ref: Ref A: FDD4986B6BE344BE87CDC4F04CB927CD Ref B: CHI30EDGE0206 Ref C: 2024-04-24T04:42:27Z
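For completeness, the debug lines above come from running dockerd with debug logging enabled (a minimal sketch; assumes a systemd host and the standard /etc/docker/daemon.json location):

# /etc/docker/daemon.json
{
  "debug": true
}

# then restart the daemon and follow its logs
sudo systemctl restart docker
journalctl -f -u docker.service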
mathew-jithinm commented 4 months ago

Facing the same issue as above for the last few hours while trying to pull mcr.microsoft.com/playwright/python:v1.41.0-jammy

malcuch commented 4 months ago

Same here, pulling is extremely slow, only a few kilobytes per second in my case for mcr.microsoft.com/dotnet/sdk:8.0. It has been downloading the image for over 20 minutes and is still less than 50% complete. The same issue occurs on our cloud CI (GitLab), which confirms that it is not a local network issue.

alexwilson1 commented 4 months ago

Same issue for me with mcr.microsoft.com/devcontainers/typescript-node in Washington (West US). Extremely slow.

x-msedge-ref: Ref A: 8336FC7B9F5542C3A96085A826C4F40F Ref B: STBEDGE0115 Ref C: 2024-04-24T08:38:31Z

If I try to load https://mcr.microsoft.com/ in Chrome I get ERR_HTTP2_PROTOCOL_ERROR and the page doesn't even load.

Edit: Everything works when I use a VPN in Iceland
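A quick way to check whether the browser failure is HTTP/2-specific is to compare protocol versions with curl (assumes a curl build with HTTP/2 support):

curl -s -o /dev/null -w "HTTP/%{http_version} -> %{http_code}\n" --http2   https://mcr.microsoft.com/
curl -s -o /dev/null -w "HTTP/%{http_version} -> %{http_code}\n" --http1.1 https://mcr.microsoft.com/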

YevheniiSemenko commented 4 months ago

Same here, US, Virginia (locally):

x-msedge-ref: Ref A: DDF6D4F14C74424EA0A5B5A75422F3D6 Ref B: WAW01EDGE0706 Ref C: 2024-04-24T09:43:10Z

and from a cloud environment (GCP, europe-west1-d):

x-msedge-ref: Ref A: B79AFFC6A0004F1385B8F1E264050AEF Ref B: LTSEDGE1520 Ref C: 2024-04-24T09:47:21Z

harrison-seow-develab commented 4 months ago

Private runners (GitLab CI) hosted on ap-southeast-1 AWS seem to be affected as well.

lai-vson commented 4 months ago

GitLab-hosted runners are affected as well.

AndreHamilton-MSFT commented 4 months ago

> I'm getting the same/similar curl response as the others using curl -i https://mcr.microsoft.com
>
> X-MSEdge-Ref: Ref A: A9DCA542B0684B48A4E6838BE89C8983 Ref B: SLC31EDGE0117 Ref C: 2024-04-24T00:34:23Z
>
> If it helps, I know the problem started between April 22, 2024 at 7:56:50 AM MDT and April 22, 2024 at 5:42:33 PM MDT, as reported by a CI/CD pipeline I use.
>
> Additional curl results include (note the difference in the URLs):
>
> curl -i https://mcr.microsoft.com/dotnet
>
> X-MSEdge-Ref: Ref A: 9382DD583636410EA6E4F10B557C3CA6 Ref B: SLC31EDGE0211 Ref C: 2024-04-24T00:47:06Z
>
> curl -i https://mcr.microsoft.com/dotnet/sdk
>
> curl: (60) schannel: SNI or certificate check failed: SEC_E_WRONG_PRINCIPAL (0x80090322) - The target principal name is incorrect.

We are actively investigating this issue and will follow up shortly.

AndreHamilton-MSFT commented 4 months ago

> We are seeing very slow pulls in the AP Southeast/Singapore and India regions. Some layers download quickly, whereas others are stuck at a few kB/s or take a very long time to download:
>
> #4 [e2e-api stage-0 1/4] FROM mcr.microsoft.com/playwright:v1.40.1-jammy@sha256:1aba528f5db4f4c130653ed1de737ddc1d276197cc4503d3bb7903a93b7fb32e
> ...
> #4 sha256:05f6649df41d0f3c197559f4473b47a6764d3f807d6c10145ab6bb01c722abcb 136.31MB / 555.83MB 3310.9s

Is this still occurring? We had an issue in Jio India West related to slow downloads that was recently mitigated. If you are still seeing slow downloads, can you provide an X-MSEdge-Ref for us? It would assist in narrowing down your specific issue.

AndreHamilton-MSFT commented 4 months ago

Is anyone in Utah still experiencing the invalid certificates? Some mitigations were applied in that region. Please let me know if you are unblocked @frnkeddy @ksacry-ft @mattkruskamp. The other issues on slowness are probably unrelated; will update once we isolate that specific issue.

AndreHamilton-MSFT commented 4 months ago

@YevheniiSemenko @lai-vson @harrison-seow-develab @alexwilson1 @malcuch @mathew-jithinm @stanhu @initrd we have identified an issue that resulted in slow downloads in a number of regions and mitigations are being applied. Please let me know if you are continuing to see issues and if possible supply the ref header so we can identify any edges with issues remaining.

frnkeddy commented 4 months ago

I've confirmed that the issue has been resolved for me in Utah. All of my systems are now able to pull images successfully.

Thank you for working on and fixing the issue.