microsoft / containerregistry

Microsoft Artifact Registry description and related FAQ

Slow download of the image from container registries #27

Closed Koubek closed 4 years ago

Koubek commented 4 years ago

Hello,

We are running a hybrid (Windows + Linux) Docker Swarm. The cluster spans multiple regions (currently WEU and SAN). All nodes are VMs in Azure.

All primary infrastructure was set up and is operated in WEU, including (and this is the relevant part here) the container registries ([ourregistry].azurecr.io).

The problem is that since last week (with no internal changes to our Docker infrastructure) we have been facing extremely slow docker pulls on SAN nodes (pulling our own images from WEU). Pulling from mcr.microsoft.com (e.g. pull mcr.microsoft.com/businesscentral/onprem) is really fast, just as pulling from [ourregistry].azurecr.io used to be.

We decided to enable geo-replication for the container registries. Everything was synced to SAN successfully, but we don't see any performance improvement. It looks like all layer pulls still go to the WEU registries.

I'm not sure what nslookup should return for [ourregistry].azurecr.io, but as far as I can see it returns WEU resources (maybe this isn't relevant).

Linux nodes in SAN are able to pull our Linux images from [ourregistry].azurecr.io without any slowdown. I haven't yet tested whether they pull from WEU or SAN (which should now be extremely fast).
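For anyone who wants to script the nslookup check described here, the following is a minimal sketch in Python. It only reports what the local resolver returns. The helper names and the idea of searching the resolved names for a region token such as "san" or "weu" are assumptions for illustration; the thread does not say what the canonical names look like, so a region token may not appear in them at all.

```python
import socket


def describe_resolution(hostname, resolver=socket.gethostbyname_ex):
    """Resolve hostname and return its canonical name, aliases, and IPv4 addresses.

    The resolver argument is injectable so the function can be exercised
    without network access.
    """
    canonical, aliases, addresses = resolver(hostname)
    return {
        "canonical": canonical,
        "aliases": list(aliases),
        "addresses": list(addresses),
    }


def mentions_region(resolution, region_tokens):
    """Return True if any token occurs in the canonical name or an alias.

    Assumption: the serving region is visible in the resolved names.
    This is a plain substring match, so short tokens can give false positives.
    """
    names = [resolution["canonical"], *resolution["aliases"]]
    return any(
        token.lower() in name.lower()
        for name in names
        for token in region_tokens
    )
```

A call such as `describe_resolution("ourregistry.azurecr.io")` (a placeholder registry name) can be run on a SAN node and on a WEU node, and the two results compared. Name resolution failures surface as `OSError` subclasses and should be handled by the caller.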

I would be grateful for any information.

Koubek commented 4 years ago

Hello @SteveLasker, may I ask you for advice on how to proceed with this issue? It's a big problem for us, as our dev infrastructure depends heavily on containers and we basically have problems with Microsoft Container Registry only (not with the Docker Hub registry). Thank you in advance!

SteveLasker commented 4 years ago

Hi Koubek, sorry this is causing issues. MCR is backed by a regional CDN, so it should always be fast. For ACR, replicating to the local region should resolve this. Are you using Private Link, where you may have customized the IP and routing? If the troubleshooting docs (https://github.com/Azure/acr/blob/master/README.md#diagnostic--troubleshooting-links) don't help, I'd suggest opening a ticket to get the right folks directly engaged to help: https://aka.ms/acr/support/create-ticket

Koubek commented 4 years ago

@SteveLasker, thanks a lot for your response.

We replicate the registries from WEU to SAN, but there is no obvious benefit. The speed is still the same: very bad. Before this, we were able to download images from WEU to SAN nodes quite quickly, but then something changed approx. 2 - 3 weeks ago and we don't know what (no changes on our side).

That's why we enabled geo-replication to SAN, but there was no improvement. I haven't tried Private Link; I expected the CDN to resolve the best endpoint dynamically. Or am I wrong?

SteveLasker commented 4 years ago

Can you please open a ticket to drill in: https://aka.ms/acr/support/create-ticket

Koubek commented 4 years ago

OK, we will do it. I just wanted to use GitHub issues because MS says the company is now more oriented toward open-source projects and also uses tools like GitHub. But if that link is the way to go, I will discuss it with my colleagues and they will open the support ticket there.

Thanks.

SteveLasker commented 4 years ago

We are definitely focused on OSS efforts, supporting our own and community-driven projects. What you're describing should work as designed with regional routing. The questions you're raising concern a running instance of a specific Azure service, which requires specific customer information to troubleshoot and diagnose. Rather than me asking you a bunch of detailed questions about your specific registry, support can take a look at where the requests are coming from, why Traffic Manager might not be routing properly, or other things I can't think to ask about. They can also dedicate the time, rather than me slicing off some time while in meetings :)

Koubek commented 4 years ago

Understood, thanks.

Koubek commented 4 years ago

Hi @SteveLasker, I was about to open the ticket and decided to check again (I had been checking daily for more than a week), and everything worked like a charm. The pull was super fast.

nslookup against the registry URI now points to the correct resources (SAN, whereas every time I tried it before it pointed to WEU).

So I am not going to open the ticket now. I hope whatever happened under the hood stays in effect ;)

Anyway, thanks for your help and your time.

SteveLasker commented 4 years ago

Glad it worked out. If you added the region after the initial configuration, the previous endpoint may have been cached. Please let us know if you have any other troubles.
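If a cached endpoint is the suspected cause, one way to observe when the answer changes is to poll name resolution from an affected node. The sketch below is only an illustration under assumptions: the function names, the polling interval, and the choice to compare sets of IPv4 addresses are all hypothetical, and a change in addresses does not by itself prove that requests are now served from a different region. The local OS or an upstream resolver may also keep returning a cached answer until its TTL expires.

```python
import socket
import time


def resolve_ips(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def wait_for_dns_change(hostname, stale_ips, resolve=resolve_ips,
                        attempts=10, delay_seconds=60.0, sleep=time.sleep):
    """Poll until the resolved addresses differ from stale_ips.

    Returns the new set of addresses, or None if nothing changed within
    the given number of attempts. resolve and sleep are injectable so the
    loop can be tested without network access or real waiting.
    """
    stale = set(stale_ips)
    for attempt in range(attempts):
        try:
            current = resolve(hostname)
        except OSError:
            current = None  # transient lookup failure; try again later
        if current and current != stale:
            return current
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

For example, record the addresses seen while pulls are slow, then call `wait_for_dns_change("ourregistry.azurecr.io", recorded_ips)` (a placeholder registry name) and retry the pull once it returns a new set.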