microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License

Running out of outbound TCP connections #1042

Open jgn-epp opened 9 months ago

jgn-epp commented 9 months ago


Issue description

We are running a load test of a system, which involves spinning up a large number of MQTT clients across several containers in the same container app environment. After a while we can no longer create new connections and appear to have run out of outbound TCP connections. The limit varies: sometimes we run out at around 3k active connections and sometimes at around 4k. During our last test we had 25 containers, with about 10 of them doing work at a time. Which containers pick up the work is random on each run, which I guess is why the TCP socket exhaustion happens at different limits: the containers are probably not all placed on the same node in the cluster that runs them. Since this is not transparent to us, this is only my best guess.

I cannot find anything in the documentation about restrictions on outbound TCP connections or how to deal with them.

Steps to reproduce

  1. Start creating thousands of MQTT connections from multiple containers to an external MQTT broker.
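A minimal sketch of the reproduction step, for anyone who wants to try it without a full MQTT client. It opens plain TCP connections (no MQTT handshake, which is a simplification and not what the original test did) and reports how many succeeded before the first failure. The host name and port in the usage comment are placeholders, not values from this issue.

```python
import socket


def open_until_failure(host, port, target, timeout=5.0):
    """Open up to `target` TCP connections and keep them open.

    Stops at the first failure and returns (sockets, error), where
    `error` is None if every connection attempt succeeded.
    """
    sockets = []
    error = None
    for _ in range(target):
        try:
            sockets.append(socket.create_connection((host, port), timeout=timeout))
        except OSError as exc:  # covers timeouts and refused connections
            error = exc
            break
    return sockets, error


def close_all(sockets):
    """Close every socket opened by open_until_failure."""
    for s in sockets:
        s.close()


# Hypothetical usage (placeholder broker address):
#   conns, err = open_until_failure("broker.example.com", 1883, 5000)
#   print(f"opened {len(conns)} connections; first error: {err!r}")
#   close_all(conns)
```

Running this from several containers at once and comparing the counts at which each one fails may help show whether the limit is per container or shared across containers.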

Expected behavior

We can keep creating MQTT connections up to the system limit(?).

Actual behavior

We run out of connections and start getting socket timeouts when reaching 3k-4k connections (depending on how the clients are distributed across the containers).

howang-ms commented 9 months ago

Hi @jgn-epp, we are working on improving the documentation. In the meantime, can you share the FQDN of your container app environment with us so we can see how best to help you? Please send your information to acasupport at microsoft dot com.

rakesh-308 commented 9 months ago

Hi @howang-ms, I don't know how your answer relates to Jeppe's question. As Jeppe said, you can replicate the environment yourself. I don't know if there is a way we can share an FQDN, since the "Application URL" field in the Overview tab of the container app says ingress is disabled. So I assume there is no FQDN, as we don't need this app to accept any inbound connections.

ahelland commented 8 months ago

Azure Advisor served up the recommendation of using a NAT Gateway for scaling outbound access in my sub. Could that be the case with MQTT connections as well?

https://learn.microsoft.com/en-us/azure/nat-gateway/tutorial-migrate-outbound-nat

rakesh-308 commented 8 months ago

We are facing issues creating a container app environment on a custom network. There is also an open bug on this.

https://github.com/microsoft/azure-container-apps/issues/451

In the meantime, is there a known outbound connection limit on container apps or on the container app environment?

kylefossum commented 7 months ago

@ahelland This detailed dive on SNAT ports for App Services might be an interesting read for you. https://4lowtherabbit.github.io/blogs/2019/10/SNAT/

Even if you're on a private network, unless the MQTT endpoints are also deployed to that same private network, Azure is providing a network appliance somewhere that does NAT for you. NAT Gateway is the solution, as it gives you a dedicated pool of IPs for building out the NAT table (which yields far more possible unique combinations) rather than having to share the public outbound IP with everything else on whatever the Azure Container Apps equivalent of an App Service Plan stamp is.
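To put rough numbers on the "more unique combinations" point, here is a back-of-the-envelope sketch. When every client connects to the same broker IP and port, each concurrent connection needs its own (public IP, source port) pair, so capacity scales with the number of outbound IPs. The default of 64,512 SNAT ports per public IP is the figure I recall from the NAT Gateway documentation; treat it as an assumption and check the current docs. Nothing here states the actual limit of the default outbound path in Container Apps, which this thread has not established.

```python
# Assumed SNAT ports per NAT Gateway public IP (verify against current Azure docs).
ASSUMED_PORTS_PER_IP = 64_512


def max_flows_to_one_destination(public_ips, ports_per_ip=ASSUMED_PORTS_PER_IP):
    """Upper bound on concurrent connections to a single destination IP:port."""
    return public_ips * ports_per_ip


def ips_needed(target_connections, ports_per_ip=ASSUMED_PORTS_PER_IP):
    """Smallest number of public IPs whose ports cover the target (ceiling division)."""
    return -(-target_connections // ports_per_ip)


# Hypothetical usage:
#   ips_needed(4_000)    # the load level where this issue appears
#   ips_needed(100_000)  # a larger fleet of MQTT clients
```

Under that assumption, a single dedicated IP would already sit well above the 3k-4k connections at which the failures in this issue start, which is consistent with the shared outbound IP being the bottleneck.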

ahelland commented 7 months ago

@kylefossum I was aware of SNAT as the underlying cause for Advisor making recommendations. But that link was a good article explaining the how and why in detail.

knepe commented 2 months ago

Is there any documentation on how many TCP connections an Azure Container App instance supports? From the issue it feels like it supports around 350 per instance (the default for a Basic Azure Web App instance?), which is pretty bad. I'm trying to use SignalR with Service Bus as the "backplane". It works great and scales well in the test environment, but I don't want to put it in production if the limit is 350 connections per instance. In that case I would probably be better off using Azure SignalR Service.