Outgoing connections to Azure SQL Server are closed on Consumption workload profile

claria commented 2 weeks ago

This issue is a: (mark with an x)

[ ] bug report -> please search issues before submitting
[ ] documentation issue or request
[x] regression (a behavior that used to work and stopped in a new release)

Issue description

We are running an Azure Container App on the Consumption workload profile (1 vCPU / 2 GB RAM) with 2 instances. The deployed Docker container is a Keycloak v26 release. Keycloak connects to our Azure SQL database, and each replica keeps a connection pool of 20 connections open to the SQL database.

For the past 2-3 weeks, we have suddenly been seeing "Connection reset by peer" errors when these connections are used. If I understand correctly, something in between (or the Azure SQL server itself) closed the connections.
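For readers unfamiliar with the error: "Connection reset by peer" means the client received a TCP RST for a connection it still considered open. The following is a minimal loopback sketch of that situation, not a reproduction of the Azure behaviour. The function name `provoke_reset` is hypothetical, and the result assumes Linux-style socket semantics.

```python
import socket
import struct


def provoke_reset() -> str:
    """Open a loopback TCP connection, abort it from the "server" side, and
    return the name of the exception the client sees on its next read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # SO_LINGER with a zero timeout makes close() send a TCP RST instead of
    # the normal FIN handshake, so the connection is aborted, not closed.
    server_side.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
    )
    server_side.close()

    try:
        client.settimeout(5)
        client.recv(1)  # first use of the connection after the abort
        return "no error"
    except ConnectionResetError as exc:
        return type(exc).__name__
    finally:
        client.close()
        listener.close()


print(provoke_reset())
```

The client only learns about the reset when it next touches the socket, which is why a pooled connection can look fine until it is borrowed and used.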

Since we initially thought this was caused by idle connections being killed, we enabled SQL connection validation and configured it to send a "select 1" statement on each connection every 10 seconds to keep it alive and validate it. On the SQL server side we can see that this works and that each connection is used every 10 seconds. However, the problem persists and we see the same number of "Connection reset by peer" errors.
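As an illustration of what such a validation pass does, here is a minimal sketch. It is not Keycloak's actual pool implementation: the function `validate_pool` is hypothetical, and in-memory SQLite connections stand in for the Azure SQL connections so the example is self-contained.

```python
import sqlite3


def validate_pool(pool: list) -> list:
    """Run "SELECT 1" on every pooled connection and return only the ones
    that still answer. Broken connections are left out of the result."""
    healthy = []
    for conn in pool:
        try:
            conn.execute("SELECT 1").fetchone()
            healthy.append(conn)
        except sqlite3.Error:
            # The connection is unusable; a real pool would replace it here.
            pass
    return healthy


# Hypothetical stand-in pool: three in-memory SQLite connections, one of which
# is closed to simulate a connection that was dropped underneath the pool.
pool = [sqlite3.connect(":memory:") for _ in range(3)]
pool[1].close()

# A real pool would run this pass on a timer (every 10 seconds in our setup).
pool = validate_pool(pool)
print(len(pool))
```

A pass like this keeps connections active and detects dead ones between runs, but it cannot stop something in between from resetting a connection, which is consistent with the errors continuing after validation was enabled.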

Our current investigations are:

On the Azure SQL side we are far away from any connection limits etc.
All other Azure applications connecting to this Azure SQL server behave normally.
This behaviour does not happen all the time. Sometimes we do not see any dropped connections for a couple of hours, then it starts again.
We deployed the exact same Docker container/configuration on an Azure VM. The problem does not occur. Everything is stable.
We deployed the exact same configuration but with a dedicated workload profile. The problem does not occur. Everything is stable.

Steps to reproduce

Create a Container App with the Consumption workload profile
Deploy the Keycloak container

Expected behavior [What you expected to happen.]

Connections to the SQL server are not closed.

Actual behavior [What actually happened.]

Connections to the SQL server are reset ("Connection reset by peer") even though they are active.

JennyLawrance commented 2 weeks ago

Hi @claria , can you send us the ARM resource URI to acasupport(at)microsoft(dot)com? We will investigate.

claria commented 2 weeks ago

I sent the resource uri via mail.

JennyLawrance commented 2 weeks ago

[like] Jenny Lawrance reacted to the email reply from Georg Sieber (sent Tuesday, November 5, 2024, 10:09:44 AM) on microsoft/azure-container-apps Issue #1338.

claria commented 2 weeks ago

This is a count of all the "connection reset by peer" errors we see. We have two replicas of the container app running. As you can see, sometimes the errors do not occur for over an hour on one replica while still happening on the other:

Logs are written to a Log Analytics workspace. Please have a look yourself if you want.
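For anyone wanting to produce a similar per-replica count from exported logs, a minimal sketch follows. The record layout (ISO timestamp, replica name, message), the replica names, and the sample entries are all hypothetical; they are not taken from the actual workspace.

```python
from collections import Counter
from datetime import datetime


def count_resets(records):
    """Count "connection reset by peer" log entries per (replica, hour)."""
    counts = Counter()
    for timestamp, replica, message in records:
        if "connection reset by peer" in message.lower():
            # Truncate the timestamp to the start of its hour.
            hour = datetime.fromisoformat(timestamp).replace(
                minute=0, second=0, microsecond=0
            )
            counts[(replica, hour.isoformat())] += 1
    return counts


# Hypothetical sample records: (ISO timestamp, replica name, log message).
sample = [
    ("2024-11-05T09:12:00", "replica-a", "IOException: Connection reset by peer"),
    ("2024-11-05T09:48:30", "replica-a", "IOException: Connection reset by peer"),
    ("2024-11-05T09:50:10", "replica-b", "Connection acquired"),
    ("2024-11-05T10:05:00", "replica-b", "IOException: Connection reset by peer"),
]
counts = count_resets(sample)
for key in sorted(counts):
    print(key, counts[key])
```

Bucketing by replica and hour makes gaps visible, such as one replica staying quiet for an hour while the other keeps seeing resets.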

claria commented 1 week ago

Did you find anything in the instance investigation?

claria commented 1 week ago

Any update @JennyLawrance ?

JennyLawrance commented 1 week ago

Yes, we applied a fix to your environment last week. Let us know if you see an improvement in the connection status.

claria commented 1 week ago

Hi @JennyLawrance,

No, unfortunately the situation did not improve. These are the connection error counts over the last week, which have not really changed:

JennyLawrance commented 6 days ago

Hi @claria , I was able to confirm that we had an internal crossing of wires, and the fix wasn't applied to this environment as I previously stated. I apologize for the miscommunication here. We applied it again today (and I confirmed that on my side). Can you check the status on your end and let us know what you see in your logs?

Thanks, Jenny

claria commented 5 days ago

Hi @JennyLawrance,

I can confirm that the errors are completely gone after your fix. What did you change, and how can we prevent the errors from happening in the future?

We plan to create multiple similar environments. Should we expect the same error to happen there?
