Open malthe opened 1 year ago
We've found the same behaviour - we have a container app that fetches secrets on startup, and we sometimes see that fail. Our process then gets started a few seconds later, and it all works. Overall this increases our startup time a lot, causing real issues for us.
For some further insight on this, I added a crude timing loop as follows:
log.Info().Msg("Timing MSI startup - connecting to port 42356")
connectTimeouts := 0
startupTime := time.Now()
for {
	d := net.Dialer{Timeout: time.Millisecond * 50}
	conn, err := d.Dial("tcp", "localhost:42356")
	if err != nil {
		time.Sleep(time.Millisecond * 50)
		connectTimeouts++
		if connectTimeouts%20 == 0 {
			log.Warn().Msgf("Timing MSI startup - still waiting - %d", connectTimeouts)
		}
		continue
	}
	log.Info().
		Msgf("Timing MSI startup - done - %d - in %d ms", connectTimeouts, time.Since(startupTime).Milliseconds())
	conn.Close()
	break
}
Which resulted in the following log:
3/27/2023, 12:56:46.702 PM Timing MSI startup - connecting to port 42356
3/27/2023, 12:56:47.716 PM Timing MSI startup - still waiting - 20
3/27/2023, 12:56:48.728 PM Timing MSI startup - still waiting - 40
3/27/2023, 12:56:49.742 PM Timing MSI startup - still waiting - 60
3/27/2023, 12:56:50.755 PM Timing MSI startup - still waiting - 80
3/27/2023, 12:56:51.211 PM Timing MSI startup - done - 89 - in 4509 ms
So admittedly just one sample, but taking approximately 4.5 seconds to start up does not seem ideal.
Just a quick update, we are testing a fix for this that will wait for the managed identity endpoint to be ready before your containers are started, so that connection failures should be very rare even when you use it on startup.
Having the same issue.
I just started using Azure Container Apps, connecting to Azure App Config using managed identity.
Seeing this error intermittently: the app works for some time, then stops, and restarting doesn't help. Then after an hour it starts working again.
Unhandled exception. Azure.Identity.AuthenticationFailedException: ManagedIdentityCredential authentication failed: Retry failed after 4 tries. Retry settings can be adjusted in ClientOptions.Retry or by configuring a custom retry policy in ClientOptions.RetryPolicy. (Connection refused (localhost:42356)) (Connection refused (localhost:42356)) (Connection refused (localhost:42356)) (Connection refused (localhost:42356))
A workaround using bash:
timeout 10s bash -c "until az login --identity 2>/dev/null; do sleep 1; done" || exit 1
I see the same error as mmigala in the exact same scenario. I have a .NET 7 minimal API app that connects to an Azure App Config when the app first starts, and I see a number of connection refused errors when the App Config provider tries to acquire a token through the MSI endpoint using a ManagedIdentityCredential. I have been seeing this occur more frequently over the past 2-3 weeks (I can't recall seeing this at all in the 6 months prior).
I found a reason why this was happening for me.
The problem was that minimum replicas count wasn't set to 1.
App was scaling to 0 and then it couldn't start again.
From docs:
Make sure you create a scale rule or set minReplicas to 1 or more if you don't enable ingress. If ingress is disabled and you don't define a minReplicas or a custom scale rule, then your container app will scale to zero and have no way of starting back up.
Hope this helps others.
We are affected by this as well. Ingress is enabled and scale is set to a minimum of 1, which does not remedy the problem. 😕
I noticed this happening and I also noticed that using a readiness probe doesn't fix the problem. I'm requesting an access token on each request (it's a PHP application) and intermittently the endpoint won't be accessible, or it will randomly return a 403.
@vturecek any news about that fix that was being tested one year ago?
@vturecek This happens multiple times every week. Was reported 1.5 years ago and a fix was being tested over a year ago. Please advise.
@mario-d-s, @waynebrantley - sorry for the delay. We made a couple updates to help with this, depending on the type of environment you're running on:
In a Workload Profile Consumption environment, we now support managed identity in init containers. By default, managed identity starts up during the init phase of your application. Containers that run during the main phase should be able to access the local managed identity endpoint immediately, because we wait for the init phase to complete before switching to main. However, init containers may start before managed identity is available and may need to perform retries.
In all other environments, we don't yet support managed identity for init containers. However, for those environments, we don't start your container until the local managed identity endpoint is available.
@waynebrantley are you seeing connection refused errors when your container starts? If so, in what phase (main or init) and what kind of environment are you on (Workload profile or consumption-only)?
@vturecek we are on a Dedicated Workload profile and do not use init containers. So if I understand correctly, from your response the following applies:
for those environments, we don't start your container until the local managed identity endpoint is available.
That is not what we're seeing. Multiple times a day we are getting errors from different containers that look like this:
Azure.Identity.AuthenticationFailedException: ManagedIdentityCredential authentication failed: Retry failed after 6 tries. Retry settings can be adjusted in ClientOptions.Retry or by configuring a custom retry policy in ClientOptions.RetryPolicy. (Connection refused (localhost:42356)) (Connection refused (localhost:42356)) (Connection refused (localhost:42356)) (Connection refused (localhost:42356)) (Connection refused (localhost:42356)) (Connection refused (localhost:42356))
This is tripping up our monitoring. We will look into increasing the threshold for container restarts at which we get notified but that is just a workaround, it should simply not be happening.
@vturecek sorry for the delayed reply. We are technically using a 'consumption' profile, but due to networking issues the Azure team has us on some kind of dedicated workload profile!
We are seeing those errors when the containers try to start in the main phase. We do not have init containers at this time.
This happens quite often.
These errors must have been fixed as we are not seeing them anymore.
Issue description
Running a container app, we're seeing intermittent connectivity issues running
az login --identity
That is, we get "[Errno 111] Connection refused", suggesting that the service has somehow not been brought up yet.
Steps to reproduce
Run a container app where the first action is to log in using managed identity, e.g. az login --identity.
Observe intermittent connectivity issues.
Expected behavior
The MSI endpoint should be ready immediately.
Actual behavior
The MSI endpoint is not always available.