Closed simon-k closed 3 weeks ago
"No such host is known" isn't generally considered a transient error: this error indicates that the DNS hostname (e.g. foo.bar.com) could not be resolved to an IP address; retrying that DNS lookup right away generally will return the exact same results. In what scenario do you believe this is a transient error?
In general I would agree with you, but in our case the DNS resolution is a transient error - at lease according to the Azure Support team. Our client is running as an App Service on Azure and it is using the Azure provided DNS service to resolve the PGSQL database URL to an IP address - and in rare ocations the DNS Server cannot resolve the IP.
Azure suport
Now, the issue is indeed transient and doesn’t last. The retry policy you described previously is expected to take care of the issue.
So if I just accept that the DNS we are using will in 0.001% og the time fails to resolve, and that it is a transient error, then I would really like the EnableRetryOnFailure to handle the issue for me. So which Error code should I pass on to the method as the errorCodesToAdd
argument for it to let it retry?
So anyone knows what error code I should sepcify in this call to retry when I get the System.Net.Socket.SocketException 11001 exception?
optionsBuilder.UseNpgsql(connectionString, providerOptions =>
{
providerOptions.EnableRetryOnFailure(errorCodesToAdd: ???? );
});
Or could it be that the errorCodesToAdd in the NpgsqlRetryingExecutionStrategy only works on Postgres Exceptions and I therefore cannot rely on this to retry?
"No such host is known" isn't generally considered a transient error: this error indicates that the DNS hostname (e.g. foo.bar.com) could not be resolved to an IP address; retrying that DNS lookup right away generally will return the exact same results. In what scenario do you believe this is a transient error?
When I look at the implemention of NpgsqlException
it seems like socket exceptions are treated as transient exceptions. Or am I readign this code wrong?
public virtual bool IsTransient
=> InnerException is IOException or SocketException or TimeoutException or NpgsqlException { IsTransient: true };
@simon-k sorry for not answering for so long.
You're right that errorCodesToAdd
is only for additional PostgreSQL error codes - but here you're trying to add a non-PG error code (DNS resolution failure).
However, you should be able to easily extend NpgsqlRetryingExecutionStrategy, and then pass that to ExecutionStrategy()
inside UseNpgsql()
(instead of calling EnableRetryOnFailure()
, which just sets up NpgsqlRetryingExecutionStrategy). At that point you fully control what EF considers a transient error, and can decide to treat DNS issues as such.
Regardless, thanks for providing the communication with Azure support. I'll bring this up with the EF team to discuss whether it makes sense to treat DNS failures as transient by default.
We've discussed this in the EF team and we think we need more feedback - opened https://github.com/dotnet/efcore/issues/34950 on the EF repo to do that.
Hi I'm using npgsql in an Azure Function App, and I have some issues with EnableRetryOnFailure().
This is my code that is creating my EFCore DbContext and where I set the EnableRetryOnFailure option.
The retry logic simply does not seem to work for me. On rare occasions the connection to the DB fails with a System.Net.Socket.SocketException 11001 "No such host is known".
It should be a transient exception and therefore the retry logic should kick in, but it doesn't.
This blog How to do retries in EF Core describes the list of error codes that is consideres transient by EfCore, and I'm not sure my socket exception is included and the error codes does not seem to match this document with Npgsql error codes https://www.postgresql.org/docs/current/errcodes-appendix.html
What am I doing wrong here? Do I need to specify some explicit error codes in EnableRetryOnFailure(...) or is there something else I m missing?