Closed indy-singh closed 2 months ago
Please could you clarify what this means?
We tried turning off (including a reboot) EnableAltSvc but the leak still occurred. Then we tried the same with EnableHttp3 but no joy. This was because naively we thought http3 == quic.
Is it possible that what you were changing was enabling/disabling QUIC for your web app, rather than for the client? It's relatively fiddly to turn off HTTP/3 at the client level at the moment - we have a workaround you could try that's documented here: https://github.com/googleapis/google-cloud-dotnet/blob/main/issues/Issue12318/Workaround/Program.cs
(Obviously we'd like to make this simpler over time.)
It would also be useful if you could provide a minimal example of what you're doing with the client, ideally as a console app - including whether you're disposing of everything. (It's over a year since the previous issue, so my guess is that your code may have changed.)
Please could you clarify what this means?
Yes by turning off HTTP3 I meant reversing the instructions here. So disabling http3 on OS level.
It's relatively fiddly to turn off HTTP/3 at the client level at the moment - we have a workaround you could try that's documented here: https://github.com/googleapis/google-cloud-dotnet/blob/main/issues/Issue12318/Workaround/Program.cs
Ah, that is nice. I did have a play around with Options API but it was not immediately apparent that it was done like that. I was thinking I would have the pass the delegating handler in. I'm happy to try that.
It would also be useful if you could provide a minimal example of what you're doing with the client, ideally as a console app - including whether you're disposing of everything. (It's over a year since the previous issue, so my guess is that your code may have changed.)
The code hasn't changed since the previous issue. But happy to provide a minimal example. This may take a while.
Thanks, Indy
Yes by turning off HTTP3 I meant reversing the instructions here. So disabling http3 on OS level.
That post is all about acting as an HTTP/3 server. I don't know for sure, but I'd strongly suspect that it has no effect on clients.
Ah, that is nice. I did have a play around with Options API but it was not immediately apparent that it was done like that. I was thinking I would have the pass the delegating handler in. I'm happy to try that.
Great, thank you.
The code hasn't changed since the previous issue. But happy to provide a minimal example. This may take a while.
No hurry - but it's much appreciated :)
That post is all about acting as an HTTP/3 server. I don't know for sure, but I'd strongly suspect that it has no effect on clients
Ah, sorry. I should be clearer about our setup. We have server-to-server communication. Thank you for prompting me to clarifying that point. I had missed that entirely, apologies.
We have server-to-server communication.
Yes, but the BigQueryStorageClient
is still acting as a client, therefore any settings which are about serving HTTP/3 are probably irrelevant to whether the client uses HTTP/3.
Yeah, I was just pointing out that I had misunderstood what registry keys were actually doing 👍🏽
Okay. I've cloned your code, and will try to reproduce the issue shortly. If you've got sample output from your repro which looks like it's leaking, that would be great for the sake of comparisons.
Could you let me know where the QuicException
comes, or post the fuller stack trace? I wonder whether that's coming before you start any using
statements. I note you're using quite an old version of Google.Cloud.BigQuery.Storage.V1
. If you update to 3.15.0, you can use a using
statement for appendRowsStream
:
using var appendRowsStream = _bigQueryWriteClientBuilder.AppendRows(CallSettings.FromCancellationToken(cancellationTokenSource.Token));
That way the gRPC call will be disposed even you don't get as far as completing the write stream.
If you've got sample output from your repro which looks like it's leaking, that would be great for the sake of comparisons.
I've not had a chance to let it run for long enough to match what is happening in production. But I can try.
It's relatively fiddly to turn off HTTP/3 at the client level at the moment - we have a workaround you could try that's documented here: https://github.com/googleapis/google-cloud-dotnet/blob/main/issues/Issue12318/Workaround/Program.cs
ExactVersionHandler
also solves the problem. I'm not sure which solution/workaround is better..?
Could you let me know where the QuicException comes, or post the fuller stack trace?
That was the full stack trace as emitted from FirstChanceException. I'll see what else I can dig out.
If you update to 3.15.0, you can use a using statement for
appendRowsStream
Will do :)
Cheers, Indy
ExactVersionHandler
also solves the problem
Excellent! I would personally go that route. While using Grpc.Core is still supported, it's definitely legacy - it'll be years before we are able to stop supporting it completely, but that's the direction of travel. In time I absolutely expect that we'll make it easier to modify the Grpc.Net.Client handler (ideally making it simple to control keepalive, specify the HTTP version etc) to avoid these problems - but for now it sounds like the workaround is the way to go.
I'm going to let ExactVersionHandler
stew for 24 hours before I'm happy the slow memory leak has been resolved. As it is a slow one, time is the test - so far all good.
Am I correct in thinking that ExactVersionHandler
forces to HTTP2 and therefore skips the QUIC path entirely?
Am I correct in thinking that ExactVersionHandler forces to HTTP2 and therefore skips the QUIC path entirely?
Yes. (I believe the request contains HTTP/2.0 by default, and this basically says "and don't upgrade it".)
Great. Well I'm totally out of my depth with Quic, but I'm having a play with https://github.com/microsoft/msquic/blob/main/docs/Diagnostics.md / https://github.com/microsoft/msquic/blob/main/docs/TSG.md#trouble-shooting-a-performance-issue and see if anything sticks out.
Hmmmm, I wonder if this is related?
https://github.com/dotnet/runtime/issues/80111
If I hook into
TaskScheduler.UnobservedTaskException += (sender, args) =>
{
Log.Error(new LogDetails("UnobservedTaskException", exception: args.Exception, objectDump: $"Was observed {args.Observed}"));
};
I definitely get some QUIC stuff:-
System.AggregateException: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (The connection timed out from inactivity.)
---> System.Net.Quic.QuicException: The connection timed out from inactivity.
at System.Net.Quic.ResettableValueTaskSource.TryComplete(Exception exception, Boolean final)
at System.Net.Quic.QuicStream.HandleEventShutdownComplete(_SHUTDOWN_COMPLETE_e__Struct& data)
at System.Net.Quic.QuicStream.HandleStreamEvent(QUIC_STREAM_EVENT& streamEvent)
at System.Net.Quic.QuicStream.NativeCallback(QUIC_HANDLE* connection, Void* context, QUIC_STREAM_EVENT* streamEvent)
--- End of stack trace from previous location ---
at System.Net.Quic.ResettableValueTaskSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at System.Net.Quic.QuicStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken)
at System.Net.Http.Http3Connection.<>c__DisplayClass41_0.<<ProcessServerControlStreamAsync>g__ReadFrameEnvelopeAsync|0>d.MoveNext()
--- End of stack trace from previous location ---
at System.Net.Http.Http3Connection.ProcessServerControlStreamAsync(QuicStream stream, ArrayBuffer buffer)
at System.Net.Http.Http3Connection.ProcessServerStreamAsync(QuicStream stream)
at System.Net.Http.Http3Connection.ProcessServerStreamAsync(QuicStream stream)
--- End of inner exception stack trace ---
Hmm... that's deeper than I normally go, to be honest - but I will say we've seen a few issues with QUIC recently. All the more reason to disable it in the handler :)
I'm going to let ExactVersionHandler stew for 24 hours before I'm happy the slow memory leak has been resolved. As it is a slow one, time is the test - so far all good.
Showed no signs of a slow leak over night. I'm happy for this issue to be closed if you want?
Cheers, Indy
Will do - thanks so much for testing things so quickly. We'll keep an eye on the whole QUIC situation... it sounds like we may well want to work out how to disable it by default. (Unfortunately, that's easier said than done, as the "right" HttpHandler to use is a fairly involved decision.)
Tangentially Related Context: https://github.com/googleapis/google-cloud-dotnet/issues/10034
Headline
I'll start with the fix/workaround:-
Switching from the default
GrpcAdapter
toGrpcCoreAdapter.Instance
as per this comment from a different issue solved the slow memory leak.Longer explanation
We are in the process of upgrading all of production servers (we are on-prem) from Server 2019 to Server 2022. We've rolled out 2022 with http3 enabled as per this post.
We started to notice a new slow leak on the newly upgraded 2022 boxes:-
You'll notice that the grey line remains flat because that box was still on 2019.
We hooked into
AppDomain.CurrentDomain.FirstChanceException
and saw that 2022 was throwing a lot of QUIC errors:-We tried turning off (including a reboot)
EnableAltSvc
but the leak still occurred. Then we tried the same withEnableHttp3
but no joy. This was because naively we thought http3 == quic.Eventually, I remembered from this comment that there was a suspicion of the default GrpcAdapater was the cause of that issue. So we gave that a shot:-
And the slow memory leak was fixed (deployed just before the weekend) hence the above graph flatlines at the start of the weekend.
The leak state took a little time to reproduce; eventually I switched to using an EventListener to watch for any QUIC activity because my [broad] theory was basically QUIC was somehow causing the memory leak.
I stopped digging once the problem was resolved; my best guess is that somehow those QUIC errors aren't disposing of resources and just slowly leaks. The actual application tier code hasn't changed at all since I opened the previous issue.
Happy to provide more details/logs/diagnostics.
Environment details
dotnet-hosting-8.0.6-win.exe
Package name and version: