microsoftgraph / msgraph-sdk-dotnet

Microsoft Graph Client Library for .NET!
https://graph.microsoft.com
Other
688 stars 246 forks source link

Getting a Message will sometimes cause a stackoverflow. #2588

Closed grippstick closed 1 week ago

grippstick commented 1 month ago

Describe the bug

Repeated calls to Me.Messages.GetAsync will intermittently fail with a stack over flow exception. It appears to be stemming from a recursive call in ParseNodeFactoryRegistry and ParseNodeProxyFactory. This was determined from a dump in procdump.

Expected behavior

There should not be intermittent stackoverflowexception.

How to reproduce

var tasks = Enumerable.Range(0, 1000).Select(async (index) => { var graphClient = await GetGraphClient(refreshToken).ConfigureAwait(false); return await graphClient.Me.Messages[msgID] .GetAsync(config => { config.QueryParameters.Select = new[] { "internetMessageId","internetMessageHeaders","createdDateTime","subject","bodyPreview", "hasAttachments","uniqueBody","body","from","toRecipients", "ccRecipients","bccRecipients","isDraft" }; }).ConfigureAwait(false); }); await System.Threading.Tasks.Task.WhenAll(tasks).ConfigureAwait(false);

SDK Version

5.56.0

Latest version known to work for scenario above?

No response

Known Workarounds

No response

Debug output

Click to expand log ``` ```

Configuration

Windows Server 2022 64 // EC2

Other information

I have a memory dump that can be provided on request. It has an access token and some PII in the symbols so not sharing publicly. image

andrueastman commented 1 month ago

Thanks for raising this @grippstick

This will require some investigation here. Any chance you are able to share more details from the stacktrace on the error information like the callsite?

From a pattern perspective, is there a reason not to make the multiple requests using batching as outlined at https://learn.microsoft.com/en-us/graph/sdks/batch-requests?tabs=csharp#simple-batching-example?

grippstick commented 1 month ago

The call site is MessageItemRequestBuilder.GetAsync: public async Task<Microsoft.Graph.Models.Message> GetAsync(Action<RequestConfiguration<Microsoft.Graph.Me.Messages.Item.MessageItemRequestBuilder.MessageItemRequestBuilderGetQueryParameters>> requestConfiguration = default, CancellationToken cancellationToken = default) { var requestInfo = ToGetRequestInformation(requestConfiguration); var errorMapping = new Dictionary<string, ParsableFactory<IParsable>> { { "XXX", Microsoft.Graph.Models.ODataErrors.ODataError.CreateFromDiscriminatorValue }, }; return await RequestAdapter.SendAsync<Microsoft.Graph.Models.Message>(requestInfo, Microsoft.Graph.Models.Message.CreateFromDiscriminatorValue, errorMapping, cancellationToken).ConfigureAwait(false); } }

I cannot use the batch pattern in this scenario. We are making local copies of messages for our clients. So we listen to web hooks and then go get the message. In the original code example, I should have denoted that it was a different call with a different refresh token for each client in each loop.

michael-schleger commented 1 month ago

I'm seeing something similar having gone from 5.44 to 5.56. GetAsync on users. I'll try get more info in the next few days but the crash dumps look recursively like this.

Microsoft.Kiota.Abstractions.Serialization.ParseNodeProxyFactory+<GetRootParseNodeAsync>d__8.MoveNext() System.Runtime.CompilerServices.AsyncTaskMethodBuilder1[[System.Canon, mscorlib]].Start[[Microsoft.Kiota.Abstractions.Serialization.ParseNodeProxyFactory+d8, Microsoft.Kiota.Abstractions]](d__8 ByRef) `

grippstick commented 1 month ago

I can confirm it is not just messages. It seems to happen with any large result while processing other concurrent requests. I had to start making raw http requests for attachments, because it would throw stackoverflows often. I also get it with large events. Since we normally see it under load as part of a que processor, we just let it crash reprocess what it failed on and it seems to work. It makes me think this is a threading issue. I have raised another issue asking for a way to turn off kiota backing store, for performance purposes.

almostjulian commented 1 month ago

+1 on this, we're also seeing https://github.com/microsoft/kiota-dotnet/issues/309 since moving to 5.56. Going back to 5.44 seems to solve it. Issue seems to happen when making concurrent graph calls.

andrueastman commented 1 month ago

@grippstick @almostjulian Any chance you can confirm this issue still occurs with adding the latest dependency of Microsoft.Graph.Core? https://www.nuget.org/packages/Microsoft.Graph.Core/3.1.14

grippstick commented 1 month ago

I can confirm that it is still happening with version 3.1.14.0

On Mon, Aug 5, 2024 at 4:40 AM Andrew Omondi @.***> wrote:

@grippstick https://github.com/grippstick @almostjulian https://github.com/almostjulian Any chance you can confirm this issue still occurs with adding the latest dependency of Microsoft.Graph.Core? https://www.nuget.org/packages/Microsoft.Graph.Core/3.1.14

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2268634728, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTOWKW3SHCDN4IUJRGC3ZP5CBDAVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRYGYZTINZSHA . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

jav-mut commented 3 weeks ago

+1 on this, I am experiencing the same issue, and it is solved by reverting the NuGet packages Microsoft.Graph, Microsoft.Graph.Core, and Microsoft.Kiota.* to their previous versions:

NuGet Package | Version with issue -> Previous version without issue

Microsoft.Graph | 5.56.0 -> 5.44.0 Microsoft.Graph.Core | 3.1.15 -> 3.1.8 Microsoft.Kiota.Abstractions | 1.11.0 -> 1.7.12 Microsoft.Kiota.Authentication.Azure | 1.11.0 -> 1.1.4 Microsoft.Kiota.Http.HttpClientLibrary | 1.11.0 -> 1.3.7 Microsoft.Kiota.Serialization.Form | 1.11.0 -> 1.1.5 Microsoft.Kiota.Serialization.Json | 1.11.0 -> 1.1.8 Microsoft.Kiota.Serialization.Multipart | 1.11.0 -> 1.1.3 Microsoft.Kiota.Serialization.Text | 1.11.0 -> 1.1.4

baywet commented 3 weeks ago

Thanks everyone for the information. This is most likely the same root cause as #2563 and it'll be interesting to test once this patch is released to see whether you're still facing the issue. https://github.com/microsoft/kiota-dotnet/pull/326

andrueastman commented 3 weeks ago

Thanks everyone for the information. This is most likely the same root cause as #2563 and it'll be interesting to test once this patch is released to see whether you're still facing the issue. microsoft/kiota-dotnet#326

Please confirm if this still occurs if you add a reference to https://www.nuget.org/packages/Microsoft.Kiota.Abstractions/1.11.1

grippstick commented 3 weeks ago

I can confirm that it still occurs with 1.11.1.

baywet commented 3 weeks ago

@grippstick Thank you for the additional information. We just released 1.11.2, can you check again please?

grippstick commented 3 weeks ago

I can confirm that the hot path is gone and it is not looping through all of the bytes.

You have sped it up from 7seconds to 1.8 seconds!

It is still slower than using a raw request and parsing with JSON.net, which is ultimately my preference from a resource perspective. Is there any reason we need a backing storage or cannot supply something that doesn't track state at all?

It is deploying to our servers and I will let you know if it takes care of the stack overflow from: https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588

On Wed, Aug 14, 2024 at 12:03 PM Vincent Biret @.***> wrote:

@grippstick https://github.com/grippstick Thank you for the additional information. We just released 1.11.2, can you check again please?

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2289328354, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTOTBHIZBMPNGSJEZF3LZROEWJAVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOBZGMZDQMZVGQ . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

baywet commented 3 weeks ago

Thank you for the additional information. Does that timing also include the network time? If not this is still pretty slow. Don't hesitate to provide additional hot path analysis like you've provided on the other thread, they were especially helpful!

grippstick commented 3 weeks ago

This is still an issue as of Microsoft.Kiota.Abstractions 1.12.2

I am still trying to replicate it in a consistent manner.

baywet commented 3 weeks ago

Thank you for the additional information. Besides a repro, are you able to get a stacktrace at least so we can look into the potential root cause?

grippstick commented 3 weeks ago

The ~1.8 seconds does include network time.

I am trying to setup some tests so I can put a comprehensive response together.

At this point, I am not sure the issue is in Kiota, I think it is coming from the graph sdk json parser, but I can't prove it.

Mainly because I am working with symbols and I can't actually stop on all the breakpoints I want to stop on.

On Thu, Aug 15, 2024 at 8:53 AM Vincent Biret @.***> wrote:

Thank you for the additional information. Besides a repro, are you able to get a stacktrace at least so we can look into the potential root cause?

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2291308197, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTORAMSEXALOD6IPVDZ3ZRSXD5AVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJRGMYDQMJZG4 . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

almostjulian commented 3 weeks ago

Is it possible to get an interim graph sdk release that takes in the recent kiota changes?

baywet commented 3 weeks ago

Is it possible to get an interim graph sdk release that takes in the recent kiota changes?

Sure, we'll start by releasing core, and then consume the dependency here so we can release a patch https://github.com/microsoftgraph/msgraph-sdk-dotnet-core/pull/874

grippstick commented 3 weeks ago

To make sure the network was out of the way, I captured the json that came down for a FileAttachment and stored it to disk. The attachment itself is roughly 17Mb in the contentBytes.

I then created a test function to do some benchmarking. When I us kiota to parse the payload in a loop of 1000 and doing garbage collections with each iteration[not shown below], I get an average of ~200ms. When I just do a raw parse with Json.Net and access the property directly, I get around 60ms.

[TestMethod]
public void TestParseNode()
{
    var json = System.IO.File.ReadAllText(@"S:\Development\Attach.txt");

    var jDoc = System.Text.Json.JsonDocument.Parse(json);

    var pNode = new Microsoft.Kiota.Serialization.Json.JsonParseNode(
                            jDoc.RootElement,
                            Microsoft.Kiota.Serialization.Json.KiotaJsonSerializationContext.Default
                        );

    var sw = new System.Diagnostics.Stopwatch();

    sw.Restart();
    var attach = pNode.GetObjectValue(Microsoft.Graph.Models.FileAttachment.CreateFromDiscriminatorValue);
    var b1 = attach.ContentBytes;
    sw.Stop();
    var time1 = sw.ElapsedMilliseconds;
    //time1 averages 230ms

    sw.Restart();
    jDoc = System.Text.Json.JsonDocument.Parse(json);
    var b2 = jDoc.RootElement.GetProperty("contentBytes").GetBytesFromBase64();
    sw.Stop();
    var time2 = sw.ElapsedMilliseconds;
    //time2 which ibcludes parsing the document again from scratch is roughly 60ms.

    Assert.IsTrue(time1 < time2);
}

image

baywet commented 3 weeks ago

@grippstick Thank you for the additional information. I don't suspect this is the stack overflow path, but it's still good to improve the performance. I'd have suspected that STJ is using Convert.Base64 under the hood, but it doesn't seem to be the case. In any case it's better to use their APIs since we'll be closer to "manual deserialization" in terms of behaviour and performance. I've put together a draft PR https://github.com/microsoft/kiota-dotnet/pull/337 Would you mind pulling it locally and running the same test you provided with local links please?

baywet commented 3 weeks ago

Actually, I took the time to setup a benchmark

currently implementation

Method Mean Error StdDev Median
Run 737.7 us 55.56 us 163.8 us 796.2 us

My PR

Method Mean Error StdDev Median
Run 86.52 us 9.229 us 26.03 us 76.67 us

I'll go ahead and get this PR merged and released. In the meantime, if you can keep tracking down the stacktrace / repro for the stackoverflow issue?

grippstick commented 3 weeks ago

That is a real big performance improvement from the multiple seconds! What is that 2 orders of magnitude?! :)

I just pulled and I am going to try and get it to SO. It happens intermittently and seemingly randomly. I am going to point it at a local url, so i dont get throttled by the API, and just let it run in a multithreaded manner to see if I can force its hand.

On Fri, Aug 16, 2024 at 8:41 AM Vincent Biret @.***> wrote:

Actually, I took the time to setup a benchmark currently implementation Method Mean Error StdDev Median Run 737.7 us 55.56 us 163.8 us 796.2 us My PR Method Mean Error StdDev Median Run 86.52 us 9.229 us 26.03 us 76.67 us

I'll go ahead and get this PR merged and released. In the meantime, if you can keep tracking down the stacktrace / repro for the stackoverflow issue?

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2293533582, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTOXWGYQZMK7HHDZKQOTZRX6O5AVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJTGUZTGNJYGI . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

grippstick commented 3 weeks ago

1. I have it built locally, but I had to update KiotaGenerated.csproj, because other projects were complaining about the compiler. Just to get it building, I reverted back to: <PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.10.0" ExcludeAssets="runtime" />

2. Do you have an easy suggestion on how to get my projects that use nuget packages to use my local code? I tried to swap the reference to be project based, and I tried updating the assemblies in the packages folder, but it cased my MSGraph stuff to go crazy. It should be noted that this project is running in 4.8, because of some legacy concerns.

On Fri, Aug 16, 2024 at 8:45 AM Joshua Grippo @.***> wrote:

That is a real big performance improvement from the multiple seconds! What is that 2 orders of magnitude?! :)

I just pulled and I am going to try and get it to SO. It happens intermittently and seemingly randomly. I am going to point it at a local url, so i dont get throttled by the API, and just let it run in a multithreaded manner to see if I can force its hand.

On Fri, Aug 16, 2024 at 8:41 AM Vincent Biret @.***> wrote:

Actually, I took the time to setup a benchmark currently implementation Method Mean Error StdDev Median Run 737.7 us 55.56 us 163.8 us 796.2 us My PR Method Mean Error StdDev Median Run 86.52 us 9.229 us 26.03 us 76.67 us

I'll go ahead and get this PR merged and released. In the meantime, if you can keep tracking down the stacktrace / repro for the stackoverflow issue?

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2293533582, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTOXWGYQZMK7HHDZKQOTZRX6O5AVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJTGUZTGNJYGI . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

-- Joshua

baywet commented 2 weeks ago
  1. yes it's a common issue, make sure you have the latest version of net8 sdk installed
  2. yes, that's painful, you effectively need to clone all the projects, and replace the package references with relative project references all the way. I have not found a way around that like npm link would do for example.
grippstick commented 2 weeks ago

Roger that. I will get it figured out and get back to you.

On Mon, Aug 19, 2024 at 8:56 AM Vincent Biret @.***> wrote:

  1. yes it's a common issue, make sure you have the latest version of net8 sdk installed
  2. yes, that's painful, you effectively need to clone all the projects, and replace the package references with relative project references all the way. I have not found a way around that like npm link would do for example.

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2296644141, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTOV4QFU6IL2FVCJEFJDZSH2QZAVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJWGY2DIMJUGE . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

grippstick commented 2 weeks ago

I am working through a script I created and while I am trying to analyze a stack track, can you clarify something. In the screen shot below, where it says "The 3 frame(s) above this were repeated 2324 times." Are they referring to the above in the picture or the below, which is above it in the stack?

Is it referring to the ParseNodeProxyFactory or JsonDocument?

image

baywet commented 2 weeks ago

I'd say the factory. I believe it's the only one that can be recursive. But it also depends on how you are tracing things. (single request or multiple, sampling or not, etc...)

grippstick commented 2 weeks ago

I think it is the factory as well, but I wanted to make sure there as not some peculiarity of the Tooling that I was not understanding.

On Tue, Aug 20, 2024 at 3:57 PM Vincent Biret @.***> wrote:

I'd say the factory. I believe it's the only one that can be recursive. But it also depends on how you are tracing things. (single request or multiple, sampling or not, etc...)

— Reply to this email directly, view it on GitHub https://github.com/microsoftgraph/msgraph-sdk-dotnet/issues/2588#issuecomment-2299751892, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBPTOVBZY4F4DLPW3WBIYDZSOUURAVCNFSM6AAAAABK7PL6LKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJZG42TCOBZGI . You are receiving this because you were mentioned.Message ID: @.***>

-- Joshua

grippstick commented 2 weeks ago

I am pretty confident that I can confirm this to be inside of Microsoft.Kiota and not MsGraph. With that said I am still not sure if it is an actual StackOverflow error, even though that is what is being reported. I think something might be failing somewhere, which is causing state to get corrupted and it gets stuck in an infinite loop. I have tests that will reproduce the error within 2400 iterations. Not sure if this is the right channel to send updates, but I wanted to give you a heads up that I am homing in on the actual problem.

baywet commented 2 weeks ago

Thank you for the continuous work on this issue! We're looking forward to your findings and getting a reproduction. For a single request, we should see only a couple (less than 5) calls to GetRootParseNodeAysnc for the different layers. So you're most likely on a promising path already.

grippstick commented 2 weeks ago

I think this will be resolved with this pull request: https://github.com/microsoft/kiota-dotnet/pull/343

If you want to talk through anything, I am more than happy to.

grippstick commented 1 week ago

I can confirm that this has stopped the StackOverFlow exception. It has also improved performance and memory usage in my long running process.

baywet commented 1 week ago

Thank you for all the information you've provided and the pull requests!

almostjulian commented 1 week ago

Any chance we can get a new graph sdk version with this baked in?

baywet commented 1 week ago

This is what #2641 is supposed to do but the CI failed. I just restarted it.