microsoft / elfie-arriba

open source search projects
MIT License
80 stars 31 forks source link

ItemTree PartialArray allocations hit LOH #258

Open ToddGrun opened 1 week ago

ToddGrun commented 1 week ago

Noticed this while looking at a trace of editing a C# file in Visual Studio. Roslyn uses elfie in it symbol search support, and that usage is showing up in allocation profiles, in particular, LOH allocations.

Instead, if the ItemTree class used something like Roslyn's SegmentedList class, then it could avoid the LOH allocations (and likely allocate significantly less due to resizing).

I started to go down the path of creating a PR for this change, but it ended up being complicated by the SegmentedList dependencies (specifically the Nullable and System.Memory packages on netfx).

If elfie could take dependencies on those packages, adding the SegmentedList support would be fairly trivial.

Image

ToddGrun commented 1 week ago

@ScottLouvau -- I know you worked on this before, and I believe I had mentioned this to you a while back. Not sure if any active development is being done on this package, but a bit of modernization would be nice as Roslyn is using it.

ToddGrun commented 1 week ago

There's another place in the profile where elfie's dictionary usage is showing up in the LOH, which could probably be helped by using Roslyn's SegmentedDictionary

Image

ScottLouvau commented 1 week ago

I don’t think Ellie was in active use in other products when I left. I think MikeFan and RTaket would be the best people to confirm with.

If adding the dependencies doesn’t work, I think reimplementing the Elfie columns as a sequence of 64KB blocks would be safe. I don’t think any consumers use the arrays directly or can request subranges from them. Not as nice as using an existing solution, but probably not too bad.

-Scott

Get Outlook for iOShttps://aka.ms/o0ukef


From: Todd Grunke @.> Sent: Monday, November 11, 2024 3:34:18 PM To: microsoft/elfie-arriba @.> Cc: Scott Louvau @.>; Mention @.> Subject: Re: [microsoft/elfie-arriba] ItemTree PartialArray allocations hit LOH (Issue #258)

There's another place in the profile where elfie's dictionary usage is showing up in the LOH, which could probably be helped by using Roslyn's SegmentedDictionary

image.png (view on web)https://github.com/user-attachments/assets/0f14c071-0afd-4646-84bb-0e6f8ba4eea0

— Reply to this email directly, view it on GitHubhttps://github.com/microsoft/elfie-arriba/issues/258#issuecomment-2469287911, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADYDIUMEAJFNSFZHGZLN6WL2AE5HVAVCNFSM6AAAAABRSXTTV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRZGI4DOOJRGE. You are receiving this because you were mentioned.Message ID: @.***>

ToddGrun commented 1 week ago

If spans were avaialble, the substring usage in ReadText could probably be generally avoided too

Image

ToddGrun commented 1 week ago

@rtaket @mikefan Any thoughts on adding Nullable and System.Memory dependencies?

ScottLouvau commented 1 week ago

Todd, is the database that Elfie is loading a static one?

Elfie has a "mutable" mode when you're building a database (indexing things), and then it can persist out to a binary format and re-load in an "immutable" mode. In the reload case, the binary format tells Elfie how large each array should be, so it allocates them to the ideal size immediately.

If this use of Elfie is for barely-changing data, you can build it this way once, then call AddReferenceDatabase.WriteBinary(BinaryWriter) and then AddReferenceDatabase.ReadBinary(BinaryReader) for much faster reloading.

It would still allocate large objects for the big arrays, but at least it wouldn't need to resize them at all.

If you then modify the per-column storage to use an array-of-arrays under the LOH limit, you could probably get Elfie to avoid the LOH altogether without any new library dependencies needed.

ToddGrun commented 1 day ago

Sorry for the slow response

Todd, is the database that Elfie is loading a static one?

I don't think so, although I really don't know the code that uses elfie well from Roslyn. It looks like we get the db here and do some patching operations on it.

If there is a way for Roslyn to be more efficient in it's usage of elfie, I'd definitely be willing to make those changes. Without knowledge of elfie or how we're using it, the best I could do without investing a lot of effort was suggesting a targeted fix in elfie.