rdaum / rart-rs

An Adaptive Radix Tree implementation.
Apache License 2.0
36 stars 1 forks source link

Iter and range methods don't work #7

Open troad456 opened 1 year ago

troad456 commented 1 year ago

Thanks for this excellent librarry!

I noticed that the iter and range methods don't work currently because only the leaf nodes are being iterated over, and not the inner nodes. Just curious if you plan to resolve this

rdaum commented 1 year ago

Thanks for poking at this and finding that. It doesn't surprise me they are broken, unfortunately; they've been on my list of things to poke at for a while because I'm not happy with the way they're implemented. Right now they'll do a bunch of excessive copies during iteration, and defeats half the purpose of the ART structure, which is fast sequential operations.

I would definitely like to fix this, when I get the time, not likely to be this week.

In the meantime, if you have a reproduction case, even a small fragment of code, feel free to provide. Regression tests more than welcome.

rdaum commented 1 year ago

So I've fixed .iter(), but .range() remains broken for now, pending more serious surgery.

The TLDR is that there's issues with the comparison of key values as passed in to .insert, .range, etc. and with the Partial representations that end up stored in the tree. When iterating we have to reconstruct the key bytes based on the various partials in the tree. With range the reconstructed keys need to be compared to the begin/end keys passed in for the Range, so that we can determine if we've hit the end (or start) of the range. Right now this is broken.

At some point hopefully soon I am going to rework how keys are produced, and change it so that users have to pass in a couple functors for key->bytes and partials->key conversion when making the tree. I think this will allow me to fix the range stuff properly.

There was also an issue with iteration and sort order. At one point I changed one of the child node mappings so that it wasn't internally sorted -- but instead by insertion order -- which had a fairly minor but real performance improvement. But it means that the tests that compared iteration order in the ART vs iteration order in a BTree were unstable. I've switched back to the SortedKeyedMapping for now, but will likely make it an option for the user as soon as I can get around to that refactoring.

grovesNL commented 1 month ago

@rdaum Do you have a sense for how complicated the key rework might be? I have a use case that might be a good fit for rart but I can't benchmark it without range, so I was wondering if I might be able to help somehow.

rdaum commented 1 month ago

I think the Range thing is solvable, but this codebase in general is a bit aged and also in a state that's reflective of my Rust knowledge/experience from a year and a half ago rather than today...

I did fork it recently for a separate personal project but also without range support: https://github.com/rdaum/Relbox/tree/main/src/index/art that's a slightly more modern implementation.

That implementation is a bit different though, with reference counted copy-on-write internal nodes, so holds old versions and gives a kind of immutable API.

If there's interest in reviving and cleaning up this code I could be convinced to carve out some time.

On Wed, 26 Jun 2024 at 07:49, Josh Groves @.***> wrote:

@rdaum https://github.com/rdaum Do you have a sense for how complicated the key rework might be? I have a use case that might be a good fit for rart but I can't benchmark it without range, so I was wondering if I might be able to help somehow.

— Reply to this email directly, view it on GitHub https://github.com/rdaum/rart-rs/issues/7#issuecomment-2191495098, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMEHVG64K7PLVFMPHAI2TZJKTEDAVCNFSM6AAAAABJ5XA4E2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJRGQ4TKMBZHA . You are receiving this because you were mentioned.Message ID: @.***>

grovesNL commented 1 month ago

That sounds interesting, thank you! I'll take a look.

Maybe I could explain my use case in case you have any other ideas - maybe I could skip range somehow. I currently have a BTreeMap<u32, u32> but it's causing performance problems on range queries/insertion/removal (especially range queries, so some trade-off would probably be ok).

I'm interested in some kind of radix tree to represent the same thing but with fast mutable range searches, e.g., "give me all keys bounded by a range, maybe remove some nodes, then insert some more nodes" My thought was to write my own tree and organize the data around SIMD lanes somehow to accelerate the search, but it looks like rart already does most of that so maybe I could help get it to the point I'd like.

For what it's worth, I also know roughly how the keys will be distributed (over 0 to u32::MAX) and how far queries would span over those shards, so it's possible to bucket/shard them across multiple trees if that could help in some way (e.g., rart per shard and linear scans, although maybe that's a bad idea).