zopefoundation / BTrees

Other
80 stars 28 forks source link

Get a random element from a BTree #185

Open azmeuk opened 2 years ago

azmeuk commented 2 years ago

I would love to have a performant way to get a random element out of a BTree.

At the moment I use random.choice (random.choice(mybtree.keys())), but internally it calls len on the ITreeItems, and on large BTrees (>200k items) the operation is long enough (3~4s on my computer) to not fit a web request expectations.

Due to the nature of BTrees, I would expect that this would not be hard to build a pseudo-random method that would go through random child BTrees until it meets a leaf bucket, and then choose a random item in it. This would not exactly be randomness if the tree is not perfectly balanced but this would be good enough in my opinion.

What do you think? Did I miss a simpler way to get a random element? Would such a PR be OK?

d-maurer commented 2 years ago

Éloi Rivard wrote at 2022-9-19 05:30 -0700:

... I would love to have a performant way to get a random element out of a BTree. ... Due to the nature of BTrees, I would expect that this would not be hard to build a pseudo-random method that would go through random child buckets until it meets a leaf. This would not exactly be randomness if the tree is not perfectly balanced but this would be good enough in my opinion. ... What do you think? Did I miss a simpler way to get a random element? Would such a PR be OK?

Buckets are the leaves of a BTree. Thus, your terminology is not optimal.

The BTrees package contains the module check. It contains functions which let you decompose a large BTree into its components (sub-BTrees and Buckets). You could use them to implement your idea and check whether it is good enough for you.

azmeuk commented 2 years ago

Buckets are the leaves of a BTree. Thus, your terminology is not optimal.

Indeed. Maybe go through random child BTrees until it meets a leaf bucket, and then choose a random item in it is a better reformulation.

I will have a look at the check module.