Open 3Hren opened 5 months ago
If you want to "deduplicate bitsets", why not substract? S1 - S2 will give you S1 without S2 elements.
If you want to map indices to values, you need https://github.com/tower120/hi_sparse_array . It is somewhat like hi_sparse_bitset, but last level points to the actual data. It is still in research phase.
And what is the meaning of sets order? When S1>S2?
By deduplicating/mapping I meant that suppose we have the following mapping:
{
0b10101 -> 0,
0b10101 -> 1,
0b10001 -> 2,
...
0b10001 -> 100500,
}
I would like to transform it in a such way:
{
0b10101 -> [0, 1],
0b10001 -> [2, 100500],
...
}
Using BTreeMap
this is trivial.
By order I mean that in fact each bitset can be represented as a very large unsigned integer. So why not allow to comparing them?
Looks like you need hi_sparse_array's multi_merge. There is a version of it now in repository, but I rewriting it to new more space efficient data structure.
Ord
is probably possible. But looks like what you want to do will be inefficient (since you'll need to process each bitset one-by-one). Again, looks like you need merge. And I don't clearly see how to implement Ord
... Especially for non TRUSTED_HIERARCHY.
It's not trivial algorithm (maybe I'm wrong).
Well, if you "just" need Ord
, and don't care about speed - you probably can compare block_iter() between two sets. (Don't forget to take block starts into count as well).
Isn't it enough to just to compare levels one-by-one? First try level0 indices, if they are equal then level1, and finally, if again equal - block-by-block? I'm not familiar with TRUSTED_HIERARCHY internals, will take a look.
Well, if you "just" need
Ord
, and don't care about speed - you probably can compare block_iter() between two sets. (Don't forget to take block starts into count as well).
Of course I care, that's why I am here ;)
Anyway, thanks for the answer!
For TRUSTED_HIERARCHY
maybe that is enough (I can't see that far without practical implementation).
But if you work, for example, with lazy(!) result of intersection - raised bits in level bitmasks may lead to EMPTY data blocks. That case is called non-trusted-hierarchy. You can't trust hierarchy and just compare level bitmasks. From Eq implementation experience thou, there is a "special case" when you compare TRUSTED_HIERARCHY to !TRUSTED_HIERARCHY - then some optimization possible. Otherwise - (as far as I remember) it ends up with plain data block comparison one-by-one.
It is just with multi_merge/fold_merge you basically OR bitmasks of all maps/arrays at once, and then just apply resolve function to intersected data items in the very last level. At least on paper that should be super-linearly faster then what you trying to do.
Ah, I see it's more complicated than I thought, interesting nonetheless. Thanks for the clarification!
Hi again!
There is the final step to take full advantage of your great library.
Sometimes it is necessary to deduplicate bitsets or to build a mapping between bitsets and some values. There are two standard approaches: using
HashMap
orBTreeMap
.The first one requires to calculate bitset's hash, which, in case of large bitsets, can be painful. However, for
BTreeMap
it seems that a multilevel approach to comparing two bitsets will result in a very fast comparison, also lazy bitsets seems to perfectly fit in this case. Since thePartialEq
is already implemented it seems that it will no harm to also implementPartialOrd
/Ord
. Or is there something that prevents this from being implemented?