Open adamnovak opened 8 years ago
Thanks for the suggestion Adam!
In bitvectors batch inserts would take just O(log n + k) time, where k is the number of bits inserted. Tree balancing is not a problem because we would need to rebalance the tree (log n time) only every log n inserted bits (i.e. leaf size), so this cost would be amortized over the k insertions. In wavelet-tree strings this would be multiplied by log sigma. In rle_str this could probably also be done efficiently, but it's more tricky because we need to do multiple batch inserts on all the gap-encoded bitvectors corresponding to characters in the inserted vector
While writing the library I was thinking about doing this, but since my main goal was to implement dynamic RLE FM-indexes I abandoned the idea (because in this case inserts are done at random positions and not in batches). I'll add it to the TODO list.
@nicolaprezza are you still working on RLE FM-indexes? That sounds really cool!
Thanks Erik! yes, the RLE FM-index in this library is fully operative and I have already used it to implement a couple of algorithms to compute LZ77 in compressed space. If you are interested in RLE indexes, take a look also to my github.com/nicolaprezza/lz-rlbwt repository: this one is a RLE FM-index combined with LZ77 to achieve compression also of the suffix-array sampling.
It would be useful (I think) to have a way to make a
rle_str
from avector<int>
, or to insert a wholevector<int>
into anrle_str
at some position.Would breaking those operations out as batch operations make them any more efficient than just inserting each value one at a time? I don't know much about the internals, but I suspect a workload of inserts all at the very end, or all one after the other, might produce tree balancing problems.