seqan / product_backlog

This repository is used as product backlog for all SeqAn relevant backlog items. This is intended to organise the work for the team.
2 stars 1 forks source link

Enumerating binning_bitvector bins #412

Open eseiler opened 2 years ago

eseiler commented 2 years ago

Taken from https://github.com/seqan/seqan3/pull/2930

I also have an idea for a follow-up (mainly intended for the core-team):

I think we had at some point some the discussion how we should make enumerating the bins accessible to the user. At that time it wasn't really clear to us how one could approach this.

I tried to implement a bit_index_view which returns the bit positions at which a bit is 1. It turned out that this produced DOUBLE the amount of asm compared to the current nested for loop (60 instructions instead of 30). Furthermore, the implementation needs at least 80 additional LOC just to make it an iterator. (I already knew that you need to pay more when flattening a nested for loop, but effectively doubling the runtime?!)

Therefore, I would suggest offering the following API function:

// interleaved_bloom_filter<data_layout_mode>::membership_agent::binning_bitvector

template <typename on_contained_bin_fn_t>
void binning_bitvector::bins(on_contained_bin_fn_t && on_contained_bin_fn)
{
    // ...
    for ( ... )
    {
        // ...
        for ( ... )
        {
             bin += jump_to_next_1bit(bit_sequence);
             on_contained_bin_fn(bin);
        }
    }
}

This is a bit unconventional in the sense that we pass in a callback, and normally we would prefer a view, but as I stated above it won't be as performant. Having a callback will behave similar to a view, in the sense that we iterate over all possible values on the fly, but with the drawback of no control over the control flow (stopping/starting at a certain element, etc...)

Another approach would be to return a std::vector<size_t>, but this is undesirable in this case as this will be in the hot-loop and any additional heap allocation should be avoided.

eseiler commented 2 years ago

@marehr I made a new issue from the comment because it makes it easier to discuss.

What would your specific use case be?