petgraph / fixedbitset

A simple bitset container for Rust
https://docs.rs/fixedbitset/
Apache License 2.0
121 stars 45 forks source link

Efficient creation of all-ones bitset #101

Open hniksic opened 6 months ago

hniksic commented 6 months ago

I need to create a bitset of known size with all bits set. Currently I do it with:

fn ones_with_capacity(bits: usize) -> FixedBitSet {
    let mut everything = FixedBitSet::with_capacity(bits);
    everything.set_range(.., true);
    everything
}

This is not hard to write, but unlike Vec::with_capacity(), FixedBitSet::with_capacity() initializes the bitset to zeros, only for my code to immediately set it to ones. With largish bitsets I'd like to avoid the unnecessary initial resetting. Looking at assembly output in release mode, it seems that the compiler doesn't inline with_capacity(), from which I infer that it cannot eliminate the initial zeroing.

The question is, what is the most efficient way to create an all-ones bitset? Is it the above code, or rather something like:

fn ones_with_capacity(bits: usize) -> FixedBitSet {
    FixedBitSet::with_capacity_and_blocks(bits, std::iter::repeat(!0))
}

And should the crate include a constructor like FixedBitSet::ones_with_capacity() (possibly with a more optimized implementation)?

james7132 commented 6 months ago

As of #86, your implementation of ones_with_capacity should be close to optimal.

hniksic commented 6 months ago

As of #86, your implementation of ones_with_capacity should be close to optimal.

@james7132 Sorry if the answer is obvious, but are you referring to the first or the second implementation?

Looking at the implmentation of with_capacity_and_blocks(), the second one seems incorrect. with_capacity_and_blocks() starts off by collecting blocks, and ones_with_capacity() is passing it an infinite iterator. I think it should ideally collect blocks.take(bits.div_ceil(8)), but that's material for a separate PR.

james7132 commented 5 months ago

Your second one is close to optimal now. There is no collect after #86. It preallocates a zeroed buffer of the right size then writes over it using the provided blocks up to the provided capacity. This does a second pass over the buffer, so we can probably make it more efficient by using a Vec<MaybeUninit<SimdBlock>> first. That said, the calloc from the first allocation should be pretty fast compared to an iterator collect, so I'm not sure exactly how much would be gained from that.

hniksic commented 5 months ago

Thanks for the clarification! Would you accept a PR that adds a ones_with_capacity() constructor (possibly under another name)? Creating an all-1 bitset feels like a fairly fundamental operation, and I can imagine I'm not the first one to need it.

james7132 commented 5 months ago

Yes go ahead! Should be pretty close to with_capacity, but some care needs to keep the last few bits beyond the fixed length are zeroed out.

hniksic commented 5 months ago

@james7132 My idea was to literally use the second implementation. Do you think that would be suboptimal?

Ideally with_capacity_and_blocks() would avoid zeroing out the whole bitset before writing to it. (And it should stlil zero out the part not covered by blocks.) Perhaps that could be achieved by adding an unsafe with_capacity_uninit(), and replacing assignment to subblock with ptr::write(). Something like:

pub fn with_capacity_and_blocks<I: IntoIterator<Item = Block>>(bits: usize, blocks: I) -> Self {
    unsafe {
        // start off with uninitialized bitset for efficiency
        let mut bitset = Self::with_capacity_uninit(bits);
        let Range {
            start: mut subblock,
            end,
        } = bitset.as_mut_slice().as_mut_ptr_range();
        // copy data from `blocks`
        for value in blocks {
            if subblock == end {
                break;
            }
            subblock.write(value);
            subblock = subblock.add(1);
        }
        // zero out the rest
        while subblock != end {
            subblock.write(0);
            subblock = subblock.add(1);
        }
        bitset
    }
}

Then a ones constructor could be implemented as Self::with_capacity_and_blocks(bits, std::iter::repeat(!0)) with (as far as I can tell) no loss of performance.

james7132 commented 5 months ago

The main gotcha with that implementation is that a Vec<MaybeUninit> needs to be used and we cannot convert it to a type that is initialized until the entire vec is populated for that implementation to be sound. This likely means many of the private utility functions you're using there are likely to be unusable.

hniksic commented 5 months ago

Oh, right, thanks for explaining. I think we can avoid MaybeUninit as long as we only use pointers to write to uninitialized data, and never create references to it. The standard idiom is to allocate with Vec::with_capacity(), fill the allocated data, and then use set_len() to bless it. In our case we don't even need set_len(), it's enough to create the final FixedBitSet from the pointer:

pub fn with_capacity_and_blocks<I: IntoIterator<Item = Block>>(bits: usize, blocks: I) -> Self {
    if bits == 0 {
        return Self::new();
    }

    let simd_block_cnt = bits.div_ceil(SimdBlock::BITS);
    let block_cnt = bits.div_ceil(Block::BITS as usize);

    // SAFETY: We use Vec::with_capacity() to obtain uninitialized memory, and
    // initialize all of it before passing ownership to the returned FixedBitSet
    unsafe {
        let mut vec = Vec::<SimdBlock>::with_capacity(simd_block_cnt);
        let mut subblock = vec.as_mut_ptr().cast::<Block>();
        let subblock_end = subblock.add(block_cnt);
        assert!(subblock_end != subblock); // we handle bits == 0 at the beginning
        // copy as much as we can from blocks
        for value in blocks {
            subblock.write(value);
            subblock = subblock.add(1);
            if subblock == subblock_end {
                break;
            }
        }
        // zero out the remainder of the allocation
        let simd_block_end = vec.as_mut_ptr().add(simd_block_cnt).cast::<Block>();
        core::ptr::write_bytes(subblock, 0, simd_block_end.offset_from(subblock) as usize);
        let data = NonNull::new_unchecked(vec.as_mut_ptr());
        // FixedBitSet is taking over the ownership of vec's data
        core::mem::forget(vec);
        FixedBitSet {
            data,
            capacity: simd_block_cnt,
            length: bits,
        }
    }
}