pisa-engine / pisa

PISA: Performant Indexes and Search for Academia
https://pisa-engine.github.io/pisa/book
Apache License 2.0
926 stars 64 forks source link

`topk_queue` rapidcheck tests fail on Clang 15 #504

Closed elshize closed 1 year ago

elshize commented 1 year ago

We have some property tests implemented that check topk_queue thresholds, and they mysteriously fail on Clang 15.

elshize commented 1 year ago

It seems that recent changes to the pop_heap implementation in libc++ assume that the entire range [first, last) is a valid heap, including the last element that we are pushing. I'm trying to figure out if it's a bug, or if the standard in fact requires that.

On another note, the implementation of pop_heap in both libc++ and libstdc++ use "heapsort with bounce" approach to popping. That is, first a full sift down is performed to find the leaf, and then the heap is fixed by sifting up if necessary (https://en.wikipedia.org/wiki/Heapsort#Bottom-up_heapsort). The idea is that this takes fewer comparisons (see wiki article for details) because the value we sift down-and-up is in the leaf. But in our case this value can be anything that is larger than the top of the heap, so not sure how much we're gaining by doing this hack.

I want to actually do some benchmarks for this.

LLVM commit introducing the change: https://github.com/llvm/llvm-project/commit/79d08e398c17e83b118b837ab0b52107fd294c3e

LLVM discourse thread I started to find out more: https://discourse.llvm.org/t/binary-heap-pop-push-operation-using-pop-heap/67430

elshize commented 1 year ago

Ok, so it turns out that the standard does state that the entire range must be a valid heap: http://eel.is/c++draft/pop.heap#2

The fix itself is trivial (need to do a pop before appending the new value, and then swap the old with new, and call push), but I will (a) look into whether there is a more efficient alternative, and (b) do some benchmarking.