Closed meooow25 closed 1 year ago
Note: I've made some changes to the algorithm. If you didn't see the previous version, you can ignore this comment. I made the change because I realized that we could be forced to do $O(W)$ work even when $n$ is very small, making the complexity $O(n/W + W)$. I changed `goL` and `goR` to avoid this, so it's correctly $O(n/W)$ now. This can be observed on `fromRange (-1,0)`, which changed from ~70ns to 10ns. Also fixed a bug where I used `bitmapOfSuffix` instead of `bitmapOf`; the tests ran fine because on x86 the shift value is masked to 5/6 bits anyway.
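To illustrate why the cost should scale with $n/W$, here is a minimal sketch that splits a range into one bitmap word per block of $W$ keys. It is not the code from this PR: `rangeWords`, `bitsBetween`, `prefixOf`, `suffixOf`, and `rangeSize` are hypothetical names, the sketch assumes $W = 64$ and a 64-bit `Int`, and it produces a flat list instead of building the tree.

```haskell
import Data.Bits (complement, popCount, shiftL, shiftR, (.&.))
import Data.Word (Word64)

-- Assumed word size for this sketch (W = 64).
wordBits :: Int
wordBits = 64

-- Key with the low 6 bits cleared: identifies the word a key lives in.
prefixOf :: Int -> Int
prefixOf x = x .&. complement (wordBits - 1)

-- Low 6 bits of the key: the bit position inside its word.
suffixOf :: Int -> Int
suffixOf x = x .&. (wordBits - 1)

-- Word with bits a..b set, for 0 <= a <= b <= 63.
-- Both shift amounts stay in [0, 63], so no shift relies on hardware masking.
bitsBetween :: Int -> Int -> Word64
bitsBetween a b =
  (maxBound `shiftR` (wordBits - 1 - b)) .&. (maxBound `shiftL` a)

-- One (prefix, bitmap) pair per word touched by the inclusive range [lo, hi].
-- Only the first and last words can be partial, so the work is O(n/W).
rangeWords :: Int -> Int -> [(Int, Word64)]
rangeWords lo hi
  | lo > hi   = []
  | otherwise = go firstP
  where
    firstP = prefixOf lo
    lastP  = prefixOf hi
    go p = (p, bitsBetween a b) : rest
      where
        a    = if p == firstP then suffixOf lo else 0
        b    = if p == lastP then suffixOf hi else wordBits - 1
        rest = if p == lastP then [] else go (p + wordBits)

-- Number of keys represented, used as a sanity check.
rangeSize :: Int -> Int -> Int
rangeSize lo hi = sum (map (popCount . snd) (rangeWords lo hi))
```

Under these assumptions, `rangeWords (-1) 0` yields two words (the top bit of the word at prefix `-64` and the bottom bit of the word at prefix `0`), so a tiny range touches only a constant number of words.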
Cool!
For #632
Benchmark on GHC 9.2.5:
Comparing `fromDistinctAscList [1..2^12]` from #951 against `fromRange (1,2^12)`, currently it takes 24.3 μs (~62x the time) and with fusion it would still take 4.49 μs (~11x the time).