Closed sujayakar closed 4 years ago
Cool stuff! I saw you just made a change. Is this ready to merge?
Yep, I think so! I have some local changes to vectorize select
and add a builder to speed up construction, but I can submit subsequent PRs for those :)
@tov, ping on this? Let me know if there's anything I can help with.
Nope, sorry, I just haven’t had I time to look at it yet! Should get to it quite soon.
On Tue, Nov 19, 2019 at 12:32 PM Sujay Jayakar notifications@github.com wrote:
@tov https://github.com/tov, ping on this? Let me know if there's anything I can help with.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/tov/succinct-rs/pull/10?email_source=notifications&email_token=AAFS3BBV5ZUAJHGZ6QHJRTTQUQWKDA5CNFSM4JHFDJDKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEPHTPQ#issuecomment-555645374, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFS3BENBVWU4CXL5OXBVBLQUQWKDANCNFSM4JHFDJDA .
-- Dr. Jesse A. Tov Assistant Professor of Instruction Computer Science McCormick School of Engineering Northwestern University
hey @tov, I'm going to hack on this a bit over the holidays anyways, so I'll put it up in a separate repo and we can chat about merging later if you'd like.
Okay, please keep me posted. Over the holidays is when I have time as well…
Cool, I've put up the library at https://github.com/sujayakar/rsdict and fixed most of the issues I wanted to (docstrings, from_blocks
constructor, runtime CPU feature detection), so take a look when you have time. Shall I resubmit a PR?
Yes, please do! Sorry again for the delay.
hmm, looks like packed_simd
only supports nightly Rust. @tov, would you rather we put SIMD support behind a feature gate or only support nightly for this crate?
Edit: I just added the feature gate :)
+1 to the feature gate.
@tov, this should be ready for another review!
@sujayakar There seems to be a change in behaviour with RsDict
. The rank1(pos)
operation does not consider the bit at pos
anymore. Minimal example:
let keys = vec![0,2,4,5];
let mut bv: BitVector<u64> = BitVector::with_fill(m as u64 + 1, false);
for k in keys {
bv.set_bit(k as u64, true);
}
let mut rs = RsDict::new();
for bit in bv.iter() {
rs.push(bit);
}
let jj = Rank9::new(bv);
println!("bv {} rs {}", jj.rank1(0), rs.rank1(0)); // jj returns 1, while rs returns 0
Did you close this because I've been slow to respond, or because you decided it would be better in a separate crate? I’m back now, but I’d also be happy to share maintenance duties here.
hey @tov, yeah, I just wanted to wrap up that side project of mine.
I'd be happy to merge the library back into succinct-rs
if you're okay with making me a maintainer. That way I can maintain rsdict
with shorter cycle times, and I'd also love to improve the other structures in this library. I'm actually a bit busy at the moment but would pick that up when I have more time.
I've ported @hillbig's implementation [1] of Navarro and Providel's data structure [2] for simultaneously accelerating rank and select queries over a bit vector. This structure is great for select heavy workloads, like wavelet matrices [3] that currently binary search over ranks. The shape of the implementation is largely the same but with some small changes and Rust idioms.
The testing is all through quickcheck by testing all inputs for
rank
andselect
against a naive implementation.I've also added some basic benchmarks, and the results are already pretty good. This benchmark generates a random 1M bitvector and queries the rank at 1000 different indices.
So, this data structure is faster than Jacobson for rank but slower than rank9, and it's much faster than binary search for my given benchmark.
Then, I added SIMD acceleration for the rank computation, and it's closer to rank9 now. It's behind an experimental feature flag since that whole ecosystem isn't fully settled yet. Here's the benchmarks rerun with the feature and
RUSTFLAGS="-C target-cpu=native"
on my 2018 MBP.Not all of the speed up is just from changing
target-cpu
: Here's the same benchmark without the SIMD acceleration feature.[1] https://github.com/hillbig/rsdic [2] https://users.dcc.uchile.cl/~gnavarro/ps/sea12.1.pdf [3] https://github.com/sekineh/wavelet-matrix-rs