Open ucyo opened 4 years ago
The benchmarks using criterion
are better, but still slow:
search symbol/using bitvec
--------------------------
base 1.00 873.1±22.66µs 38.3 MB/sec
new 1.00 873.1±22.66µs 38.3 MB/sec
Criterion code:
use criterion::{criterion_group, criterion_main, Criterion};
use criterion::{Throughput};
use succinct::stream::{BitBuffer};
use succinct::BitVector;
use succinct::bit_vec::BitVecMut;
use succinct::rank::{JacobsonRank, RankSupport};
use rand::distributions::WeightedIndex;
use rand::prelude::*;
/// Generates vector of values to be looked up in BitVector
fn get_lookup_values(size: usize) -> Vec<u8> {
let words: Vec<u8> = vec![20, 17, 6, 3, 2, 2, 2, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
];
generate_random_byte_vector(0, words.len() as u8, size, &words)
}
fn generate_random_byte_vector(min: u8, max: u8, count: usize, weights: &[u8]) -> Vec<u8> {
let mut rng = rand::thread_rng();
let choices: Vec<_> = (min..max).collect();
let dist = WeightedIndex::new(weights).unwrap();
assert_eq!(choices.len(), weights.len());
(0..count).map(|_| choices[dist.sample(&mut rng)]).collect()
}
fn prepare() -> (Vec<(u8,u8)>, Vec<u8>, JacobsonRank<BitVector<usize>>) {
let keys: Vec<usize> = vec![0,32,48,52,54,56,58,60,62,63];
let vals: Vec<(u8,u8)> = vec![(0,1),(1,2),(2,4),(3,5),(4,5),(5,5),(6,5),(7,5),(8,6),(9,6)];
let mut bv: BitVector<usize> = BitVector::with_fill(keys[keys.len()-1] as u64 +1, false);
for k in keys {
bv.set_bit(k as u64, true);
}
let jbv = JacobsonRank::new(bv);
let size = 35_044;
let look_up_values = get_lookup_values(size);
(vals, look_up_values, jbv)
}
fn iter_search_key_or_next_small_key_bitvec(table: &[(u8,u8)], jbv: &JacobsonRank<BitVector<usize>>, data: &[u8]) {
for key in data {
table[jbv.rank(*key as u64, true) as usize];
}
}
fn benchmark_searching_for_key_value(c: &mut Criterion) {
let (table,origin,jbv) = prepare();
let mut group = c.benchmark_group("search symbol");
group.throughput(Throughput::Bytes(origin.len() as u64));
group.bench_function("using bitvec", |b| {
b.iter(|| iter_search_key_or_next_small_key_bitvec(&table, &jbv, &origin))
});
group.finish();
}
criterion_group!(search, benchmark_searching_for_key_value);
criterion_main!(search);
Using Rank9
seems to speed up things considerably:
search symbol/using bitvec
--------------------------
rank9 1.00 204.2±7.31µs 163.7 MB/sec
jacob 1.00 873.1±22.66µs 38.3 MB/sec
The new RsDict
from the PR is a bit faster than rank9
:
search symbol/using bitvec
--------------------------
rsdict 1.00 201.5±8.32µs 165.8 MB/sec
rank9 1.01 204.2±7.31µs 163.7 MB/sec
jacob 4.33 873.1±22.66µs 38.3 MB/sec
Hey, sorry for the slow response. And thanks for the benchmarks! I'm now available to look at and merge #10, if that will help. I’m also wondering if you have an idea how much faster this “should” be…
I actually do not have a base for the acceptable speed rate. One option get a base could be to use the count_ones()
method on a vector of u32
or u64
primitive types.
@tov Do you know of any other rank algorithms we can implement?
Hi there,
I seem to be not able to get constant time rank lookup for my
BitVector
. I get a throughput of1.3 MiB/sec
. Here is a minimal example of my usage:Is there something wrong with my initialisation of the
BitVector
or is there a way to speed things up?