Thanks @sadikovi! The performance with this PR looks awesome. Do you know why it improved so much?
No worries. I think it is because of the change in the decoding.rs code.
We used to create a copy of the buffer pointer covering a range of 12 bytes for each value, build a slice from it, and then turn that into a vector. Now we create a copy of the buffer pointer once, covering all of the bytes we are going to read, and then extract slices of the data from it. Since we know that all of those bytes must map onto the values we need to read, we can safely iterate with two indices - one over values and another one over byte offsets. On each iteration we just reconstruct 3 u32s from slices.
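Roughly, the new loop has this shape (a simplified sketch with illustrative names and signature, not the exact code from decoding.rs):

```rust
// Sketch only: a two-index iteration over values and byte offsets,
// reconstructing each INT96 from 12 consecutive bytes.
fn decode_int96_plain(data: &[u8], num_values: usize, out: &mut [[u32; 3]]) {
    let bytes_needed = num_values * 12; // each INT96 value occupies 12 bytes
    assert!(data.len() >= bytes_needed && out.len() >= num_values);
    let bytes = &data[..bytes_needed]; // one slice covering all bytes to read

    let mut offset = 0; // index over byte offsets
    for value in out.iter_mut().take(num_values) { // index over values
        // Reconstruct 3 u32s from the next 12 little-endian bytes.
        for i in 0..3 {
            let mut chunk = [0u8; 4];
            chunk.copy_from_slice(&bytes[offset + i * 4..offset + (i + 1) * 4]);
            value[i] = u32::from_le_bytes(chunk);
        }
        offset += 12;
    }
}
```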
OK, got it. It's a little surprising that a change in decoding can give such a big performance boost. This is nice.
Merged. Thanks @sadikovi!
This PR updates the INT96 type to use [u32; 3] instead of Vec<u32>. I also updated plain decoding for INT96 to take advantage of the array and slightly refactored the method itself. Now INT96 looks like this:
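(A minimal sketch of the shape described; the Option wrapper and field name are assumptions, and the exact merged definition may differ slightly.)

```rust
// Sketch of the new shape: a fixed-size array instead of Vec<u32>.
// The Option wrapper and field name are assumptions, not necessarily
// the exact code merged in this PR.
pub struct Int96 {
    value: Option<[u32; 3]>,
}

impl Int96 {
    // Sets the raw data directly from three u32 elements, with no
    // heap allocation.
    pub fn set_data(&mut self, elem0: u32, elem1: u32, elem2: u32) {
        self.value = Some([elem0, elem1, elem2]);
    }
}
```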
I also updated the set_data method to set elements directly. All tests pass locally. I also ran a benchmark, assuming that my code is correct (the benchmark is not included in this PR, but its git diff is attached):
+macro_rules! gen_random_int96 {
 gen_random_ints!(gen_10, 10);
 gen_random_ints!(gen_100, 100);
 gen_random_ints!(gen_1000, 1000);
+gen_random_int96!(gen_int96_1000, 1000);
+
 pub fn gen_test_strs(total: usize) -> (usize, Vec<ByteArray>) {
let mut words = Vec::new();
words.push("aaaaaaaaaa");
diff --git a/benches/decoding.rs b/benches/decoding.rs
index ec1f097..300ccdd 100644
--- a/benches/decoding.rs
+++ b/benches/decoding.rs
@@ -113,6 +113,8 @@ fn bench_decoding(
})
}
+plain!(plain_i96_1k_32, 1024, 128, Int96Type, Type::INT96, gen_int96_1000);
+
 plain!(plain_i32_1k_32, 1024, 32, Int32Type, Type::INT32, gen_1000);
 plain!(plain_i32_1k_64, 1024, 64, Int32Type, Type::INT32, gen_1000);
 plain!(plain_i32_1k_128, 1024, 128, Int32Type, Type::INT32, gen_1000);
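For reference, the body of gen_random_int96 is truncated in the diff above; it presumably mirrors the gen_random_ints pattern. A hypothetical version (Int96::new, the import paths, and the rand calls are assumptions, not code from this PR):

```rust
use parquet::data_type::Int96; // import path assumed; may vary by crate version
use rand::{thread_rng, Rng};

// Hypothetical macro body following the gen_random_ints pattern; the
// actual code is in the attached diff, truncated above.
macro_rules! gen_random_int96 {
    ($fn_name:ident, $total:expr) => {
        // Generates $total random Int96 values and their total byte size.
        pub fn $fn_name() -> (usize, Vec<Int96>) {
            let mut rng = thread_rng();
            let mut values = Vec::with_capacity($total);
            for _ in 0..$total {
                let mut v = Int96::new(); // assumed constructor
                v.set_data(rng.gen::<u32>(), rng.gen::<u32>(), rng.gen::<u32>());
                values.push(v);
            }
            ($total * 12, values) // 12 bytes per INT96 on the wire
        }
    };
}

gen_random_int96!(gen_int96_1000, 1000);
```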