Thanks @sadikovi! The performance with this PR looks awesome. Do you know why it improved so much?
No worries. I think it is because of the change in the decoding.rs code.
We used to create a copy of the buffer pointer covering a range of 12 bytes for each value, build a slice from it, and then turn that into a vector. Now we create a copy of the buffer pointer once, covering all of the bytes we are going to read, and then extract slices of the data from it. Since we know that all of those bytes must map onto the values we need to read, we can safely iterate with two indices - one over values and another one over byte offsets. On each iteration we just reconstruct 3 u32s from slices.
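Roughly, the new loop has this shape (a simplified sketch with illustrative names and signature, not the exact code from decoding.rs):

```rust
// Sketch only: a two-index iteration over values and byte offsets,
// reconstructing each INT96 from 12 consecutive bytes.
fn decode_int96_plain(data: &[u8], num_values: usize, out: &mut [[u32; 3]]) {
    let bytes_needed = num_values * 12; // each INT96 value occupies 12 bytes
    assert!(data.len() >= bytes_needed && out.len() >= num_values);
    let bytes = &data[..bytes_needed]; // one slice covering all bytes to read

    let mut offset = 0; // index over byte offsets
    for value in out.iter_mut().take(num_values) { // index over values
        // Reconstruct 3 u32s from the next 12 little-endian bytes.
        for i in 0..3 {
            let mut chunk = [0u8; 4];
            chunk.copy_from_slice(&bytes[offset + i * 4..offset + (i + 1) * 4]);
            value[i] = u32::from_le_bytes(chunk);
        }
        offset += 12;
    }
}
```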
OK, got it. It's a little surprising that a change in decoding can give such a big performance boost. This is nice.
Merged. Thanks @sadikovi!
This PR updates the INT96 type to use [u32; 3] instead of Vec<u32>. I also updated plain decoding for INT96 to take advantage of the array and slightly refactored the method itself. Now INT96 looks like this:
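(A minimal sketch of the shape described; the Option wrapper and field name are assumptions, and the exact merged definition may differ slightly.)

```rust
// Sketch of the new shape: a fixed-size array instead of Vec<u32>.
// The Option wrapper and field name are assumptions, not necessarily
// the exact code merged in this PR.
pub struct Int96 {
    value: Option<[u32; 3]>,
}

impl Int96 {
    // Sets the raw data directly from three u32 elements, with no
    // heap allocation.
    pub fn set_data(&mut self, elem0: u32, elem1: u32, elem2: u32) {
        self.value = Some([elem0, elem1, elem2]);
    }
}
```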
I also updated the set_data method to set elements directly. All tests pass locally. I also ran a benchmark, assuming that my code is correct (the benchmark is not included in this PR, but its git diff is attached):
+macro_rules! gen_random_int96 {
 gen_random_ints!(gen_10, 10);
 gen_random_ints!(gen_100, 100);
 gen_random_ints!(gen_1000, 1000);
+gen_random_int96!(gen_int96_1000, 1000);
+
 pub fn gen_test_strs(total: usize) -> (usize, Vec<ByteArray>) {
let mut words = Vec::new();
words.push("aaaaaaaaaa");
diff --git a/benches/decoding.rs b/benches/decoding.rs
index ec1f097..300ccdd 100644
--- a/benches/decoding.rs
+++ b/benches/decoding.rs
@@ -113,6 +113,8 @@ fn bench_decoding(
})
}
+plain!(plain_i96_1k_32, 1024, 128, Int96Type, Type::INT96, gen_int96_1000);
+
 plain!(plain_i32_1k_32, 1024, 32, Int32Type, Type::INT32, gen_1000);
 plain!(plain_i32_1k_64, 1024, 64, Int32Type, Type::INT32, gen_1000);
 plain!(plain_i32_1k_128, 1024, 128, Int32Type, Type::INT32, gen_1000);
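For reference, the body of gen_random_int96 is truncated in the diff above; it presumably mirrors the gen_random_ints pattern. A hypothetical version (Int96::new, the import paths, and the rand calls are assumptions, not code from this PR):

```rust
use parquet::data_type::Int96; // import path assumed; may vary by crate version
use rand::{thread_rng, Rng};

// Hypothetical macro body following the gen_random_ints pattern; the
// actual code is in the attached diff, truncated above.
macro_rules! gen_random_int96 {
    ($fn_name:ident, $total:expr) => {
        // Generates $total random Int96 values and their total byte size.
        pub fn $fn_name() -> (usize, Vec<Int96>) {
            let mut rng = thread_rng();
            let mut values = Vec::with_capacity($total);
            for _ in 0..$total {
                let mut v = Int96::new(); // assumed constructor
                v.set_data(rng.gen::<u32>(), rng.gen::<u32>(), rng.gen::<u32>());
                values.push(v);
            }
            ($total * 12, values) // 12 bytes per INT96 on the wire
        }
    };
}

gen_random_int96!(gen_int96_1000, 1000);
```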