Closed JuanPotato closed 2 years ago
Running the benchmarks (which aren't a full holistic picture of this library), gives me some worrying numbers. :grimacing: These are ran on my cheap laptop:
origin/master..JuanPotato/master
deku_read_byte time: [35.971 ns 36.308 ns 36.661 ns]
change: [+197.36% +202.61% +208.23%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
deku_write_byte time: [84.430 ns 85.312 ns 86.242 ns]
change: [+23.911% +26.359% +28.970%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
deku_read_enum time: [58.737 ns 59.303 ns 59.910 ns]
change: [+172.07% +178.52% +184.68%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
deku_write_enum time: [143.86 ns 145.50 ns 147.24 ns]
change: [+28.792% +31.542% +34.688%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
deku_read_vec time: [2.4821 us 2.5039 us 2.5270 us]
change: [+130.90% +134.30% +137.80%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
deku_write_vec time: [6.2461 us 6.3132 us 6.3845 us]
change: [+28.856% +31.085% +33.302%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
origin/master for reference:
deku_read_byte time: [11.810 ns 11.942 ns 12.085 ns]
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
deku_write_byte time: [67.383 ns 68.076 ns 68.816 ns]
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
deku_read_enum time: [20.725 ns 20.979 ns 21.251 ns]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
deku_write_enum time: [110.09 ns 111.23 ns 112.46 ns]
Found 13 outliers among 100 measurements (13.00%)
8 (8.00%) high mild
5 (5.00%) high severe
deku_read_vec time: [1.0543 us 1.0685 us 1.0835 us]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
deku_write_vec time: [4.7405 us 4.7984 us 4.8617 us]
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
@myrrlyn @sharksforarms
Running benchmarks on my adsb library after switching deku to this branch:
lax_messsages time: [1.5287 s 1.5305 s 1.5330 s]
change: [+56.845% +57.120% +57.449%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
5 (5.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe
Thank you for the feedback! Yeah that's not great performance, and I'm also thinking my code might have some bugs in it in very specific cases, Will add some new tests and see.
Also not sure if bitvec itself regressed or if our usage of it is not ideal.
Thank you for the feedback! Yeah that's not great performance, and I'm also thinking my code might have some bugs in it in very specific cases, Will add some new tests and see.
Also not sure if bitvec itself regressed or if our usage of it is not ideal.
There is this issue, it might not be related to our performance regression however. https://github.com/bitvecto-rs/bitvec/issues/158
Thanks for this work! Let's see if we can figure out the performance issue before merging.
I ran the benchmark executables with perf. bitvec::Domain for sure is part of the problem.
master:
JuanPotato:master:
Just by refactoring https://github.com/JuanPotato/deku/blob/09427b8eb5e39402c4dec49070e9ae0a5ecbe20d/src/impls/primitive.rs#L35 with a match and thus reusing bit_slice.domain().region().unwrap().1
the performance improved again up to 50%.
But even this is of course still much slower than the master branch.
At some point I began working on a replacement to bitvec that would be much more performant for deku's usage. I got it looking pretty good but it required a lot of refactoring and is definitely a breaking change. Can't remember how the performance compares to master though
@myrrlyn I see you updated bitvec w.r.t. performance with the https://github.com/bitvecto-rs/bitvec/commit/4de627722c6e5b31107e2765fc2ebd7d8c69a84c commit.
When updating the Cargo.toml
of this update branch
diff --git a/Cargo.toml b/Cargo.toml
index 14d1a0a..ebfb0c0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -26,7 +26,7 @@ const_generics = []
[dependencies]
deku_derive = { version = "^0.12.0", path = "deku-derive", default-features = false}
-bitvec = { version = "1.0.0", default-features = false }
+bitvec = { git = "https://github.com/bitvecto-rs/bitvec", default-features = false }
I get performance boosts vs the v1 version of bitvec:
deku_read_byte time: [15.993 ns 16.075 ns 16.167 ns]
change: [-21.903% -21.507% -20.952%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
deku_write_byte time: [53.931 ns 54.056 ns 54.216 ns]
change: [-2.3317% -1.1832% +0.5395%] (p = 0.10 > 0.05)
No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
deku_read_enum time: [27.867 ns 28.048 ns 28.236 ns]
change: [-22.115% -21.262% -20.134%] (p = 0.00 < 0.05)
Performance has improved.
Found 26 outliers among 100 measurements (26.00%)
15 (15.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
deku_write_enum time: [91.762 ns 91.857 ns 91.960 ns]
change: [-2.7815% -0.2585% +3.3560%] (p = 0.88 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
deku_read_vec time: [1.0901 us 1.0914 us 1.0927 us]
change: [-36.372% -35.693% -35.277%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
deku_write_vec time: [4.4868 us 4.4926 us 4.4985 us]
change: [+4.6790% +5.9749% +7.6922%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
However, still not the kind of performance that current deku provides:
deku_read_byte time: [6.4593 ns 6.4657 ns 6.4736 ns]
change: [-60.561% -60.293% -60.044%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe
deku_write_byte time: [40.263 ns 40.318 ns 40.370 ns]
change: [-27.729% -26.536% -25.703%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
deku_read_enum time: [11.972 ns 11.984 ns 11.996 ns]
change: [-58.424% -57.816% -57.379%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
deku_write_enum time: [66.074 ns 66.155 ns 66.233 ns]
change: [-31.005% -27.837% -24.929%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
deku_read_vec time: [638.18 ns 638.77 ns 639.49 ns]
change: [-41.653% -41.444% -41.180%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) high mild
4 (4.00%) high severe
deku_write_vec time: [2.8597 us 2.8650 us 2.8723 us]
change: [-37.915% -36.792% -35.948%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
By any chance, does the faster old version have #[inline]
on non-public functions
By any chance, does the faster old version have
#[inline]
on non-public functions
I'm not at my computer, but I think this branch is the code that's used for our code. Looks like everything possible is inlined?
as of bitvec v1.0.1, compared to old bitvec:
deku_read_byte time: [15.651 ns 15.732 ns 15.806 ns]
change: [+93.359% +94.661% +95.905%] (p = 0.00 < 0.05)
Performance has regressed.
deku_write_byte time: [54.336 ns 54.673 ns 55.005 ns]
change: [+24.738% +25.686% +26.695%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
deku_read_enum time: [27.266 ns 27.461 ns 27.667 ns]
change: [+80.433% +83.093% +85.122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
deku_write_enum time: [89.895 ns 90.558 ns 91.282 ns]
change: [+26.135% +27.569% +28.872%] (p = 0.00 < 0.05)
Performance has regressed.
deku_read_vec time: [1.3311 us 1.3361 us 1.3412 us]
change: [+95.736% +100.29% +105.39%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
deku_write_vec time: [3.9468 us 3.9727 us 4.0026 us]
change: [+29.954% +31.057% +32.167%] (p = 0.00 < 0.05)
Performance has regressed.
Better https://github.com/wcampbell0x2a/deku/tree/update-bitvec-1.x.x:
deku_read_byte time: [11.777 ns 11.799 ns 11.822 ns]
deku_write_byte time: [50.278 ns 50.330 ns 50.383 ns]
deku_read_enum time: [20.374 ns 20.398 ns 20.422 ns]
deku_write_enum time: [84.699 ns 84.795 ns 84.896 ns]
deku_read_vec time: [991.05 ns 992.97 ns 995.64 ns]
deku_write_vec time: [3.7768 µs 3.7814 µs 3.7862 µs]
Superseded by https://github.com/sharksforarms/deku/pull/281
What changes I made
BitSlice<Msb0, u8>
->BitSlice<u8, Msb0>
get_raw_slice
is gone, usedomain
offset_from
is unsafe and needs to be called onBitPtr
and is reversedNot related to the upgrade but it made my tests pass. DekuData.vis was never read and that emitted a warning. Cargo test didn't like this extra warning so I renamed it to _vis for now.
closes #243