Hello, I've noticed that the previous Stack will maintain the log of every push and pop after the snapshot is taken, and I think some of them are not necessary.
I implement a new version of Stack, and only pop operations on those elements that come from previous state will be logged. I didn't change the Stack's interface, but I have to make the Stack public for the benchmark.
I use Criterion for benchmark, and it shows that there is a performance improvement in all tested cases.
I've tested the stack with small, medium and large objects, and their sizes are 8, 24, 1024 respectively on my machine (Win10, AMD x64).
And I've tested four kinds of operations (Each for 10000 times):
Take a snapshot, push an element, restore, push an element.
Take a snapshot, push an element, clear the snapshot.
Fist, fill the stack with 10000 elements. Take a snapshot, pop an element, restore, pop an element.
Fist, fill the stack with 10000 elements. Take a snapshot, pop an element, clear the snapshot.
Maybe still some work to be done to ensure the improvement. Maybe 10000 is too small. And maybe we can test it on more combination of operations and on more machines?
Here is the output when I ran the benchmark after switching to my implementation:
Running benches\stack.rs (target\release\deps\stack-919a208ca9f56013.exe)
push - restore - small time: [109.45 µs 111.95 µs 114.35 µs]
change: [-26.247% -23.846% -21.302%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
push - clear - small time: [59.352 µs 60.821 µs 62.254 µs]
change: [-34.636% -32.454% -30.331%] (p = 0.00 < 0.05)
Performance has improved.
pop - restore - small time: [150.34 µs 153.28 µs 156.16 µs]
change: [-28.022% -25.326% -22.736%] (p = 0.00 < 0.05)
Performance has improved.
pop - clear - small time: [98.087 µs 101.11 µs 104.10 µs]
change: [-33.414% -31.195% -28.872%] (p = 0.00 < 0.05)
Performance has improved.
push - restore - medium time: [121.30 µs 124.27 µs 127.19 µs]
change: [-43.546% -41.826% -39.832%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
push - clear - medium time: [88.422 µs 90.238 µs 92.018 µs]
change: [-41.644% -39.933% -38.070%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
pop - restore - medium time: [222.34 µs 226.94 µs 231.60 µs]
change: [-58.554% -57.618% -56.740%] (p = 0.00 < 0.05)
Performance has improved.
pop - clear - medium time: [128.86 µs 132.06 µs 135.33 µs]
change: [-68.412% -67.281% -66.120%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
push - restore - large time: [9.4303 ms 9.5598 ms 9.6931 ms]
change: [-52.432% -51.540% -50.639%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
push - clear - large time: [9.0552 ms 9.1835 ms 9.3115 ms]
change: [-53.716% -52.852% -51.968%] (p = 0.00 < 0.05)
Performance has improved.
pop - restore - large time: [12.606 ms 12.785 ms 12.965 ms]
change: [-62.220% -61.550% -60.917%] (p = 0.00 < 0.05)
Performance has improved.
pop - clear - large time: [11.184 ms 11.335 ms 11.488 ms]
change: [-65.429% -64.762% -64.127%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high mild
Hello, I've noticed that the previous Stack will maintain the log of every push and pop after the snapshot is taken, and I think some of them are not necessary.
I implement a new version of Stack, and only pop operations on those elements that come from previous state will be logged. I didn't change the Stack's interface, but I have to make the Stack public for the benchmark.
I use Criterion for benchmark, and it shows that there is a performance improvement in all tested cases.
I've tested the stack with small, medium and large objects, and their sizes are 8, 24, 1024 respectively on my machine (Win10, AMD x64).
And I've tested four kinds of operations (Each for 10000 times):
Maybe still some work to be done to ensure the improvement. Maybe 10000 is too small. And maybe we can test it on more combination of operations and on more machines?
Here is the output when I ran the benchmark after switching to my implementation: