Closed utterstep closed 3 years ago
Extracting keys to vec before iteration solves the problem for part two (and reduces effect for the first part), so I'd assume that iteration over HashMap is significantly slower in 1.36 than it was in 1.35
snippet 1:
self.boundaries
.into_iter()
.filter(|(x, y)| {
self.points
.keys()
.map(|point| point.distance_to(*x, *y))
.sum::<i32>()
< max_distance
})
.count()
1.35.0: 11.7 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.36.0: 22.6 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.93x time ratio
snippet 2:
// Point is struct { x: i32, y: i32 } with derived Hash
// self.points is HashMap<Point, i32>
let points = self.points.keys().collect::<Vec<_>>();
self.boundaries
.into_iter()
.filter(|(x, y)| {
points
.iter()
.map(|point| point.distance_to(*x, *y))
.sum::<i32>()
< max_distance
})
.count()
1.35.0: 11 ms ± 36 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.36.0: 9.74 ms ± 40.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
0.88x time ratio
snippet 2 is actually faster in 1.36, so it may be the case, that other HashMap operations is faster in 1.36 (as expected), but iteration over values is significantly slower
Same goes for part one (iter_mut extraction solves the regression):
snippet 3:
for (x, y) in &self.boundaries {
let mut min_distance = i32::max_value();
let mut closest = None;
for (point, area) in self.points.iter_mut() {
...
}
...
1.35.0: 15.8 ms ± 50.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.36.0 23 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.45x time ratio
snippet 4:
// Point is struct { x: i32, y: i32 } with derived Hash
// self.points is HashMap<Point, i32>
let mut points = self.points.iter_mut().collect::<Vec<_>>();
for (x, y) in &self.boundaries {
let mut min_distance = i32::max_value();
let mut closest = None;
for (point, area) in points.iter_mut() {
...
}
...
1.35.0: 12.9 ms ± 50.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.36.0: 14.8 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.14x time ratio
Cc @Amanieu for hashbrown
Here is reduced reproduction of the case:
use std::collections::HashSet;
const N_ITEMS: usize = 50;
const N_REPEATS: usize = 10000001;
fn main() {
let mut set = HashSet::new();
for i in 0..N_ITEMS {
set.insert(i ^ 13);
}
let mut res = 0;
for _ in 0..N_REPEATS {
for item in set.iter() {
res ^= item;
}
}
println!("sum: {}", res);
}
1.35.0:
utterstep@utterstep-nix:~/my/playground/hashset-perf-test$ /usr/bin/time -v ./hashset-perf-test-1.35 > /dev/null
Command being timed: "./hashset-perf-test-1.35"
User time (seconds): 0.41
System time (seconds): 0.00
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.41
...
1.36.0:
utterstep@utterstep-nix:~/my/playground/hashset-perf-test$ /usr/bin/time -v ./hashset-perf-test-1.36 > /dev/null
Command being timed: "./hashset-perf-test-1.36"
User time (seconds): 0.76
System time (seconds): 0.00
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.76
...
HashMap iteration is slower for small sizes, but faster for larger sizes (e.g. 10k elements).
Closing in favor of https://github.com/rust-lang/hashbrown/issues/278.
Seeing ~2x (46ms vs 28ms) performance regression after updating 1.35.0 to 1.36.0 (probably due to new HashMap implementation?).
Steps to reproduce
rustup override set 1.35.0 && cargo build -p day-6 --release
APP_PART=one APP_INPUT_FILE=day-6/full.txt target/release/day-6
,APP_PART=two APP_INPUT_FILE=day-6/full.txt APP_MAX_DISTANCE=10000 target/release/day-6
. I used ipython for this, for example:import os
%timeit os.system('APP_PART=one APP_INPUT_FILE=day-6/full.txt target/release/day-6 > /dev/null')
%timeit os.system('APP_PART=two APP_INPUT_FILE=day-6/full.txt APP_MAX_DISTANCE=10000 target/release/day-6 > /dev/null')
rustup override set 1.36.0 && cargo build -p day-6 --release
My system specs
performance
for time estimation purposesResults 1.35.0:
15.8 ms ± 50.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.7 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.36.0:
23 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
22.6 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Main logic is in this file: https://github.com/utter-step/advent-2018/blob/master/day-6/src/world/mod.rs, it's a solution to Advent of Code 2018 day 6 problem.
Part one is using
HashMap::iter_mut
a lot: https://github.com/utter-step/advent-2018/blob/master/day-6/src/world/mod.rs#L46 Part two is usingHashMap::keys
a lot: https://github.com/utter-step/advent-2018/blob/master/day-6/src/world/mod.rs#L83Meta 1.35.0:
1.36.0:
Hope that info I provided would be useful!
Vlad