Closed RocketRide9 closed 2 months ago
It seems that you are running unoptimized builds.
Can you re-run with --release and post the results?
@phimuemue
sh-5.2$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/sketches`
Thing took 134 ms arr3 = 21
Thing took 144 ms arr3 = 21
Thing took 168 ms arr3 = 21
sh-5.2$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/sketches`
Thing took 130 ms arr3 = 21
Thing took 142 ms arr3 = 21
Thing took 143 ms arr3 = 21
sh-5.2$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/sketches`
Thing took 134 ms arr3 = 21
Thing took 143 ms arr3 = 21
Thing took 147 ms arr3 = 21
TBF, I don't trust millisecond-level differences from a "run once with stopwatch" perf.
Please demonstrate with something like criterion that there's a statistically-significant difference here, or show that it optimizes differently.
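For anyone who wants a quick sanity check before setting up criterion, here is a rough std-only sketch of repeated sampling. It is not a substitute for criterion's statistics, and `sample`, `nested`, `flattened` and the input sizes are made up for illustration; they are not the code from this issue.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `f` for `runs` iterations and returns the durations sorted ascending.
fn sample<F: FnMut() -> u64>(runs: usize, mut f: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Hypothetical workload: nested-tuple pattern straight from the zip chain.
fn nested(a: &[u64], b: &[u64], c: &[u64]) -> u64 {
    let mut sum = 0;
    a.iter().zip(b).zip(c).for_each(|((x, y), z)| sum += x * y * z);
    sum
}

/// Same hypothetical workload with an extra flattening `map` in front.
fn flattened(a: &[u64], b: &[u64], c: &[u64]) -> u64 {
    let mut sum = 0;
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((x, y), z)| (x, y, z))
        .for_each(|(x, y, z)| sum += x * y * z);
    sum
}

fn main() {
    let (a, b, c) = (vec![1u64; 1_000_000], vec![2u64; 1_000_000], vec![3u64; 1_000_000]);
    let n = sample(30, || nested(&a, &b, &c));
    let f = sample(30, || flattened(&a, &b, &c));
    // Report min and median instead of a single stopwatch reading.
    println!("nested:    min {:?}, median {:?}", n[0], n[n.len() / 2]);
    println!("flattened: min {:?}, median {:?}", f[0], f[f.len() / 2]);
}
```

If the min and median of the two variants overlap from run to run, the difference is likely noise; a real claim would still need a proper benchmark harness.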
The optimized version looks good, no? The difference in time is comparable to measurement error. The issue is that the debug version runs around 0.7 seconds slower. Speed in the debug version matters too, doesn't it?
izip!(a, b, c) expands to a.into_iter().zip(b).zip(c).map(|((x, y), z)| (x, y, z)). So the difference for me here is .map(|((x, y), z)| (x, y, z)).for_each(|(x, y, z)| ...) vs .for_each(|((x, y), z)| ...). If there is a difference, then first, it's probably subtle (maybe less subtle in debug mode), and more importantly, we can't do much about it because we merely rely on libcore.
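To make the two shapes concrete, here is a small sketch showing that they visit the same elements. The function names are hypothetical, and it iterates over slices with `iter()` rather than `into_iter()` to keep the example short; the only structural difference between the two functions is the extra `map`.

```rust
/// Mirrors the expansion described above: zip chain, flattening map, then for_each.
fn with_flattening_map(a: &[i32], b: &[i32], c: &[i32]) -> Vec<(i32, i32, i32)> {
    let mut out = Vec::new();
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((x, y), z)| (x, y, z))
        .for_each(|(x, y, z)| out.push((*x, *y, *z)));
    out
}

/// Hand-written variant: destructure the nested tuple directly in for_each.
fn with_nested_pattern(a: &[i32], b: &[i32], c: &[i32]) -> Vec<(i32, i32, i32)> {
    let mut out = Vec::new();
    a.iter()
        .zip(b)
        .zip(c)
        .for_each(|((x, y), z)| out.push((*x, *y, *z)));
    out
}

fn main() {
    let (a, b, c) = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);
    // Both variants yield the same triples in the same order.
    assert_eq!(with_flattening_map(&a, &b, &c), with_nested_pattern(&a, &b, &c));
    println!("{:?}", with_flattening_map(&a, &b, &c));
}
```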
Speed in the debug version matters too, doesn't it?
To be frank, no it doesn't. The default debug config doesn't even attempt to produce reasonable machine code.
If you want your debug config to have non-terrible performance, I recommend setting opt-level=1 for it. That's not much slower to compile, and it's way faster at runtime. The difference between "don't even try" and "just do the easy stuff" is massive.
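For reference, that setting would typically go in the project's Cargo.toml under the dev profile. This is a minimal fragment, assuming no other profile overrides are already present:

```toml
# Cargo.toml
[profile.dev]
opt-level = 1   # basic optimizations for plain `cargo build` / `cargo run`
```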
If you want your debug config to have non-terrible performance, I recommend setting opt-level=1 for it.
that's way better:
sh-5.2$ cargo run
Finished dev [optimized + debuginfo] target(s) in 0.01s
Running `target/debug/sketches`
Thing took 157 ms arr3 = 21
Thing took 173 ms arr3 = 21
Thing took 173 ms arr3 = 21
sh-5.2$ cargo run
Finished dev [optimized + debuginfo] target(s) in 0.01s
Running `target/debug/sketches`
Thing took 153 ms arr3 = 21
Thing took 177 ms arr3 = 21
Thing took 177 ms arr3 = 21
sh-5.2$ cargo run
Finished dev [optimized + debuginfo] target(s) in 0.01s
Running `target/debug/sketches`
Thing took 154 ms arr3 = 21
Thing took 175 ms arr3 = 21
Thing took 175 ms arr3 = 21
It's interesting that the first multiplication is always slightly faster than the others, even after changing the order. I copied the first loop and pasted it after the third one (which uses izip), and it runs at the same speed as the 2nd and 3rd.
Anyway, if it's expected that the default debug config in Rust is this slow, I think this issue can be closed?
code
tests: