oscoin / osrank-rs

A pre-alpha osrank implementation in Rust.
http://oscoin.io/
3 stars 3 forks source link

Profiling #34

Open MeBrei opened 5 years ago

MeBrei commented 5 years ago

Investigate profiling/flamegraph for osrank_naive. Document slow parts of the code and if/how they can be improved.

adinapoli-mndc commented 5 years ago

Some update on this. I have installed flamegraph which is a handy wrapper for dtrace in Rust, which integrates with cargo. After doing that, I have disabled optimisations in release builds as suggested by this post, by adding:

# Set debug = true if you want to do profiling with flamegraph.
[profile.release]
debug = true 

To the cargo.toml. This produced the following graph:

Screenshot 2019-08-21 at 09 09 49

It's not very large and hard to navigate if this is not an interactive image, but my current interpretation goes this way. There are 3 main "tips" in the graph: the one on the left in irrelevant to us as it's basically "benchmarking noise" from criterion, and the other two basically dominates the whole CPU time (as you would expect), almost equally.

My interpretation is that rank_network is actually the slower one but it's called less times, whereas we do plenty of calls to random_walk throughout the benchmarking, and this is also witnessed by the osrank_naive-0e910a9a4ee537f4criterion::Bencher::iter::h9c169f50c1c9202f which "retains" all those calls to random_walk.

If we analise random_walk by zooming into it, we can see the cost is dominated by clone calls (unsurprisingly), choose_weighted and neighbours calls:

Screenshot 2019-08-21 at 09 19 08

As far as rank_network goes, instead, I might be wrong but it seems like the cost is dominated by the fraction manipulation and calls to fmt::write and float_to_decimal_common_shortest. My hunch here is that we are paying the price of f64 arithmetic while also converting from the Weight type:

Screenshot 2019-08-21 at 09 23 17

More investigation is needed, but I think that if we start by eliminating all the unnecessary calls to .clone and we sort out our Osrank representation, we should already land into a better place.