krisprice / ipnet

IpNet, Ipv4Net, and Ipv6Net types and methods for Rust
Apache License 2.0
122 stars 26 forks source link

Inline constructors and field getters #48

Closed jmeggitt closed 1 year ago

jmeggitt commented 1 year ago

Recently, I have been running into some performance issues while reading BGP table dumps. After doing a number of profiles using VTune on Windows and valgrind on Linux, I found one of performance issues is caused by a missed optimization involving Ipv6Net::new. There are a number of other performance issues with my program that I still need to look at, however this one is by far the easiest to fix.

The crux of the issue is that Rust uses thin LTO by default so inlining is not performed across codegen units unless explicitly requested via #[inline] (or when constructed for a generic type, but that isn't the case here). While it is possible to avoid this by enabling fat LTO within a crate's Cargo.toml, this setting is ignored when compiling dependencies making this solution ineffective for library developers.

As you can see in this screenshot of a profile I ran in VTune, Ipv6Net::new took up a massive 10% (2.972 seconds) of the total program runtime. This large amount of CPU time is only possible because my workload of going through BGP data consists almost entirely of reading IPv6 prefixes. However, all of the work being done by this function is unnecessary when viewed in the context of the caller. The majority of the time spent by this function is constructing the return value from the function arguments. When inlined, the compiler is able to construct the Ipv6Net in place so these moves are not required. Thanks to branch prediction the impact of the prefix_len check is minimal, but when inlined the compiler is able to reliably remove the entire check. image

In this pull request I propose adding #[inline(always)] to the constructors (Ipv6Net::new and Ipv4Net::new) and getters (Ipv6Net::addr, Ipv6Net::prefix_len, Ipv4Net::addr, and Ipv4Net::prefix_len). Additionally I added #[inline(always)] to Ipv6::max_prefix_len and Ipv4::max_prefix_len as they were the only other const functions in the crate and I saw no downside in doing so.

krisprice commented 1 year ago

Thanks @jmeggitt - happy to add this. Is #[inline(always)] necessary or does a simple #[inline] achieve the same?

jmeggitt commented 1 year ago

@krisprice I did a quick test and it looks like #[inline] is sufficient to get the compiler to inline the function. Since #[inline(always)] might be a bit too heavy handed, I changed it to #[inline] in this pull request.