Closed jmeggitt closed 1 year ago
Thanks @jmeggitt - happy to add this. Is `#[inline(always)]` necessary or does a simple `#[inline]` achieve the same?
@krisprice I did a quick test and it looks like `#[inline]` is sufficient to get the compiler to inline the function. Since `#[inline(always)]` might be a bit too heavy-handed, I changed it to `#[inline]` in this pull request.
Recently, I have been running into some performance issues while reading BGP table dumps. After profiling with VTune on Windows and valgrind on Linux, I found that one of the performance issues is caused by a missed optimization involving `Ipv6Net::new`. There are a number of other performance issues with my program that I still need to look at, but this one is by far the easiest to fix.

The crux of the issue is that, by default, Rust only performs thin local LTO within each crate, so a non-generic function is not inlined across crate boundaries unless that is explicitly requested via `#[inline]` (or the function is instantiated for a generic type, but that isn't the case here). While it is possible to avoid this by enabling fat LTO in `Cargo.toml`, profile settings in a dependency's manifest are ignored, so only the final application can opt in. That makes this solution ineffective for library developers.

As you can see in this screenshot of a profile I ran in VTune, `Ipv6Net::new` took up a massive 10% (2.972 seconds) of the total program runtime. This large amount of CPU time is only possible because my workload of going through BGP data consists almost entirely of reading IPv6 prefixes. However, all of the work being done by this function is unnecessary when viewed in the context of the caller. Most of the time in this function is spent constructing the return value from the function arguments. When the function is inlined, the compiler is able to construct the `Ipv6Net` in place, so these moves are not required. Thanks to branch prediction the impact of the `prefix_len` check is minimal, but when inlined the compiler is able to reliably remove the entire check.

In this pull request I propose adding `#[inline(always)]` to the constructors (`Ipv6Net::new` and `Ipv4Net::new`) and getters (`Ipv6Net::addr`, `Ipv6Net::prefix_len`, `Ipv4Net::addr`, and `Ipv4Net::prefix_len`). Additionally, I added `#[inline(always)]` to `Ipv6::max_prefix_len` and `Ipv4::max_prefix_len` as they were the only other `const` functions in the crate and I saw no downside in doing so.
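
For illustration, here is a minimal sketch of the kind of change discussed in this thread, using the plain `#[inline]` hint that the review settled on. The `Net6` and `PrefixLenError` types are hypothetical stand-ins, not the crate's actual definitions, and the signatures are assumptions made only so the example is self-contained.

```rust
use std::net::Ipv6Addr;

/// Hypothetical stand-in for an IPv6 network type (not the crate's real struct).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Net6 {
    addr: Ipv6Addr,
    prefix_len: u8,
}

/// Hypothetical error returned when the prefix length is out of range.
#[derive(Debug, PartialEq, Eq)]
pub struct PrefixLenError;

impl Net6 {
    /// Maximum valid prefix length for IPv6.
    #[inline]
    pub const fn max_prefix_len() -> u8 {
        128
    }

    /// Constructor. `#[inline]` makes the body available to downstream
    /// crates so the optimizer may inline it at the call site.
    #[inline]
    pub fn new(addr: Ipv6Addr, prefix_len: u8) -> Result<Net6, PrefixLenError> {
        if prefix_len > Self::max_prefix_len() {
            return Err(PrefixLenError);
        }
        Ok(Net6 { addr, prefix_len })
    }

    /// Getter for the network address.
    #[inline]
    pub fn addr(&self) -> Ipv6Addr {
        self.addr
    }

    /// Getter for the prefix length.
    #[inline]
    pub fn prefix_len(&self) -> u8 {
        self.prefix_len
    }
}
```

The attribute is only a hint: whether a given call is actually inlined still depends on the optimizer and build profile, so the effect is best confirmed by profiling or by inspecting the generated code.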