Performance regression in nightly

Hi all,

I noticed that my application's microbenchmarks became roughly 5x slower (5 ms to 28 ms) when I switched from rustc 1.13.0-nightly (5531c314a 2016-09-12) to rustc 1.13.0-nightly (6ffdda1ba 2016-09-14). When I switch back to the earlier rustc, performance returns to normal. The issue is still present in the latest nightly: rustc 1.14.0-nightly (9c31d76e9 2016-10-03).

Unfortunately, I cannot share my microbenchmarks at the moment because they depend on a fairly large C++ library (which I call through FFI). I will try to put together a toy example as soon as possible.

In the meantime, here's what my microbenchmark does. It uses the criterion crate to measure performance.

#![feature(test)]
extern crate criterion;
extern crate test;
extern crate rand;
extern crate libc;

use criterion::Bencher;
use rand::Rng;
use rand::ChaChaRng;
use std::slice;

#[test]
fn bench_function_name() {
    fn bench_function_name(b: &mut Bencher) {
        let mut rng = ChaChaRng::new_unseeded();
        let mut x = [0u8; 1024];
        rng.fill_bytes(&mut x);

        // Below is what actually measures performance
        b.iter(|| { test::black_box(some_function(&x[..])); });
    }

    let mut criterion = criterion::Criterion::default();
    criterion.bench_function("bench_function_name", bench_function_name);
}

struct Answer<'a> {
  pub answer: &'a mut [u8],
}

impl <'a> Drop for Answer<'a> {
    fn drop(&mut self) {
        unsafe { cpp_buffer_free(self.answer.as_mut_ptr() as *mut libc::c_void); }
    }
}

pub fn some_function<'a>(query: &'a [u8]) -> Answer<'a> {
    let mut a_len: u64 = 0;

    let answer: &'a mut [u8] = unsafe {
        let ptr = some_cpp_function(query.as_ptr(), query.len() as u64, &mut a_len);
        slice::from_raw_parts_mut(ptr as *mut u8, a_len as usize)
    };

    Answer { answer: answer }
}

extern "C" {
    fn some_cpp_function(q: *const libc::uint8_t, 
        q_len: libc::uint64_t, 
        a_len: *mut libc::uint64_t) 
        -> *mut libc::uint8_t;

    fn cpp_buffer_free(buffer: *mut libc::c_void); // this frees memory
}

On the C++ side, some_cpp_function takes the query (a byte buffer and its length), processes it, and returns a pointer to a newly allocated answer buffer, writing the answer's length to a_len. Once my app is done with the answer, the Drop impl calls cpp_buffer_free so the C++ library can release that buffer.
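For anyone who wants to reproduce the allocate-in-the-library, free-on-drop pattern without the C++ dependency, here is a minimal sketch. Everything named `fake_*`, the `HEADER` constant, and the `OwnedAnswer` type are hypothetical stand-ins written in Rust; they are not part of my real library. The wrapper also stores the raw pointer and length rather than a `&'a mut [u8]` tied to the query's lifetime, which is an assumption about how the type could be shaped, not what my code does today.

```rust
use std::ffi::c_void;
use std::ptr;
use std::slice;

// Hypothetical stand-ins for the C++ library, written in Rust so the sketch
// runs on its own. Each buffer carries an 8-byte length header so it can be
// freed from the data pointer alone, like `cpp_buffer_free`.
const HEADER: usize = 8;

unsafe fn fake_cpp_function(q: *const u8, q_len: u64, a_len: *mut u64) -> *mut u8 {
    unsafe {
        let query = slice::from_raw_parts(q, q_len as usize);
        let mut buf: Vec<u8> = Vec::with_capacity(HEADER + query.len());
        buf.extend_from_slice(&q_len.to_ne_bytes());
        buf.extend(query.iter().rev()); // "does stuff": here, reverse the bytes
        *a_len = q_len;
        // Hand ownership to the caller; return a pointer just past the header.
        let base = Box::into_raw(buf.into_boxed_slice()) as *mut u8;
        base.add(HEADER)
    }
}

unsafe fn fake_buffer_free(buffer: *mut c_void) {
    unsafe {
        // Step back to the header, read the length, and rebuild the Box.
        let base = (buffer as *mut u8).sub(HEADER);
        let mut header = [0u8; HEADER];
        ptr::copy_nonoverlapping(base, header.as_mut_ptr(), HEADER);
        let total = HEADER + u64::from_ne_bytes(header) as usize;
        drop(Box::from_raw(ptr::slice_from_raw_parts_mut(base, total)));
    }
}

// Owns a buffer allocated by the (stand-in) library and frees it on drop.
pub struct OwnedAnswer {
    ptr: *mut u8,
    len: usize,
}

impl OwnedAnswer {
    pub fn as_slice(&self) -> &[u8] {
        if self.ptr.is_null() {
            &[]
        } else {
            unsafe { slice::from_raw_parts(self.ptr, self.len) }
        }
    }
}

impl Drop for OwnedAnswer {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            unsafe { fake_buffer_free(self.ptr as *mut c_void) }
        }
    }
}

pub fn some_function(query: &[u8]) -> OwnedAnswer {
    let mut a_len: u64 = 0;
    let ptr = unsafe { fake_cpp_function(query.as_ptr(), query.len() as u64, &mut a_len) };
    OwnedAnswer { ptr, len: a_len as usize }
}
```

Calling `some_function(&[1, 2, 3])` returns an answer whose `as_slice()` is the reversed query, and the buffer is released when the answer goes out of scope.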

I run this code by invoking: cargo bench -- --test bench_function_name

Given the big difference in performance (from 5ms to 28ms), my guess is that the issue affects either: ~~the test or criterion crates~~, the libraries required to build and link against C/C++ (cmake, FFI, libc), or the way I'm using the Drop trait.
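One way to narrow this down might be to time the call and the drop separately, outside of criterion. The sketch below is only an illustration of that idea: `time_call_and_drop` and `Split` are hypothetical names I made up, and it relies on `std::hint::black_box`, which needs a much newer compiler than the nightlies mentioned above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Accumulated time spent producing values versus dropping them.
pub struct Split {
    pub call: Duration,
    pub drop: Duration,
}

// Runs `f` `iters` times, timing the call and the drop of its result separately.
pub fn time_call_and_drop<T, F: FnMut() -> T>(iters: u32, mut f: F) -> Split {
    let mut call = Duration::new(0, 0);
    let mut drop_time = Duration::new(0, 0);
    for _ in 0..iters {
        let t0 = Instant::now();
        // black_box keeps the optimizer from discarding the result.
        let value = black_box(f());
        let t1 = Instant::now();
        drop(value); // for an FFI-backed type, this is where the free would run
        let t2 = Instant::now();
        call += t1 - t0;
        drop_time += t2 - t1;
    }
    Split { call, drop: drop_time }
}
```

With my benchmark this would be invoked as something like `time_call_and_drop(1000, || some_function(&x[..]))`. If the `drop` total grows between the two nightlies, that points at the Drop path; if `call` grows, it points at the FFI call itself.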

To link the library, my build.rs contains these lines:

let dst = cmake::Config::new("../lib_folder")
                         .define("CMAKE_BUILD_TYPE", "Release")
                         .build();

println!("cargo:rustc-link-search=native={}/directory_of_lib", dst.display());
println!("cargo:rustc-link-lib=static=lib_name");

Any thoughts? I'm happy to provide as much information as I can.

rust-lang / rust

Performance regression in nightly #36590