Magnus overhead compared to rb-sys

Background

I'm benchmarking different ways of creating a large nested hash in Ruby. In my benchmark I compared an implementation using raw rb-sys with a Magnus-based implementation. To my surprise, the latter seems to be over 2x slower. Is this expected? Perhaps I'm doing something wrong?

Benchmark results:

Calculating -------------------------------------
          Plain Ruby    206.215  (± 1.9%) i/s -      1.045k in   5.069316s
         C extension    314.509  (± 2.5%) i/s -      1.581k in   5.030353s
    rb-sys extension    323.220  (± 3.1%) i/s -      1.632k in   5.054636s
    Magnus extension    115.455  (± 5.2%) i/s -    580.000  in   5.035127s

Comparison:
    rb-sys extension:      323.2 i/s
         C extension:      314.5 i/s - same-ish: difference falls within error
          Plain Ruby:      206.2 i/s - 1.57x  slower
    Magnus extension:      115.5 i/s - 2.80x  slower

Plain Ruby = 4.85 ms
C extension = 3.18 ms
rb-sys extension = 3.10 ms
Magnus extension = 8.68 ms

Code

The code is also available in this repo:

Ruby:

PAYLOAD = "ABC"*100

def build_tree(depth)
  if depth == 1
    return {label: PAYLOAD.dup, children: []}
  end
  return {label: PAYLOAD.dup, children: [build_tree(depth-1), build_tree(depth-1)]}
end

def build_big_tree
  build_tree(13)
end

Magnus implementation:

static PAYLOAD: &str = "ABC(...)";

fn build_tree(depth: i32) -> RHash {
    let result = RHash::new();
    result.aset(Symbol::new("label"), PAYLOAD).unwrap();
    let children = RArray::new();
    if depth != 1 {
        children.push(build_tree(depth - 1)).unwrap();
        children.push(build_tree(depth - 1)).unwrap();
    }
    result.aset(Symbol::new("children"), children).unwrap();
    return result;
}

fn build_big_tree() -> RHash {
    return build_tree(13);
}

rb-sys implementation:

static PAYLOAD: &str = "ABC(...)";

unsafe fn build_tree(depth: i32) -> VALUE {
    let result = rb_hash_new();
    let children = rb_ary_new();
    if depth != 1 {
        rb_ary_push(children, build_tree(depth - 1));
        rb_ary_push(children, build_tree(depth - 1));
    }
    rb_hash_aset(
        result,
        rb_id2sym(LABEL_INTERN),
        rb_str_new(PAYLOAD.as_ptr() as *mut _, PAYLOAD.len() as _),
    );
    rb_hash_aset(result, rb_id2sym(CHILDREN_INTERN), children);
    return result;
}

unsafe extern "C" fn build_big_tree(_: VALUE) -> VALUE {
    return build_tree(13);
}

Notes

I thought this might have something to do with building Ruby symbols, but using string keys in the hash doesn't affect the result much.

Hey, I put together a set of changes so that the Magnus version matches what the other versions are doing. You can find them here: https://github.com/fpacanowski/ruby-extensions-benchmark/compare/master...matsadler:ruby-extensions-benchmark:master

The first change was to update Magnus from 0.4 to 0.6, which has a number of optimisations.

The main reason your Magnus version was running slower is that it wasn't doing quite the same thing as the other versions. Symbol::new allocates a full garbage-collectable symbol object. It's the equivalent of "foo".to_sym. I swapped it for Ruby::sym_new, which creates a StaticSymbol: a lighter-weight, non-GC-able symbol that is the equivalent of a symbol literal in Ruby (e.g. :foo), and the equivalent of what you were doing in the C and rb-sys versions. I also changed the code to create each StaticSymbol only once and then reuse it.

With this (at least on my machine) the Magnus version ends up faster than the C one. I think this is because the rb_str_new_cstr function used in the C version has to count the length of the string on every call, whereas Rust already knows the length of the string, so Magnus can use the rb_str_new function under the hood and pass the length in.
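To illustrate the string-length point, here is a minimal standalone sketch that doesn't use Ruby or Magnus at all. The names c_style_len, PAYLOAD_NUL, known_len and scanned_len are made up for the illustration, and the short payload is a stand-in for the real one. It only shows the general difference between finding a length by scanning for a NUL terminator (what a cstr-style API has to do) and reading it from a Rust &str. It isn't the actual code inside Ruby or Magnus.

```rust
// Hypothetical payloads for illustration: the same bytes, once as a Rust
// string slice and once as a NUL-terminated byte string, as C would see it.
const PAYLOAD: &str = "ABCABCABC";
const PAYLOAD_NUL: &[u8] = b"ABCABCABC\0";

/// Counts bytes up to (not including) the first NUL, the way C's strlen does.
/// This is an O(n) scan that has to be repeated on every call.
///
/// # Safety
/// `ptr` must point to a readable, NUL-terminated byte sequence.
unsafe fn c_style_len(ptr: *const u8) -> usize {
    let mut len = 0;
    // Walk forward one byte at a time until the terminator is found.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    len
}

/// Length of the Rust string slice: stored alongside the pointer, so this
/// is a constant-time read with no scan over the bytes.
fn known_len() -> usize {
    PAYLOAD.len()
}

/// Length of the NUL-terminated version: found by scanning the bytes.
fn scanned_len() -> usize {
    // Safe to call because PAYLOAD_NUL ends with a NUL byte.
    unsafe { c_style_len(PAYLOAD_NUL.as_ptr()) }
}
```

Both functions return the same number, but scanned_len has to walk the whole payload to get it. For a 300-byte payload created thousands of times per tree, that per-call scan could plausibly account for part of the gap, though I haven't measured it in isolation.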
