matsadler / magnus

Ruby bindings for Rust. Write Ruby extension gems in Rust, or call Ruby from Rust.
MIT License

Magnus overhead compared to rb-sys #99

Closed fpacanowski closed 10 months ago

fpacanowski commented 10 months ago


I'm benchmarking different ways of creating a large nested hash in Ruby. In my benchmark I compared an implementation using raw rb-sys with a Magnus-based implementation. To my surprise, the latter seems to be over 2x slower. Is this expected? Perhaps I'm doing something wrong?

Benchmark results:

Calculating -------------------------------------
          Plain Ruby    206.215  (± 1.9%) i/s -      1.045k in   5.069316s
         C extension    314.509  (± 2.5%) i/s -      1.581k in   5.030353s
    rb-sys extension    323.220  (± 3.1%) i/s -      1.632k in   5.054636s
    Magnus extension    115.455  (± 5.2%) i/s -    580.000  in   5.035127s

    rb-sys extension:      323.2 i/s
         C extension:      314.5 i/s - same-ish: difference falls within error
          Plain Ruby:      206.2 i/s - 1.57x  slower
    Magnus extension:      115.5 i/s - 2.80x  slower

Plain Ruby = 4.85 ms
C extension = 3.18 ms
rb-sys extension = 3.10 ms
Magnus extension = 8.68 ms
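
For reference, the "x slower" figures in the comparison can be recomputed from the iterations-per-second column. The following is a small std-only sketch; the helper name `slowdown` is made up for illustration and is not part of the benchmark code.

```rust
/// How many times slower `ips` is than the fastest entry, rounded to two decimals.
/// (`slowdown` is a hypothetical helper, not part of the benchmark.)
fn slowdown(fastest_ips: f64, ips: f64) -> f64 {
    ((fastest_ips / ips) * 100.0).round() / 100.0
}

fn main() {
    // Iterations per second, copied from the benchmark output above.
    let results = [
        ("Plain Ruby", 206.215),
        ("C extension", 314.509),
        ("rb-sys extension", 323.220),
        ("Magnus extension", 115.455),
    ];

    // The fastest entry is the baseline every other entry is compared against.
    let fastest = results
        .iter()
        .map(|&(_, ips)| ips)
        .fold(f64::MIN, f64::max);

    for &(name, ips) in results.iter() {
        println!("{:>18}: {:.2}x slower", name, slowdown(fastest, ips));
    }
}
```

The Plain Ruby and Magnus rows come out at 1.57x and 2.80x, matching the report.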


The code is also available in this repo:



def build_tree(depth)
  if depth == 1
    return {label: PAYLOAD.dup, children: []}
  end
  return {label: PAYLOAD.dup, children: [build_tree(depth-1), build_tree(depth-1)]}
end

def build_big_tree
  # Depth 13, as in the Rust versions below.
  build_tree(13)
end

magnus implementation:

use magnus::{RArray, RHash, Symbol};

static PAYLOAD: &str = "ABC(...)";

fn build_tree(depth: i32) -> RHash {
    let result = RHash::new();
    result.aset(Symbol::new("label"), PAYLOAD).unwrap();
    let children = RArray::new();
    if depth != 1 {
        children.push(build_tree(depth - 1)).unwrap();
        children.push(build_tree(depth - 1)).unwrap();
    }
    result.aset(Symbol::new("children"), children).unwrap();
    return result;
}

fn build_big_tree() -> RHash {
    return build_tree(13);
}

rb-sys implementation:

static PAYLOAD: &str = "ABC(...)";

unsafe fn build_tree(depth: i32) -> VALUE {
    let result = rb_hash_new();
    let children = rb_ary_new();
    if depth != 1 {
        rb_ary_push(children, build_tree(depth - 1));
        rb_ary_push(children, build_tree(depth - 1));
    }
    // The opening of this call is missing from the snippet; LABEL_INTERN is an
    // assumed name, by analogy with CHILDREN_INTERN.
    rb_hash_aset(
        result,
        rb_id2sym(LABEL_INTERN),
        rb_str_new(PAYLOAD.as_ptr() as *mut _, PAYLOAD.len() as _),
    );
    rb_hash_aset(result, rb_id2sym(CHILDREN_INTERN), children);
    return result;
}

unsafe extern "C" fn build_big_tree(_: VALUE) -> VALUE {
    return build_tree(13);
}
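
To put the per-iteration timings in context, it may help to count how much work one call does. This std-only sketch follows the same recursion as the build_tree functions above and counts the hashes produced; `count_nodes` is a hypothetical helper, and the "three objects per node" figure only covers the hash, the children array and the label string, not any symbol objects.

```rust
/// Number of hashes in a tree of the given depth, using the same recursion
/// as `build_tree`: a leaf at depth 1, otherwise one node plus two subtrees.
/// (`count_nodes` is a hypothetical helper for illustration.)
fn count_nodes(depth: u32) -> u64 {
    if depth == 1 {
        1
    } else {
        1 + 2 * count_nodes(depth - 1)
    }
}

fn main() {
    let nodes = count_nodes(13);
    println!("hashes per build_big_tree call: {}", nodes);
    // Every node also gets one children array and one copy of the payload string.
    println!("hash + array + string objects: {}", nodes * 3);
}
```

At depth 13 this gives 8191 hashes per call, so any small per-node cost is multiplied several thousand times in each benchmark iteration.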


I thought this might have something to do with building Ruby symbols, but using string keys in the hash doesn't affect the result much.

Related discussion:

matsadler commented 10 months ago

Hey, I put together a set of changes so that the Magnus version matches what the other versions are doing; you can find them here.

The first change was to update the version of Magnus from 0.4 to 0.6. 0.6 has a number of optimisations.

The main reason your Magnus version was running slower is that it wasn't doing quite the same thing as the other versions. Symbol::new allocates a full garbage-collectable object version of a symbol; it's the equivalent of "foo".to_sym. I swapped it for Ruby::sym_new, which creates a StaticSymbol, a lighter-weight non-GC-able symbol that is the equivalent of a symbol literal in Ruby (e.g. :foo) and of what you were doing in the C and rb-sys versions. I also changed the code to create each StaticSymbol only once and then reuse it.

With this (at least on my machine) the Magnus version ends up faster than the C one. I think this is because the rb_str_new_cstr function used in the C version has to count the length of the string on every call, whereas Rust already knows the length of the string, so Magnus can use the rb_str_new function under the hood and pass the length in.
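
The length-counting point can be illustrated without Ruby at all. Below is a std-only sketch: `scan_for_nul` is a hypothetical stand-in for the scan a C-string API has to perform, and "ABC" is a placeholder because the real payload is elided in the issue.

```rust
/// Model of the work needed when only a NUL-terminated pointer is available:
/// walk the bytes until the terminator. The cost grows with the string length.
/// (`scan_for_nul` is a hypothetical helper, not a Ruby or Magnus API.)
fn scan_for_nul(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .position(|&b| b == 0)
        .expect("missing NUL terminator")
}

fn main() {
    // Placeholder payload; the real PAYLOAD in the benchmark is elided.
    let payload: &str = "ABC";
    // The same bytes as a C string literal would store them.
    let c_payload: &[u8] = b"ABC\0";

    // A &str is a (pointer, length) pair, so len() is a constant-time read.
    let known = payload.len();
    // A caller holding only a C string has to rescan it on every call.
    let scanned = scan_for_nul(c_payload);

    assert_eq!(known, scanned);
    println!("known = {}, scanned = {}", known, scanned);
}
```

Both approaches agree on the length; the difference is only in how much work it takes to obtain it, which adds up when a string is created for every node of the tree.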