matsadler / magnus

Ruby bindings for Rust. Write Ruby extension gems in Rust, or call Ruby from Rust.
https://docs.rs/magnus/latest/magnus/
MIT License
682 stars 35 forks source link

Unable to create subclasses of wrapped objects in Ruby #67

Closed wfchandler closed 1 year ago

wfchandler commented 1 year ago

First off, thanks for making this project. The documentation is excellent, and it's been a good introduction to Ruby's internals as someone who hasn't dealt with them in the past.

I'm running into some odd behavior when subclassing a TypedData Rust object in Ruby. If I have a Rust-based superclass, Parent, and a sublass class defined in Ruby, Child, the object returned by Child.new will be a Parent. Child#initialize is not executed.

Is this expected?

This can behavior be replicated using the complete_object example in this repo with this patch:

diff --git a/examples/complete_object/lib/temperature.rb b/examples/complete_object/lib/temperature.rb
index 22440be..a1ea42b 100644
--- a/examples/complete_object/lib/temperature.rb
+++ b/examples/complete_object/lib/temperature.rb
@@ -1,3 +1,14 @@
 # frozen_string_literal: true

 require_relative "temperature/temperature"
+
+class Foo < Temperature
+  def initialize(temp)
+    puts "Initializing Foo"
+    super(celsius: temp)
+  end
+
+  def bar
+    puts "Hello from Foo"
+  end
+end
diff --git a/examples/complete_object/test/temperature_test.rb b/examples/complete_object/test/temperature_test.rb
index ceb5216..307e84c 100644
--- a/examples/complete_object/test/temperature_test.rb
+++ b/examples/complete_object/test/temperature_test.rb
@@ -5,6 +5,12 @@ require_relative "../lib/temperature"

 class TemperatureTest < Test::Unit::TestCase

+  def test_foo
+    foo = Foo.new(celsius: 20)
+    puts "Class of foo: #{foo.class}"
+    foo.bar
+  end
+
   def test_new
     assert { Temperature.new(kelvin: 292.65).to_kelvin == 292.65 }
     assert { Temperature.new(celsius: 19.5).to_celsius == 19.5 }

Running this results in:

$ bundle exec rake
Started
..Class of foo: Temperature
E
===========================
Error: test_foo(TemperatureTest): NoMethodError: undefined method `bar' for Temperature { microkelvin: 293150000 }:Temperature
/magnus/examples/complete_object/test/temperature_test.rb:11:in `test_foo'
      8:   def test_foo
      9:     foo = Foo.new(celsius: 20)
     10:     puts "Class of foo: #{foo.class}"
  => 11:     foo.bar
     12:   end
     13:
     14:   def test_new
==========================

Magnus version: HEAD (a42fdbef8f0a9726d15e24ed904e3e6bf2015351) Rust version: rustc 1.67.1 (d5a82bbd2 2023-02-07) Ruby versions: tested with ruby 3.0.5p211 (2022-11-24 revision ba5cf0f7c5) [arm64-darwin22] and ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]

matsadler commented 1 year ago

Hey, thanks for this bug report and thanks for the kind words. I'm glad the project is useful.

So this (subclassing a Rust struct from Ruby) wasn't a use case I had considered, so Magnus doesn't really support it right now.

In Ruby creating a new object has two steps, first the object is allocated, and then it is initialised. Both operations are done by the new class method, so it's easy to not realise it's a two step process. If it were written in Ruby, the new method would look like:

class Class
  def new(*args)
    instance = allocate
    instance.initialize(*args)
    instance
  end
end

This shows the influence of C on Ruby, as it mirrors how you'd do things in C. First you'd allocate some memory, then you'd write your data to that memory.

Rust doesn't work like that. In Rust allocating and initialising a struct are a single operation, e.g. given a struct like:

struct Point {
    x: i64,
    y: i64,
}

You allocate & initialise it with:

let point = Point { x: 5, y: 10 };

So Magnus carries this over to wrapped structs. All the examples have you defining the new method directly, overwriting Ruby's default that had the separate allocate & initialize steps. Then the new method just returns the instance, without ever calling allocate or initialize. When a subclass inherits the new method that method isn't paying any attention to the fact that it's being called on a subclass and creates an instance of the parent class.

Ruby's C apis do allow defining separate allocate and initialize functions, leaving the default new in place, but Magnus doesn't currently expose those apis.

If you really want to do this right now with the current release version of Magnus (0.5) you can paper over the cracks with rb_sys, which is a lower level binding to Ruby. Here's the mut_point example adapted:

use std::cell::RefCell;

// this requires enabling the `rb-sys-interop` in Cargo.toml, like:
//     magnus = { version = "0.5", features = ["rb-sys-interop"] }
use magnus::{
    class, define_class, embed, eval, method, prelude::*, rb_sys::AsRawValue, typed_data,
    wrap, Error, TypedData,
};
// must add `rb_sys` as a dependency. Use "*" as version to match whatever
// Magnus is using.
//     rb-sys = "*"
use rb_sys::{rb_define_alloc_func, rb_obj_reveal};

struct Point {
    x: isize,
    y: isize,
}

#[wrap(class = "Point")]
struct MutPoint(RefCell<Point>);

impl MutPoint {
    // Allocator function to produce a new 'uninitialised' MutPoint.
    //
    // Argument is a Ruby VALUE (pointer to an object) for the class that is
    // being allocated.
    //
    // Return value is a Ruby VALUE (pointer to an object) of the uninitialised
    // object.
    extern "C" fn alloc(class: u64) -> u64 {
        // create the Rust struct. The idea of uninitialised data doesn't
        // really make sense in Rust, so we'll just use default values.
        let point = Self(RefCell::new(Point { x: 0, y: 0 }));
        // wrap the struct as a Ruby object then get the raw Ruby VALUE pointer
        let instance = typed_data::Obj::wrap(point).as_raw();
        // Magnus' wrap function (which is the only way of wrapping data with
        // and object in Magnus < 0.6) always uses the default class, we need
        // to update that to the passed class, incase it has been subclassed in
        // Ruby.
        unsafe { rb_obj_reveal(instance, class) };
        // Return the instance
        instance
    }

    fn initialize(&self, x: isize, y: isize) {
        let mut this = self.0.borrow_mut();
        this.x = x;
        this.y = y;
    }

    fn x(&self) -> isize {
        self.0.borrow().x
    }

    fn set_x(&self, val: isize) {
        self.0.borrow_mut().x = val;
    }

    fn y(&self) -> isize {
        self.0.borrow().y
    }

    fn set_y(&self, val: isize) {
        self.0.borrow_mut().y = val;
    }

    fn distance(&self, other: &MutPoint) -> f64 {
        (((other.x() - self.x()).pow(2) + (other.y() - self.y()).pow(2)) as f64).sqrt()
    }
}

fn main() -> Result<(), Error> {
    let _cleanup = unsafe { embed::init() };

    let class = define_class("Point", class::object()).unwrap();

    // Let Ruby know about our allocator function
    unsafe {
        rb_define_alloc_func(
            // call `class()` from the `TypedData` trait *before* defining
            // our allocator, because in Magnus < 0.6 the default
            // implementation of `TypedData::class()` undefs the allocator
            // function on first call. So we want to get that out of the way
            // before defining the allocator function.
            // It just so happens that `rb_define_alloc_func` wants the same
            // class as an arg, so we can kill two birds with one stone.
            <MutPoint as TypedData>::class().as_raw(),
            Some(MutPoint::alloc),
        )
    };

    class.define_method("initialize", method!(MutPoint::initialize, 2)).unwrap();
    class.define_method("x", method!(MutPoint::x, 0)).unwrap();
    class.define_method("x=", method!(MutPoint::set_x, 1)).unwrap();
    class.define_method("y", method!(MutPoint::y, 0)).unwrap();
    class.define_method("y=", method!(MutPoint::set_y, 1)).unwrap();
    class.define_method("distance", method!(MutPoint::distance, 1)).unwrap();

    let d: f64 = eval(
        "class OffsetPoint < Point
           def initialize(offset, x, y)
             super(x + offset, y + offset)
           end
         end
         a = Point.new(1, 1)
         b = OffsetPoint.new(2, 3, 3)
         a.distance(b)",
    ).unwrap();

    println!("{}", d);
    Ok(())
}

I will add a way to expose this to Magnus, but it'll take a bit of thought figuring out how I'd like to do it, and didn't want to keep you waiting.

I'll keep this issue open, and I'll update it when I have a design that I think will work.

SlashScreen commented 1 year ago

Came here to mention this as well, but noticed an issue had been opened.

SlashScreen commented 1 year ago

I also want to ask if there is/will be more advanced inheritance functionality, such as self << class.

wfchandler commented 1 year ago

Thanks for the detailed explanation @matsadler!

matsadler commented 1 year ago

I also want to ask if there is/will be more advanced inheritance functionality, such as self << class.

I'm going to guess you meant class << self (I make the same mistake all the time).

class << self isn't really inheritance. It changes the current self to the 'singleton class'/'meta class' of the value to the right of <<. In a class definition this will be the singleton class of the class being defined (this video does a great job explaining Ruby's object model and what the singleton class is, if this is new to you).

This is usually used like:

class Example
  class << self
    def class_method
      "example"
    end
  end
end

Example.class_method   #=> "example"

which is equivalent to:

class Example
  def self.class_method
    "example"
  end
end

Example.class_method   #=> "example"

In magnus this common usage would be:

fn class_method() -> String {
    "example".to_owned()
}

let class = define_class("Example")?;
class.define_singleton_method("Example", function!(class_method, 0));

or you could use this pattern, which would adapt better to other uses of class << self:

fn class_method() -> String {
    "example".to_owned()
}

let class = define_class("Example")?;
class.singleton_class()?.define_method("Example", function!(class_method, 0))?;
matsadler commented 1 year ago

I thought it might take a little longer, but I have something working.

You can see an example with an initialize method just like a regular Ruby class in the "complete_object" example.

And there's another example of sticking with defining new directly and skipping initialize, but correctly setting the class, in the typed_data_subclass_from_ruby test.

These changes will go out with the 0.6 release, which will probably be in a month or so.

I'll leave this issue open a little longer incase you want to give this a try and if you have any feedback. I'll close it in a week or so if I don't hear anything.

Thanks again for raising this issue!

wfchandler commented 1 year ago

I tried out the first approach and it works in my project. Thanks for the quick turnaround!

Re feedback, both approaches seem a little fiddly with needing to set the allocator or inherited methods. Automagically providing this would be nice, but it sounds like there may not be a way to do this that covers all use-cases.

SlashScreen commented 1 year ago

I also want to ask if there is/will be more advanced inheritance functionality, such as self << class.

I'm going to guess you meant class << self (I make the same mistake all the time).

class << self isn't really inheritance. It changes the current self to the 'singleton class'/'meta class' of the value to the right of <<. In a class definition this will be the singleton class of the class being defined (this video does a great job explaining Ruby's object model and what the singleton class is, if this is new to you).

This is usually used like:

class Example
  class << self
    def class_method
      "example"
    end
  end
end

Example.class_method   #=> "example"

which is equivalent to:

class Example
  def self.class_method
    "example"
  end
end

Example.class_method   #=> "example"

In magnus this common usage would be:

fn class_method() -> String {
    "example".to_owned()
}

let class = define_class("Example")?;
class.define_singleton_method("Example", function!(class_method, 0));

or you could use this pattern, which would adapt better to other uses of class << self:

fn class_method() -> String {
    "example".to_owned()
}

let class = define_class("Example")?;
class.singleton_class()?.define_method("Example", function!(class_method, 0))?;

gotcha. one more question that is, in fact, related to this- how am I to call a Proc? I tried for hours to try to get the arguments to work, to no avail, even when copying the test file code. How do I pass in a list of arguments? And, how can I ignore the return value, as it seems I need a tuple of two of them?

matsadler commented 1 year ago

Re feedback, both approaches seem a little fiddly with needing to set the allocator or inherited methods. Automagically providing this would be nice, but it sounds like there may not be a way to do this that covers all use-cases.

Yeah, that's fair. I think the complexity here is inherent with how Ruby, C, and Rust don't quite line up, but I'll have a think on how I could package the different options up as functions/macros/whatever, rather than users having to write out the same boilerplate code over and over.

matsadler commented 1 year ago

how am I to call a Proc? I tried for hours to try to get the arguments to work, to no avail, even when copying the test file code. How do I pass in a list of arguments? And, how can I ignore the return value, as it seems I need a tuple of two of them?

Oops, I'd failed to document this, I've added docs now, see https://github.com/matsadler/magnus/commit/0a43f5541fc45c991e93f5b07adfcd4366baa984.

You can use a tuple (e.g. (foo, bar)), slice (e.g. &[foo, bar]), array (e.g. [foo, bar; 2]), or Ruby array (e.g. RArray::from_vec(vec![foo, bar]) as the arguments list to Proc::call.

The values in a tuple can be different types, and Rust types will automatically be converted to Ruby types. With slice and array the values have to be the Ruby Value type. RArray is very flexible with how you can build it. Under the hood they are all converted to an RArray, so should be the same performance wise.

One edge case with tuples is to pass a single argument you need a trailing comma for Rust to know it's a tuple of one, rather than just parens, so (foo,).

The return type is generic, so the Rust compiler needs a type for it (unfortunately there's no way for me to specify a default), you can use let _: Value = some_proc.call((foo, bar)).unwrap() to ignore the return value. Using Value will skip any type conversion that would happen for other types, so no danger of the return type conversion erroring.

uvlad7 commented 11 months ago

both approaches seem a little fiddly with needing to set the allocator or inherited methods

I'd like to provide some feedback too and say that the first approach actually lines up pretty well with how Ruby C API works: one needs to call rb_define_alloc_func and provide a function that calls ALLOC and Data_Wrap_Struct.

@matsadler maybe change "Wrapping Rust Types in Ruby Objects" section in the README? Or add a comment about broken subclassing and a link to this issue?