madsmtm / objc2

Bindings to Apple's frameworks in Rust
https://docs.rs/objc2/
MIT License
290 stars 35 forks source link

Make instance variables even simpler type-wise #414

Closed madsmtm closed 6 months ago

madsmtm commented 1 year ago

We use IvarDrop mostly because Box is #[fundamental], which means we can't ensure that no Encode impl exist for it (even though we know that would very likely be wrong).

Instead, we might be able to (ab)use HRTBs, see this playground link for the general idea. This is also what bevy uses for their IntoSystem trait.

Ideally we'd be able to do avoid IvarDrop, IvarEncode and IvarBool altogether, and just do (with or without the ivar names):

ivar1: Ivar<Box<i32>, "ivar1">,
ivar2: Ivar<i32, "ivar2">,
ivar3: Ivar<bool, "ivar3">,

Still need to figure out how to actually do that in a struct (playground).

madsmtm commented 1 year ago

Reminding myself why we can't add a zero-cost IvarAny<T>: drop(MyClass::alloc()) must be safe, which means that dealloc can be called before the class has actually initialized anything (and hence it will attempt to drop uninitialized data).

Maybe we could get by with just adding an extra, hidden IvarBool onto the class if any IvarAny<T> is present? And then we set that when initializing, and check it when deallocating? Though then we'd somehow have to control initializers a lot more, to ensure it is set automatically.

So maybe the "is initialized" flag has to be set on the IvarAny type itself? So it's actually enum IvarAny<T> { Allocated, Initialized(T) }, and then when calling Ivar::write we also change the state?

(This somewhat ties in with the drop flags that Rust has).

madsmtm commented 1 year ago

Todo: Figure out whether Swift uses drop flags on Objective-C compatible classes or not.

And if not, how Swift-created classes then are safely exposed to Objective-C which may allocate without initalizing, as well as how Swift handles unwinding through initializers.

madsmtm commented 1 year ago

A few Swift compiler details on this: https://github.com/apple/swift/pull/33743.

Doesn't explain what happens when Objective-C interop is enabled though.

madsmtm commented 1 year ago

Perhaps it would be best to revert the "instance variables should work transparently like struct fields" decision, since it seems brittle from a soundness perspective, and it introduces extra code in Deref which is really hard to debug, as well as possibly being a performance pitfall.

Instead, we could do something like:

declare_class!(
    struct MyClass;

    pub? struct MyClassIvars {
        ivar1: Cell<bool>,
        ivar2: AnyRustType,
    }

    unsafe impl ClassType for MyClass {
        type Super = NSObject;
        type Mutability = Mutable;
        const NAME: &'static str = "MyClass";
    }
);

// Generates
struct MyClass(NSObject);
// + Trait impls

impl Encode for MyClassIvars {
    const ENCODING: Encoding = {
        // Something calculated based on size/alignment of the ivars
    };
}

impl MyClass {
    const IVAR_NAME: &'static str = concat!("_", MyClass::NAME, "_ivars");
    static mut IVAR_OFFSET: isize = 0; // Will be set immediately after class creation

    pub? fn ivars(&self) -> &MyClassIvars { ... }
    pub? fn ivars_mut(&mut self) -> &mut MyClassIvars { ... }
}

// Usage:
let obj: Id<MyClass>;

// Loads the offset twice, though that _may_ be possible to optimize away
obj.ivars().ivar1.set(true);
obj.ivars().ivar2.any_rust_method();

// Guaranteed to be the most efficient
let ivars = obj.ivars();
ivars.ivar1.set(true);
ivars.ivar2.any_rust_method();

// The whole class is borrowed for the duration, but but disjoint access to the ivars is still possible
let mut ivars = obj.ivars_mut();
ivars.ivar1 = Cell::new(true);
ivars.ivar2.any_mutating_rust_method();

Advantages over having separate ivar1/ivar1_mut/ivar2/ivar2_mut methods:

Disadvantages:

Though we may be able to mitigate the first by providing helper methods? Or maybe users should be encouraged to provide those themselves?

madsmtm commented 11 months ago

Or perhaps even:

struct MyClassIvars {
    ivar1: Cell<bool>,
    ivar2: AnyRustType,
}

declare_class!(
    struct MyClass;

    type Ivars = MyClassIvars;

    unsafe impl ClassType for MyClass {
        type Super = NSObject;
        type Mutability = Mutable;
        const NAME: &'static str = "MyClass";
    }

    // ...
);

Since we don't actually care about the contents of the ivars (it can even be an enum, or something wrapped in e.g. a RefCell if you intend to always access your ivars through that).

madsmtm commented 11 months ago

Okay so it turns out that Swift doesn't even try to handle the case where a class is not fully initialized - the following crashes because it tries to dereference a null pointer:

import Foundation

class Bar {
    func bar() {
        print("Bar.bar")
    }
}

@objc public class Foo: NSObject {
    var bar = Bar.init()

    deinit {
        print("deinit")
        self.bar.bar() // null pointer deref here
    }
}
#import "ProjectName-Swift.h" // Or however your bridging headers are set up

int main(int argc, const char * argv[]) {
    Foo* foo = [Foo alloc]; // Allocate, but don't initialize the object
    return 0;
}

Now I'm wondering if we can get away with doing the same if we put more restrictions on initializers, e.g. after https://github.com/madsmtm/objc2/issues/440? Though it would require going through some hoops to get unwind safety (Swift doesn't do unwinding, it just traps).

madsmtm commented 11 months ago

Here is a playground I made to see if it was possible to special-case types like Box<T>, so that we use the niche in there for the drop flag. Unfortunately, I don't think it is.

madsmtm commented 11 months ago

We could also go the crazy route of:

Or perhaps even do method swizzling before calling release?

All of this would never be called in the usual cases where we don't unwind, so maybe swizzling could actually make sense?

madsmtm commented 11 months ago

Another option would be to override alloc itself, though that may also disable some optimizations.

madsmtm commented 11 months ago

Also I'm only just now realizing that our current calling of Drop in dealloc is unsound, that part also needs a drop flag.

We may be able to combine it with the drop flag for ivars though?

madsmtm commented 11 months ago

I think I will be going the drop-flag route, since it's the only clearly safe option - in general, code is allowed to allocate and deallocate without ever initializating, and the fact that Swift doesn't handle this should be considered a bug (or a safety hole, at the very least).

I will open an issue with them soon, once I reproduce the issue with the latest Swift version (if only to gain their opinion).

EDIT: Done, see https://github.com/apple/swift/issues/68734

madsmtm commented 8 months ago

I think we might be able to use autoref specialization to do things differently depending on whether Encode is implemented for the type or not.

madsmtm commented 8 months ago

I guess my biggest problem here is indecision:

I want to give users the ability to declare their ivars with the runtime, and use e.g. Option<Id<T>> efficiently and easily without the overhead of a drop flag.

But I also want people to not need to think about it, and just dumping everything in a single ivar is just likely to be more efficient in the general case as Rust doesn't have to load a static as often to figure out the ivar offsets.


Maybe there's a middle ground? Maybe we can keep the current design of letting people specify instance variables as-if they were struct members, and then we'd use autoref specialization to add an appropriate Encode whenever possible, and a "bag of bytes" Encode when not (avoiding the need to specify IvarEncode/IvarDrop/IvarBool).

Or maybe we add an attribute #[objc] that ensures that the ivar is accessible from Objective-C?


Also unsure if the performance overhead of loading that static is actually acceptable or not, and if not, if we should add accessor methods for each ivar instead? (Though if we do that, how do we add _mut methods?)

EDIT: I tried in this gist to see if LLVM can elide accesses to mutable statics that it "should" know are only set once, seems like it cannot.

But then again, neither can it in Objective-C. Compiling the following shows two loads of OBJC_IVAR_$_Abc.a, meaning that the extern call to -[NSObject hash] prevents the optimizer from combining those loads.

// clang -O2 -fobjc-arc -S -emit-llvm test_objective_c_ivar_access.m -o test_objective_c_ivar_access.ll
#include <AppKit/AppKit.h>

// Try replacing superclass with NSObject
@interface Abc: NSView {
    @public
    int a;
}
@end

// Uncomment to give the optimizer more information (e.g. that there are no private ivars)
// @implementation Abc
// @end

int foo(Abc* abc) {
    int first = abc->a;
    [abc hash];
    return first + abc->a;
}

So initially it seems like the performance characteristics would be exactly the same? Though that's not necessarily enough to say if we consider the performance characteristics acceptable!

Interestingly, there is actually an optimization for subclasses of NSObject (and maybe other classes where the full list of ivars is known?) where the static ivar load can be completely elided in favour of a constant inserted into the source code.

madsmtm commented 8 months ago

I looked a bit more at what Swift does for instance variables, and it seems like they don't emit type encoding information (they just use an empty string), though they do emit ivar names; though that may just be because it made their implementation easier?

Swift doesn't really expose the ivars to Objective-C, and even if you specify @objc on a variable, you'll just get a @property in the generated Objective-C header, not direct ivar access. Furthermore, Swift does not allow subclassing from Objective-C, with no way to opt-out, so they're intentionally being limiting in what you can access from Objective-C. (Actually, this small fact tells us that AppKit, Foundation etc. is likely still written mostly in Objective-C).

Anyhow, just yet another argument that perhaps it's okay that we don't expose instance variables in a nice way to Objective-C (rather, users should expose property getters/setters instead).

madsmtm commented 8 months ago

Okay, a few interdependent decisions to take:

  1. How do we access ivars?
    1. Methods: self.ivar() / self.ivar_mut()
    2. Fake struct fields: *self.ivar
  2. How do we initialize ivars?

    1. Individually:

      this.set_ivar1(value1);
      // Or the following, depending on (1)
      *this.ivar2 = value2;
      
      let this = this.set_initialized(); // May have to be unsafe?
    2. In one go:
      let this = this.set_ivars(Ivars {
          ivar1: value1,
          ivar2: value2,
      });
  3. How do we store ivars?

    1. Individually:
      declare_class!(
          struct MyClass {
              ivar1: MyType1,
              ivar2: MyType2,
          }
          ...
      );
    2. As one:

      struct Ivars {
          ivar1: MyType1,
          ivar2: MyType2,
      }
      
      declare_class!(
          struct MyClass;
          type Ivars = Ivars;
          ...
      );
  4. Do we allow instance variables with a specific name?

I think I'm pretty certain at this point what I want for decision 2: I want option ii, to store the instance variables in a single go, so that objc2 can statically know that they were all initialized, and so that it can set the any drop flags that may be needed; anything else is just too error prone!

Making this decision however does kinda drive the rest of our decisions, since we'll need some way to name the intermediate struct Ivars! We could generate a name for it automatically, but if we just pick option ii for decision 3, then we're set; so I'm heavily leaning towards that as well.

And leads us to decision 1, where the explicitness of option i is probably desirable now that there is only one method self.ivars()/self.ivars_mut() to call.

For decision 4, I guess we can still add some way to add instance variables with explicit names later on, but this does not really have to be integrated into the "normal" way of accessing instance variables, so I think we can honestly just punt on this as future work.


This does mean that if your only instance variables are only things like Option<Id<T>>, then you will still pay a performance penalty for the drop flag. I guess we could maybe remediate this by moving the struct Ivars definition into declare_class! and detecting that the instance variables are safe to drop after being created with mem::zeroed - will have to wait and see how difficult that'd be.

Or perhaps we could re-use bytemuck::Zeroable, and allow omitting the drop flag when that trait is implemented? This would not allow Id<T> to work directly, but that's probably fine, the fact that Id<T> can be safely dropped when it's created from mem::zeroed is kind of a hard implementation detail to expose.

madsmtm commented 8 months ago

Regarding deallocation of instance variables, unsure where that should actually happen here?

Under ARC and in Swift, it is done by adding a C++ destructor that objc_destructInstance calls (and has done since macOS 10.4), such that the instance variables are destroyed after all dealloc/deinit methods for the current object has been called.

This allows users to call methods in dealloc/deinit, knowing that even if a subclass overrides the method, the instance variables it uses will not be freed until after all deinitializers have been called.

I guess we could restrict Drop even more, but alternatively we should just do our ivar dropping inside a .cxx_destruct selector (this calling convention extern "C" fn(obj: NonNull<AnyObject>), weirdly enough no selector is passed, and unsure if it's allowed to unwind?)

I'm a bit weary about doing so, as I haven't found any official documentation that it'll remain supported? GCC does somewhat document it, and it is emitted in binaries, so I guess it'll be fine?

EDIT: We can't use .cxx_destruct, as that only works reliably when created in statics by a compiler.