rust-diplomat / diplomat

Experimental Rust tool for generating FFI definitions allowing many other languages to call Rust code
https://rust-diplomat.github.io/book/
Other
498 stars 46 forks source link

Leverage ECMAScript Explicit Resource Management #256

Open sffc opened 2 years ago

sffc commented 2 years ago

The ECMAScript Explicit Resource Management is soon to start becoming available. We should consider using it in the Diplomat-generated JS/WASM bindings.

https://github.com/tc39/proposal-explicit-resource-management

Basically, you add a Symbol.dispose function to an object that gets called when the object goes out of scope. In order to enable this, the variable needs to be declared as using x = ... instead of let x = ... or const x = ....

I suggest that we add Symbol.dispose functions that free the Rust memory. We can still put the object in the finalization registry, too, as a last resort in case people forget to say using at the variable declaration. (If Symbol.dispose gets run, it also removes the object from the finalization registry.)

sffc commented 1 year ago

Update: Explicit Resource Management reached Stage 3 today.

Manishearth commented 1 year ago

I'm kinda wary of techniques that rely on explicit usings. I think making it an option during generation could work, but the tricky thing about ICU4X objects is that a lot of them are long lived and this proposal seems to be scope-based so it won't work for that (especially if e.g. an application wants to be able to stick a formatter in a global, but occasionally swap it out for another one)

Manishearth commented 1 year ago

Actually I guess we can unconditionally do both and that's fine. Bit wary of using being the recommended way people use this, though. We can get through this with the appropriate documentation.

Manishearth commented 1 year ago

Oh, actually, a super tricky part of this is the lifetime handling, what if someone constructs an object with using and then constructs another one that, unbeknownst to them, contains an internal borrow of that object using Quinn's lifetime handling. If the second one is returned out, dispose() would still end up getting called on the first, leading to a memory error.

We have a couple places where we're we're using borrows and will probably gain more over time. I suspect a situation where using is only allowed for types that never participate in borrows could work, but that seems tricky and it doesn't seem like the proposal forbids using using for things that don't have a dispose function. Educating users on correct usage of using here seems hard.

sffc commented 1 year ago

This seems no different than what happens in C++ when one object borrows from another; you just need to be a little extra careful. I agree it's a sharp edge though.

Can we leverage reference counting? The dispose() function decreases the ref count, and child objects also increase the ref count, and the child object decreases the parent's ref count in its own dispose() function.

CC @QnnOkabayashi

sffc commented 1 year ago

the tricky thing about ICU4X objects is that a lot of them are long lived and this proposal seems to be scope-based so it won't work for that

A key part of the proposal is a DisposableStack where you can stick disposable things if you don't want them to be scope-based. It's useful for global and class-based scopes.

Manishearth commented 1 year ago

This seems no different than what happens in C++ when one object borrows from another; you just need to be a little extra careful. I agree it's a sharp edge though.

Yes, but C++ users expect segfaults when they do this, JS users do not.

Can we leverage reference counting? The dispose() function decreases the ref count, and child objects also increase the ref count, and the child object decreases the parent's ref count in its own dispose() function.

Seems like additional overhead and basically building a GC on top of a GC. We could, I'd like it to be an option, and I don't see much benefit given that the finalization registry exists.

sffc commented 1 year ago

I don't see much benefit given that the finalization registry exists

The benefit is fairly huge, given that the finalization registry has been causing problems since it runs at random times and we can't hint it to run more often to release large objects. I'd love to move away from the finalization registry as our primary memory management strategy in JS.

Manishearth commented 1 year ago

As far as I can tell the finalization registry is the only fully reliable way to do so without memory issues, however.

using seems to require that the user hold it correctly to get it to work. For RAII patterns like files and mutexes that seems totally fine, but for memory management I can't see this working in an airtight way: there will always be a way to smuggle out a reference, and those ways are not hard to do by accident. We can use refcounting for the cases where our APIs want to smuggle out a reference, but this doesn't work for the user doing it themselves, and I would like to be in a situation where ICU4X over JS does not have memory issues.

I suspect a useful bit of feedback for the using design would be for it to have a way to spot-check GC ownership, For things like this I see it as a potentially useful thing if paired with the finalization registry, where the dispose() function has a way of checking if it's ready to be disposed (falling back to relying on the registry if it has to). I suspect for most GC impls due to GC nurseries this will actually not be expensive, basically an "am I still in the nursery" check.

The current design seems well-suited for RAII resources, but not RAII memory, because JS has its own memory management already.

sffc commented 1 year ago

For what it's worth, it was discussed that trying to leak a reference (e.g. through a closure) would be rejected, but that idea was dismissed; you can see the arguments in https://github.com/tc39/proposal-explicit-resource-management/issues/97. Unfortunately the proposal is now at Stage 3 which means that the comment period for larger changes is now closed; the opportunity to raise design feedback was in the preceding months.

In our case, we can make the objects go to a state where they throw an exception if you try to use them after the WASM-side memory is freed. This is totally a reasonable thing to do and consistent with how explicit resource management works for the more "normal" use cases like file handles.

It's easy (and part of my original proposal) to put these in both the finalization registry and explicit resource management. Whichever dispose gets called first will put the object in a "disposed" state, such that if dispose gets called again, it will be a no-op.

ICU4X clients can already create memory issues for themselves if they use WASM directly instead of through our wrappers (as one of our only clients so far is doing; see my chat message).

Explicit Resource Management cannot increase the space of memory issues, since our wrappers are the only thing that currently enforce it, and we can make our wrappers have the correct behavior.

Manishearth commented 1 year ago

In our case, we can make the objects go to a state where they throw an exception if you try to use them after the WASM-side memory is freed. This is totally a reasonable thing to do and consistent with how explicit resource management works for the more "normal" use cases like file handles.

Yeah, we could do this provided we're also tracking borrows correctly. I think it's a lot more overhead.

I'm ultimately fine with this as an option, I'm wary of making it the main path.

ICU4X clients can already create memory issues for themselves if they use WASM directly instead of through our wrappers (as one of our only clients so far is doing; see my chat message).

Yes but that's a much more explicit opt in than "oops I persisted a reference" which is super easy to do in JS.

Explicit Resource Management cannot increase the space of memory issues, since our wrappers are the only thing that currently enforce it, and we can make our wrappers have the correct behavior

I disagree: you're right that it doesn't affect the theoretical space of issues, but it can affect the practical space of issues since there will be things that are more likely to accidentally hit.

sffc commented 1 year ago

Doesn't reference counting solve the borrowing/lifetime issues? I don't see how you can hit a memory issue if we implement reference counting. Thanks to Quinn's work, we know exactly when to increment and decrement the reference count. It's very low overhead.

Manishearth commented 1 year ago

@sffc that's only for references which escape into ICU4X APIs, that doesn't cover escaping the scope of the using block via a closure or being stashed in an object or whatever. DisposableStack helps but it's optional.

For this to be memory safe, you need both refcounting and validity checks, but even in that world you can easily end up with a "use after free" that's perfectly memory safe but causes a thrown exception because we disposed an object whilst a different part of the program still intended to use it. This is wasm after all so "memory unsafe" is basically also just an exception most of the time, or sometimes a broken read from wasm memory, which won't contain that many interesting things anyway.

My experience with defer/using style RAII in GC'd languages (like Go) is that it's great for the occasional resource that needs to be scope-managed, and files/mutexes are often that use case. However it's terrible for cases where everything needs to be managed; and that's where ICU4X is: all objects must be managed. It's a nice opt in to have when you know something is scopebound but I don't actually see that as being super common.

sffc commented 1 year ago

It's a programmer error if they use an object under explicit resource management after it left scope; we should throw an exception, but I also fully expect that once the ecosystem catches up, this will be an ESLint error. TC39 considered this case (see link above) and decided that runtime errors are fine here. It's possible that TypeScript may enforce this, too.

The most likely way I see these things potentially being leaked is if they get saved on objects that are managed by some reactive framework like React or Angular or Vue. Again, I expect the ecosystem to catch up such that those frameworks support adoption of Disposables (internally using DisposableStack). Alternatively, it's easy to document that if you need the object to be longer lasting, just don't write using.

But it sounds like we agree that we should document both approaches (finalization vs disposable) and let programmers pick. We can discuss the nuance later.

Manishearth commented 1 year ago

That's reasonable, especially with eslint.