rust-mobile / ndk

Rust bindings to the Android NDK
Apache License 2.0
1.11k stars 110 forks source link

ndk: Requre all off-thread `dyn Fn*` callback types to implement `Send` #455

Closed MarijnS95 closed 7 months ago

MarijnS95 commented 9 months ago

Fixes #454

In all these cases the NDK either documents or implements the callback to be invoked from a single, separate thread. For this to be allowed by Rust thread safety, the closures and their (moved) contents are effectively "moved" to a different thread (Send) but not accessed concurrently (Sync), but the type definitions were never requiring this marker trait which could lead to undefined/invalid behaviour.

In addition some Boxed callbacks may have been dropped before a new callback is passed to the NDK, leading to a possible race condition. By storing the new callback after registering it with the NDK, the previous callback is now dropped later to ensure such race conditions no longer happen (assuming AOSP callback invocations are properly synchronized with the setters).

A request has been filed to solidify this "behavioural contract" in the documentation where not done so yet: https://issuetracker.google.com/issues/300602767#comment12

MarijnS95 commented 8 months ago

@spencercw since you've been active around callbacks and seem to be using this crate, would you mind helping me out reviewing this and other PRs? I don't like "forcing" my own work into the codebase without at least one extra pair of eyes.

I've also been wondering why the NDK is mixing both Box<Box<dyn Fn...>> types together with F: Fn generics (and extern "C" fn native_func<F: Fn...>() {}), and whether some APIs should deregister their callbacks in drop() to prevent possible data races.

spencercw commented 8 months ago

Looks like a reasonable change.

I'm not actively using this right now (changed jobs), but don't mind being an extra pair of eyes from time to time.

I've also been wondering why the NDK is mixing both Box<Box<dyn Fn...>> types together with F: Fn generics

This does seem to be a little superfluous. I guess from an API point of view it means the API user doesn't need to box the callback themselves, but would be nice to be consistent across the codebase.

and whether some APIs should deregister their callbacks in drop() to prevent possible data races.

Tricky. I think this is probably not necessary, but it depends on the underlying NDK API. For example, AMediaCodec_setAsyncNotifyCallback says 'Once the callback is unregistered or the codec is reset / released, the previously registered callback will not be called.' So hopefully we can assume that once AMediaCodec_delete has returned, it is safe to drop the corresponding Rust callback state.

MarijnS95 commented 8 months ago

I'm not actively using this right now (changed jobs), but don't mind being an extra pair of eyes from time to time.

Thanks! That's unfortunate but I have quite a bunch of PRs open now. @rib appears to be inactive because of the newyears' holidays so I'll wait for that.

I've also been wondering why the NDK is mixing both Box<Box<dyn Fn...>> types together with F: Fn generics

This does seem to be a little superfluous. I guess from an API point of view it means the API user doesn't need to box the callback themselves, but would be nice to be consistent across the codebase.

I thought so too at first, until it hit me: fat pointers. The pointer coming from a Box<dyn Fnxxx> is a fat pointer with the vtable and storage for whatever is moved/referenced inside the closure:

let foo: Box<dyn Fn() -> u32> = Box::new(|| 1337);
let raw = dbg!(Box::into_raw(foo));
dbg!(std::mem::size_of_val(&raw));

Gives:

[src/main.rs:3] Box::into_raw(foo) = 0x0000000000000001
[src/main.rs:4] std::mem::size_of_val(&raw) = 16

(Note that nothing is captured, so there's no data associated with the dyn Fn)

So we'll need to box twice if we wish to pass an 8-byte pointer around, or find another solution such as using generics to convey the closure type (hence the box pointer becomes 8-bytes again, but the callback is also monomorphized).


Tricky. I think this is probably not necessary, but it depends on the underlying NDK API. For example, AMediaCodec_setAsyncNotifyCallback says 'Once the callback is unregistered or the codec is reset / released, the previously registered callback will not be called.' So hopefully we can assume that once AMediaCodec_delete has returned, it is safe to drop the corresponding Rust callback state.

Yeah I've been giving this some thought:

  1. For one-off callbacks (Choreographer seems to have some) we can Box::into_raw() in the callback. In the unlikely even that the callback is never called (there's no way to unregister it, and Choreographer is a "singleton" with a global "instance" and hence no impl Drop), we'll leak a box;
  2. For functions called more often, we'll hold on to the Box and destroy it on drop() (and eventually deregister it if necessary, as discussed before - but let's check docs first);
  3. If there's ever a structure that can be "cloned" (in this crate) via incrementing the NDK refcount, we suddenly no longer have a sensible place to track the "lifetime" of a boxed callback (but we'll investigate/solve that when the case appears, I don't think we have any ATM).

Either way, deregistering callbacks or having an explicit comment that the delete() causes the callback to no longer fire is fine by me; as long as we can safely get rid of the Box and invalidate the pointer.


Then secondly I've been thinking about a general lack of when and where callbacks are called, and what functions/objects/pointers are thread-safe, but that's for another time :)

MarijnS95 commented 8 months ago

Made an attempt to request some clarifications in https://issuetracker.google.com/issues/318944941.

Notice that it may be terrible for us to take an Fn instead of an FnMut. The latter allows interior mutability of captured variables and should not be used on concurrently-called closures AFAIK. However, I think it may be fine on these functions as they're moved to another thread and only invoked serially there.

Likewise I think that FnOnce doesn't require Sync on captured variables, only Send (same as std::thread::spawn()).

Thoughts?

spencercw commented 8 months ago

I think FnMut makes sense for AMediaCodec, with FnOnce for AChoreographer and ASurfaceTransaction. This does have the built-in assumption that these callbacks indeed are only called once, but the NDK documentation being what it is we might just have to live with this and hope at some point they improve it.

MarijnS95 commented 8 months ago

@spencercw I ended up opening a new issue #464 to once again go over all callbacks and document their (lack of) standardization, so that we have a guideline how to build our callbacks going forward.

On the topic of Box<Box<dyn Fn>> vs Box<T> with T: Fn, the latter can only be implemented for one-off callbacks that don't store their callback in the owning object.