iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[vm] iree_vm_wait_invoke with a callback and user-defined wait source #13125

Open ezhulenev opened 1 year ago

ezhulenev commented 1 year ago

Request description

Currently the iree_vm_wait_invoke API is blocking (the caller thread is blocked until all wait sources become ready). Can we add a non-blocking wait that accepts a callback?

typedef void (*iree_vm_wait_callback)(iree_status_t status, void* user_data);

IREE_API_EXPORT iree_status_t iree_vm_wait_invoke_async(
    iree_vm_invoke_state_t* state,
    iree_vm_wait_frame_t* wait_frame,
    iree_time_t deadline_ns,  // keep it?
    iree_vm_wait_callback callback,
    void* user_data);

and also support user-defined wait sources with a then-style callback, like std::experimental::future (https://en.cppreference.com/w/cpp/experimental/future/then):

(C++ interface)

class AsyncWaitSource {
 public:
  void WhenReady(std::function<void()> callback);
};

ScottTodd commented 1 year ago

iree_vm_async_invoke has a callback mechanism (iree_vm_async_invoke_callback_fn_t) that runs on an iree_loop_t. Does that work for your use case? (See invocation.h source)

ezhulenev commented 1 year ago

I think it can definitely be used as one of the building blocks, but it currently ends up calling into iree_wait_source_wait_one (https://github.com/openxla/iree/blob/97779d7f494660f88864b035475ec77a1e54c6c8/runtime/src/iree/base/wait_source.h#L321-L326), which "blocks the caller and waits for a |wait_source| to resolve". I can't find a "wake me up when you are ready" await primitive.

ScottTodd commented 1 year ago

Oops copy/paste error (edited) - I meant iree_vm_async_invoke has a callback mechanism, not iree_vm_wait_invoke.

ezhulenev commented 1 year ago

Sure, this callback will definitely be required for async end-to-end execution. However, to be able to write an async loop implementation that never blocks any threads, something like iree_wait_source_async_... is missing, or at least I can't find a mechanism that allows it today.

benvanik commented 1 year ago

Everything already supports non-blocking main loops, but there's no turn-key implementation of them yet (and it'll always be optional). iree_vm_wait_invoke is the synchronous implementation provided as a helper but is not required - when trying to integrate at that level you can handle asynchronous behavior yourself as part of the bindings/integration as noted:

// Hosting schedulers that can more efficiently perform the wait should do so,
// either synchronously or asynchronously. Wait frames are stored on the stack
// and will remain valid until iree_vm_resume_invoke is used to complete the
// wait.

Instead of the hosting loop calling iree_vm_wait_invoke, it can get the wait source from the wait frame, set up whatever kind of wait it wants in whatever way it wants, and then resume when possible. This allows for batching wait syscalls (epoll/io_uring/etc), integrating the waits into existing application loop/reactor mechanisms, etc.
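As a rough illustration of that flow, here is a minimal single-threaded sketch of a host loop that parks a suspended invocation, checks its wait sources without blocking, and resumes it once every source has resolved. All names here (WaitSource, SuspendedInvocation, HostLoop, RunHostLoopDemo) are hypothetical stand-ins and not IREE types; the resume callback only stands in for a call to iree_vm_resume_invoke, and the is_ready query stands in for whatever batched kernel wait (epoll, io_uring, etc.) a real host would use instead of polling flags.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a wait source: a non-blocking readiness query.
struct WaitSource {
  std::function<bool()> is_ready;
};

// Hypothetical stand-in for an invocation suspended on a wait frame.
// Models a wait-all frame: every source must resolve before resuming.
struct SuspendedInvocation {
  std::vector<WaitSource> wait_sources;
  std::function<void()> resume;  // stands in for iree_vm_resume_invoke
};

class HostLoop {
 public:
  void Suspend(SuspendedInvocation invocation) {
    pending_.push_back(std::move(invocation));
  }

  // One non-blocking pass over all pending invocations.
  // Returns how many invocations were resumed.
  std::size_t PollOnce() {
    std::vector<SuspendedInvocation> ready;
    std::vector<SuspendedInvocation> still_pending;
    for (auto& invocation : pending_) {
      bool all_ready = true;
      for (const auto& source : invocation.wait_sources) {
        if (!source.is_ready()) {
          all_ready = false;
          break;
        }
      }
      (all_ready ? ready : still_pending).push_back(std::move(invocation));
    }
    // Swap before resuming so a resume callback may safely call Suspend().
    pending_ = std::move(still_pending);
    for (auto& invocation : ready) invocation.resume();
    return ready.size();
  }

 private:
  std::vector<SuspendedInvocation> pending_;
};

// Usage: the invocation resumes only after both sources are ready.
int RunHostLoopDemo() {
  bool a = false, b = false;
  int resumed = 0;
  HostLoop loop;
  SuspendedInvocation invocation;
  invocation.wait_sources.push_back(WaitSource{[&a] { return a; }});
  invocation.wait_sources.push_back(WaitSource{[&b] { return b; }});
  invocation.resume = [&resumed] { ++resumed; };
  loop.Suspend(std::move(invocation));
  loop.PollOnce();  // nothing is ready yet
  a = true;
  loop.PollOnce();  // only one of two sources is ready
  b = true;
  loop.PollOnce();  // both ready: the invocation resumes
  return resumed;
}
```

The point of the shape is that the loop never blocks on any single source, so one thread can service many suspended invocations.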

I had a branch at some point that lets the loop run on the IREE task system, which gives full async behavior. It does what's described above: it gets the wait sources and adds them to a wait set, for which it manages a dedicated syscall thread. That'll be turn-key, but today if you want application-level async then you'll need to do that yourself.

The critical thing is that arbitrary user callbacks are not possible without additional cost (polling threads, etc.), so we don't support those in the lower layers of the stack and instead provide only the primitives that work across APIs/platforms and that can be done with no surprise user costs. You can emulate callbacks yourself using events or any compatible wait source (in cases where the hosting layer has its own primitives already). See iree_event_t and iree/base/internal/event_pool.h (internal, but could be exposed if needed). It's important that compatible platform primitives are used in order to efficiently poll/wait in the kernel, even if the bindings make them appear like callbacks.
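To make the cost concrete, here is a minimal sketch of emulating a "call me when ready" callback on top of a blocking event, using only the C++ standard library. Event and CallbackWaiter are hypothetical names made up for illustration; Event is a stand-in for a waitable primitive and is not iree_event_t. The extra cost is visible in the code: one dedicated thread per waiter.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a waitable event (not iree_event_t).
class Event {
 public:
  void Set() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      set_ = true;
    }
    cv_.notify_all();
  }
  // Blocks the calling thread until Set() has been called.
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return set_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool set_ = false;
};

// Emulates a callback by parking a dedicated thread on the event.
// The event must outlive the waiter, and the destructor blocks until the
// event has been set and the callback has run.
class CallbackWaiter {
 public:
  CallbackWaiter(Event& event, std::function<void()> callback)
      : thread_([&event, cb = std::move(callback)] {
          event.Wait();
          cb();
        }) {}
  ~CallbackWaiter() { thread_.join(); }

 private:
  std::thread thread_;
};

// Usage: the callback runs exactly once after the event is set.
int RunCallbackDemo() {
  Event ready;
  int calls = 0;
  {
    CallbackWaiter waiter(ready, [&calls] { ++calls; });
    ready.Set();
  }  // waiter joins here, so reading calls below is safe
  return calls;
}
```

A host that already has its own reactor would instead register the underlying platform primitive with that reactor, which avoids the per-waiter thread.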

ezhulenev commented 1 year ago

So in my own loop implementation I should do something like:

for (iree_host_size_t i = 0; i < wait_frame->count; ++i) {
  iree_wait_source_t* src = &wait_frame->wait_sources[i];
  // Only handle wait sources backed by my own control function.
  if (src->ctl == MyOwnWaitSourceCtlFun) {
    auto* awaitable = reinterpret_cast<MyAwaitable*>(src->storage[0]);
    DoWhatEverIWant();
  }
}

iree_vm_resume_invoke(...);

or alternatively do the same inside a loop implementation.

And the IREE_WAIT_SOURCE_COMMAND_WAIT_ONE command will just block the caller thread if the awaitable is not ready, but the higher-level stack should guarantee that by the time wait-one is called, the awaitable is already in the ready state.

benvanik commented 1 year ago

Yep! And if you handle the wait yourself and then resume, wait_one should never be called by the runtime; if it is, it'd be with an immediate timepoint, just to query. Looking at the code, nothing will call it today if the loop (or you) doesn't.

If you have a platform primitive not covered by iree_wait_primitive_type_t but optionally supportable we can integrate that, or otherwise you can do as you mention with your own control function/data.
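The arrangement discussed above (resume only from the awaitable's own continuation, so that a later wait-one never blocks) could look roughly like the sketch below. Everything in it is hypothetical: MyAwaitable is a toy single-threaded awaitable, WaitCommand is only loosely modeled on the IREE_WAIT_SOURCE_COMMAND_* values, and MyWaitSourceCtl does not use the real iree_wait_source_ctl_fn_t signature.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy awaitable with continuations. Not thread-safe.
class MyAwaitable {
 public:
  bool ready() const { return ready_; }

  // Runs the callback immediately if already ready, otherwise defers it.
  void WhenReady(std::function<void()> callback) {
    if (ready_) {
      callback();
      return;
    }
    callbacks_.push_back(std::move(callback));
  }

  // Marks the value ready and runs all deferred continuations.
  void SetReady() {
    ready_ = true;
    std::vector<std::function<void()>> callbacks = std::move(callbacks_);
    callbacks_.clear();
    for (auto& callback : callbacks) callback();
  }

 private:
  bool ready_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Hypothetical command set, loosely modeled on query and wait-one.
enum class WaitCommand { kQuery, kWaitOne };

// Hypothetical control function. Returns true if the awaitable is resolved.
bool MyWaitSourceCtl(MyAwaitable* awaitable, WaitCommand command) {
  switch (command) {
    case WaitCommand::kQuery:
      return awaitable->ready();  // never blocks
    case WaitCommand::kWaitOne:
      // The host resumes only from the continuation, so there is nothing
      // left to wait for by the time this command arrives.
      assert(awaitable->ready() && "host must resume only after readiness");
      return true;
  }
  return false;
}

// Usage: the resume path runs from the continuation, after readiness.
int RunAwaitableDemo() {
  MyAwaitable value;
  int resumed = 0;
  value.WhenReady([&value, &resumed] {
    // Stands in for resuming the invocation; wait-one is already satisfied.
    if (MyWaitSourceCtl(&value, WaitCommand::kWaitOne)) ++resumed;
  });
  bool ready_before = MyWaitSourceCtl(&value, WaitCommand::kQuery);
  value.SetReady();
  return ready_before ? -1 : resumed;
}
```

A real integration would need synchronization around the ready flag and the continuation list, since the producer and the host loop usually run on different threads.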

ezhulenev commented 1 year ago

👍 I was just thinking of wait frames that might have multiple wait sources, and at "application level" I only know how to handle one of them.

I'm looking at integrating the IREE VM with tfrt::AsyncValue (or PjRtFuture) to enable non-blocking continuations, and the current plan is to pass a tfrt::AsyncValue* as the self pointer, similar to the semaphore implementation.

benvanik commented 1 year ago

I don't know why tfrt is involved, but good luck :) iree_task_poller_prepare_task shows how to add wait sources to wait sets and use iree_wait_any.

allieculp commented 1 year ago

@ezhulenev Keeping this open for discussion, please close when possible!