experiment with permit based service framework

hlbarber commented 11 months ago

Overview

burger is an experimental permit based service API.

pub trait Service<Request> {
    type Response;
    type Permit<'a>
    where
        Self: 'a;

    async fn acquire(&self) -> Self::Permit<'_>;

    async fn call(permit: Self::Permit<'_>, request: Request) -> Self::Response;
}

I published a few example implementations in this repository. This API is possible (migration here) without async fn in traits and just GATs. It requires a lot of pin projection madness and extra GATs.

The purpose of the issue is to collect critiques to inform the design of tower.

Motivating Questions

Why use permits?

Permits allow you to disarm a service after it's ready and can be used to enforce a tighter service contract.

Why doesn't `call` accept `&self`?

The readiness of one service does not ensure the readiness of a different service of the same type - we want to disallow sharing of permits. There are three options here:

1. Pass the innards required for call from &self to the permit.
2. Use some sort of branding. This adds a lot of complexity.
3. Ignore the problem - service authors can implement runtime checks to prevent sharing if they really care.

Choosing 1 is safe and less obscure than 2.
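A minimal sketch of option 1, assuming Rust 1.75+ for async fn in traits. Greeter, GreeterPermit, greet_with and the tiny block_on executor are hypothetical names introduced for illustration and are not part of burger. The permit borrows the state that call needs, so there is no way to pair a permit with the wrong service instance.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The trait from the overview, reproduced so the sketch is self-contained.
trait Service<Request> {
    type Response;
    type Permit<'a>
    where
        Self: 'a;

    async fn acquire(&self) -> Self::Permit<'_>;
    async fn call(permit: Self::Permit<'_>, request: Request) -> Self::Response;
}

// Hypothetical service whose `call` needs state stored on the service.
struct Greeter {
    greeting: String,
}

// The permit carries a borrow of exactly the state `call` will need.
struct GreeterPermit<'a> {
    greeting: &'a str,
}

impl Service<String> for Greeter {
    type Response = String;
    type Permit<'a> = GreeterPermit<'a>;

    async fn acquire(&self) -> Self::Permit<'_> {
        GreeterPermit { greeting: self.greeting.as_str() }
    }

    // No `&self` here: everything comes from the permit.
    async fn call(permit: Self::Permit<'_>, request: String) -> String {
        format!("{}, {}!", permit.greeting, request)
    }
}

// Busy-polling executor, only suitable for futures that are always ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical helper: acquire a permit, then spend it on one call.
fn greet_with(service: &Greeter, name: &str) -> String {
    block_on(async {
        let permit = service.acquire().await;
        Greeter::call(permit, name.to_owned()).await
    })
}
```

Two Greeter values with different greetings produce different responses, and which one answers is decided entirely by where the permit was acquired.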

Why does `call` take ownership of the permit?

A permit should allow only one call.

Why is `Service::Permit<'a>` a GAT?

We want to be able to pass the innards of &self into the Self::Permit<'_> by reference. Cloning Arcs from the &self to Self::Permit will result in poor performance and developer experience.

Why does `fn acquire` accept `&self` rather than `&mut self`?

If it accepted &mut self we would only ever be able to obtain one permit at a time.

Why `async fn acquire` rather than `fn acquire` like `tower::Service::poll_ready`?

Both approaches boil down to the same kind of state machines eventually. Using Future allows for easy composition with the large Futures ecosystem and with other Service::acquire calls.

Why do `async fn acquire` and `async fn call` not return a `Result`?

Most of the Service style combinators work without Result.

If the user wants to write a Service with a fallible async fn acquire then they can model the permit as a Result and have call return the Err. If the user wants to write an infallible acquire and a fallible call the signatures are no longer coupled by convention alone.
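One way the "permit as a Result" idea could look, as a sketch only. Gate, GatePermit, Closed, send and the block_on executor are hypothetical and not taken from burger; Rust 1.75+ is assumed for async fn in traits.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The trait from the overview, reproduced so the sketch is self-contained.
trait Service<Request> {
    type Response;
    type Permit<'a>
    where
        Self: 'a;

    async fn acquire(&self) -> Self::Permit<'_>;
    async fn call(permit: Self::Permit<'_>, request: Request) -> Self::Response;
}

// Hypothetical error type for a service that may refuse to hand out permits.
#[derive(Debug, PartialEq)]
struct Closed;

struct Gate {
    open: bool,
}

struct GatePermit<'a> {
    _gate: &'a Gate,
}

impl Service<u32> for Gate {
    type Response = Result<u32, Closed>;
    // The fallibility of `acquire` lives inside the permit type.
    type Permit<'a> = Result<GatePermit<'a>, Closed>;

    async fn acquire(&self) -> Self::Permit<'_> {
        if self.open {
            Ok(GatePermit { _gate: self })
        } else {
            Err(Closed)
        }
    }

    async fn call(permit: Self::Permit<'_>, request: u32) -> Self::Response {
        // An `Err` from `acquire` is surfaced here instead.
        let _permit = permit?;
        Ok(request + 1)
    }
}

// Busy-polling executor, only suitable for futures that are always ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical helper: acquire then call, propagating whatever `call` returns.
fn send(gate: &Gate, request: u32) -> Result<u32, Closed> {
    block_on(async {
        let permit = gate.acquire().await;
        Gate::call(permit, request).await
    })
}
```

A caller who wants to fail fast can still inspect the Result returned by acquire before calling; the trait itself does not force either style.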

Perhaps the value of acquire returning a Result outweighs the flexibility though.

Split this into two traits?

We could split the Service trait into Acquire and Call where Call is implemented on the permit and has async fn call(self, request: Request). I have no strong opinions on this. Maybe this helps with object safety?
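A rough sketch of what the split might look like. The exact shape is an assumption: the Request parameter on Acquire and the Call bound on the permit are one possible choice, and Scaler, ScalerPermit, scale and block_on are hypothetical names. It makes no claim about object safety.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Assumed shape of the split: the service only hands out permits...
trait Acquire<Request> {
    type Permit<'a>: Call<Request>
    where
        Self: 'a;

    async fn acquire(&self) -> Self::Permit<'_>;
}

// ...and the permit is the thing that gets called, consuming itself.
trait Call<Request> {
    type Response;

    async fn call(self, request: Request) -> Self::Response;
}

// Hypothetical service that multiplies requests by a stored factor.
struct Scaler {
    factor: u32,
}

struct ScalerPermit<'a> {
    factor: &'a u32,
}

impl Acquire<u32> for Scaler {
    type Permit<'a> = ScalerPermit<'a>;

    async fn acquire(&self) -> Self::Permit<'_> {
        ScalerPermit { factor: &self.factor }
    }
}

impl Call<u32> for ScalerPermit<'_> {
    type Response = u32;

    async fn call(self, request: u32) -> u32 {
        request * *self.factor
    }
}

// Busy-polling executor, only suitable for futures that are always ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical helper: the permit is moved into `call`, so it is single-use.
fn scale(scaler: &Scaler, request: u32) -> u32 {
    block_on(async {
        let permit = scaler.acquire().await;
        permit.call(request).await
    })
}
```

Because call takes self by value, the "one permit, one call" rule falls out of ownership in the same way as in the single-trait design.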

hlbarber commented 11 months ago

I wasn't aware at the time, making a note of it now - if we did split this trait in two it would become close to the suggestion by @olix0r https://github.com/tower-rs/tower/issues/626#issuecomment-1009256748.

hlbarber commented 11 months ago

Referencing related discussions:

LegNeato commented 11 months ago

How does this interact with drop and cancelation? Is it better or worse than the current model? This reminds me a lot of completion based io for some reason (https://www.ncameron.org/blog/async-io-with-completion-model-io-systems/).

hlbarber commented 11 months ago

How does this interact with drop and cancelation? Is it better or worse than the current model?

I don't think the design here addresses the lower-level problems relating to async drop if that's what you mean, but it does address the disarm problem.

I like to think about Service::acquire as a generic version of Semaphore::acquire.

Under the current contract, the Service::poll_ready documentation states:

Note that poll_ready may reserve shared resources that are consumed in a subsequent invocation of call. Thus, it is critical for implementations to not assume that call will always be invoked and to ensure that such resources are released if the service is dropped before call is invoked or the future returned by call is dropped before it is polled.

And citing OP of the disarm thread:

Currently if poll_ready returns Ready it effectively reserves something (for instance a semaphore token). This means you must be following up with call next. The only other option is to Drop the service which however is not always possible.

The implementation here solves this problem because you can release the shared resource prior to Service::call in the Drop implementation of Service::Permit. Under this approach, it's natural to hold a handle to a resource in the permit to allow access during Service::call.
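To make the disarm behaviour concrete, here is a small sketch, assuming Rust 1.75+. Slots, SlotPermit, demo and block_on are hypothetical; a real limiter would wait in acquire when no slot is free, which this sketch does not model. Dropping a permit returns its slot without ever invoking call, and because acquire takes &self, two permits can be outstanding at once.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The trait from the overview, reproduced so the sketch is self-contained.
trait Service<Request> {
    type Response;
    type Permit<'a>
    where
        Self: 'a;

    async fn acquire(&self) -> Self::Permit<'_>;
    async fn call(permit: Self::Permit<'_>, request: Request) -> Self::Response;
}

// Hypothetical service guarding a fixed number of slots.
struct Slots {
    available: Cell<usize>,
}

// Holding the permit means holding one slot.
struct SlotPermit<'a> {
    slots: &'a Slots,
}

impl Drop for SlotPermit<'_> {
    // Releasing happens here, whether or not `call` was ever invoked.
    fn drop(&mut self) {
        let available = &self.slots.available;
        available.set(available.get() + 1);
    }
}

impl Service<u32> for Slots {
    type Response = u32;
    type Permit<'a> = SlotPermit<'a>;

    async fn acquire(&self) -> Self::Permit<'_> {
        // Assumption: a slot is free. A real service would await here instead.
        let free = self.available.get();
        assert!(free > 0, "sketch does not model waiting for a slot");
        self.available.set(free - 1);
        SlotPermit { slots: self }
    }

    async fn call(permit: Self::Permit<'_>, request: u32) -> u32 {
        let response = request * 2;
        drop(permit); // the slot is returned once the work is done
        response
    }
}

// Busy-polling executor, only suitable for futures that are always ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Returns (free slots while both permits are held, free slots after
// disarming one, response of the call, free slots after the call).
fn demo() -> (usize, usize, u32, usize) {
    let slots = Slots { available: Cell::new(2) };
    block_on(async {
        let first = slots.acquire().await;
        let second = slots.acquire().await;
        let while_held = slots.available.get();
        drop(first); // disarm: give the slot back without calling
        let after_disarm = slots.available.get();
        let response = Slots::call(second, 21).await;
        let after_call = slots.available.get();
        (while_held, after_disarm, response, after_call)
    })
}
```

With tower's poll_ready there is no value to drop, which is why the same disarm step is awkward there.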

hlbarber commented 11 months ago

I've now implemented a decent percentage of the existing tower middleware and published it. Here are some obvious and subtle obstructions I've observed.

Problems common to all tower "async fn in trait" designs:

The async fn ": Send" problem.
Traits with async fn methods are not object safe.

The following are specific to Service::Permit<'a> being a GAT:

https://github.com/rust-lang/rust/issues/100013 when using the API with tokio::spawn.
tokio::sync::Semaphore has an acquire accepting &self and an acquire_owned accepting self: Arc<Self>. Similarly, RwLock::read vs RwLock::read_owned etc. Although we have opted to emulate the borrowed variants (Semaphore::acquire/RwLock::read/Mutex::lock) when designing Service::acquire, we should have a wrapper, Leak<S>, constructed via an Arc<S>, whose acquire returns an "owned" permit (that is, Service::Permit: 'static). It seems like the only way to do this (in our very general case) is some self-referential hackery?
The requirement of where Self: 'a on Service::Permit means that wrappers can't do HRTBs such as for<'a> S: Service<Request, Permit<'a> = T> etc without requiring that S: 'static due to missing implicit bounds. See discussion in https://github.com/rust-lang/rust/issues/87479 and the blog post: https://sabrinajewson.org/blog/the-better-alternative-to-lifetime-gats.
