Async Diesel middleware and example

colinbankier commented 5 years ago

The current "under development" Diesel middleware provides a "blocking" api to use Diesel. Some previous work was done (and abandoned) to provide a worker CpuPool middleware to allow not blocking request handling when using diesel.

I have been experimenting with continuing this, but with a different approach. Although Diesel doesn't directly support async, we can still build an async application around it using tokio_threadpool::blocking. This removes the need to explicitly manage a separate CpuPool, and looks like it will be a simpler implementation and api.

I'm opening this issue to solicit discussion around this approach and coordinate efforts in pushing Diesel support forward. The new experimental api so far provides a repo abstraction that handles the DB connection pool and wraps tokio_threadpool::blocking. Usage looks something like:

    let repo = Repo::borrow_from(&state);
    let user = .....;
    // Returns a future able to be chained into a `HandlerFuture`
    repo.run(move |conn| {
        diesel::insert_into(users::table)
            .values(&user)
            .get_result(&conn)
    })

Previous work this would supersede includes: https://github.com/gotham-rs/gotham/pull/227 https://github.com/gotham-rs/gotham/issues/77 https://github.com/gotham-rs/gotham/pull/126

whitfin commented 5 years ago

This is definitely a better approach IMO, but my only concern is that I'm not sure we should really push Diesel that much if it truly won't support async... if we push synchronous database calls into a Gotham environment, the whole thing is going to run much slower.

I mean, people are opting into that by using Diesel, but I just want to make sure we make it clear (even in the API) that that's what they're doing.

colinbankier commented 5 years ago

I think that using a relational DB from a web app is so common, and Diesel is the defacto standard in rust to do that, Gotham would be lacking not to provide good support and examples for it.

Also, correct me if I misunderstand what tokio_threadpool::blocking gives us - but for most purposes I understand it gives a close enough approximation of being truly async - with a few caveats.

It won't block the tokio event loop handling other requests
The default concurrent max blocking sections is 100 - which seems ample for many use-cases - and can be increased if needed.
There may be some slight overhead of tokio managing OS threads to handle this, but it's done automatically and we don't need to think about it.

It doesn't seem that Diesel natively supporting async is anywhere in sight. Are there drawbacks to the proposed approach that I'm missing?

whitfin commented 5 years ago

Sure, I don't disagree with that! Just that I'm not entirely sure how we go about this.

I don't know tokio_threadpool explicitly. If it's just a thread pool to run the Diesel stuff on with an async-looking API, then yeah, that seems fine (I think). I misunderstood and thought that it was placing the calls on the main Tokio runtime.

jdno commented 5 years ago

Thanks @colinbankier for working on Diesel support. I've spent the weekend exploring Gotham a bit, and like it a lot. I did get blocked eventually, though, when I tried to work with a database. An official middleware, or a new example would be much appreciated.

Is there anything I can do to help?

colinbankier commented 5 years ago

Great to hear this would be useful @jdno. I have a WIP branch here https://github.com/colinbankier/gotham/tree/diesel-tokio-blocking-middleware with example https://github.com/colinbankier/gotham/tree/diesel-tokio-blocking-middleware/examples/diesel. It pretty much just needs some tests and documentation filled in before opening a PR - hopefully I can find time to do that this week. If you wanted to try out the branch in the meantime feel free - and thoughts so far always appreciated.

jdno commented 5 years ago

Awesome, I'll check out the branch somewhere this week. Maybe I can contribute something to the documentation.

jdno commented 5 years ago

I found some time today to look at your code and the examples, and refactor my app to use the new middleware. The proof of concept endpoint I've built is a simple health check that prints the current environment (e.g. development or production), and does a database query. With the async middleware, this is how my handler looks now:

pub fn check(state: State) -> Box<HandlerFuture> {
    let app_state = AppState::borrow_from(&state).clone();
    let repo = Repository::borrow_from(&state).clone();

    let future = repo
        .run(move |connection| sql_query("SELECT 1").execute(&connection))
        .map_err(|e| e.into_handler_error())
        .then(move |result| {
            let postgres_status = match result {
                Ok(_) => Status::Pass,
                Err(_) => Status::Fail,
            };

            let health = Health {
                environment: app_state.config.env.clone(),
                postgres: postgres_status,
            };

            let response = create_response(
                &state,
                StatusCode::OK,
                mime::APPLICATION_JSON,
                serde_json::to_string(&health).expect("Failed to serialize health"),
            );

            future::ok((state, response))
        });

    Box::new(future)
}

It should be said that I am quite new to Rust, so this is probably not the best implementation. And most of my struggles could just be related to my lack of experience with futures and tokio. Having said that, these are my observations:

The code feels a lot more verbose. My previous handler implementation called a helper function check_postgres(conn: &PgConnection) -> Status to do the database query, and then simple created the response with create_response. Simple and easy to read. I have to play with this more to see how I can split the web logic (build response, return future) from the business logic (query database, return health).
I don't know how I would implement an endpoint that needs to do multiple database queries.
I am a bit concerned about all the cloning that's going on. This feels like a code smell to me. With my current knowledge, I know too little to see if/how cloning could be avoided.

On the positive side, it was pretty easy to set this up. And knowing that Diesel won't block the main thread is quite nice. 🙂

colinbankier commented 5 years ago

Thanks for the great feedback @jdno. I agree that things can be a bit more verbose - I think this is a little inevitable with an async api - as we now have the future to deal with as well as another layer of Result - i.e. one for query result, and one for the future. When async/await hits stable, this should make this easier - but of course happy for any suggestions on how to ease this in the current state. A good point to consider. Cloning is not necessarily a smell - e.g. the db connection pool in Repo uses an Arc internally, so a clone creates a threadsafe reference to it, necessary to use it from a new future / async function. I'll see if this is done more than necessary in your example. I didn't get time to look at things further as hoped last week...hopefully in the coming week (or two :) )

jdno commented 5 years ago

Cloning is not necessarily a smell - e.g. the db connection pool in Repo uses an Arc internally, so a clone creates a threadsafe reference to it, necessary to use it from a new future / async function.

This is a really good point. I saw that the examples use Arcs, but didn’t understand why until now.

I didn't get time to look at things further as hoped last week...hopefully in the coming week (or two :) )

No worries. I’m a week on vacation myself. When I’m back I’ll play a bit more with the code to see what improvements I can make to the code, and my understanding of it. 😅

colinbankier commented 5 years ago

I have opened #313 to get some more feedback and suggestions on this approach.

Thanks @jdno - based on your feedback I simplified the return type from run - to remove the nesting of a Result inside the Future - and just return the Result parts as the future's Item and Error. Hopefully this makes the api a little nicer.

I have a slightly larger app (also WIP) here: https://github.com/colinbankier/realworld-gotham that uses it so far to try it out in a more "real" example, if that is useful.

colinbankier commented 5 years ago

I don't know how I would implement an endpoint that needs to do multiple database queries.

Mostly this should be same as combining or chaining other sorts of futures. e.g. If the queries are independent, create 2 separate Futures, and then Future::join them, and then chain a then on the end to build your response using both the results. If the 2nd one depends on the first, you have to chain them with and_then or then, and finally build your response at the end of the chain. Happy to help with a concrete code example if you have one you're struggling with. Async / await will make this sort of thing much easier when it hits stable :)

gotham-rs / gotham

Async Diesel middleware and example #309