gotham-rs / gotham

A flexible web framework that promotes stability, safety, security and speed.
https://gotham.rs
Other
2.24k stars 125 forks source link

State of the Diesel middleware #210

Closed ChristophWurst closed 4 years ago

ChristophWurst commented 6 years ago

Hello,

since the Diesel middleware has been integrated into this single repository, it can be found in the middleware/under_development folder. Is there anything holding back from publishing an early 0.x version on crates.io so that people can start experimenting with the middleware and report any missing features or bugs? Did anyone create a backup of the old repo's issues?

Personally, I'd like to use the middleware for a few experiments and it's cumbersome to use right now because it's not available on crates.io and the git version depends on Gotham 0.3 (master). I've manually changed to require Gotham 0.2 and it seems to work just fine.

Thanks

bradleybeddoes commented 6 years ago

That middleware was something that I spent some time on and then got sidetracked (badly) before I did the async work, it was quite the experiment at the time.

The reason it hasn't had a 0.1 is purely because it isn't async compatible yet, outside of that it works, as you've seen.

@smangelsdorf has been playing with some async ideas to assist this middleware and possibly other components (such as static file support).

We'll push a 0.1 of this up the priorities list, however for now accessing via git and not crates.io is your only option.

ChristophWurst commented 6 years ago

We'll push a 0.1 of this up the priorities list, however for now accessing via git and not crates.io is your only option.

Fine by me. I've pushed a patched branch to my fork that allows me experimenting with the middleware in other projects or now: https://github.com/ChristophWurst/gotham/tree/diesel-gotham-2.0.

FYI: I've tried integrating the diesel middleware into my thread pool middleware (based on futures-cpupool). Interestingly, the performance of simple SELECT 1 queries on an in-memory sqlite database are even worse when executing inside a thread pool than just calling the blocking diesel methods inside a Gotham handler (which seemed to have been the idea of the diesel middleware according to the readme).

smangelsdorf commented 6 years ago

The overhead of dishing work out to a thread does make performance worse in micro-benchmarks, especially in the extreme case of in-memory SQLite. You'll see different (i.e. less extreme) results if you move to a database that requires I/O to be performed. It will be crucial for performance in busy production apps that don't want to pause the event loop waiting for the database to respond.

Processing synchronous tasks in background threads is the best we can do until Diesel is async which is a far more complex discussion than "do they want to do it?"

Outstanding discussions like this one are exactly the reason the middleware is still unreleased and sitting in the under_development subdirectory. I will port my code into the Diesel middleware soon, and we can discuss what needs to be addressed before it gets released as a crate.

For what it's worth, you should be able to get around the Gotham versioning issue by pointing your Cargo.toml at an older commit or tag which matches properly with the released Gotham crate.

ChristophWurst commented 6 years ago

Thank you very much for your feedback, @smangelsdorf!

It will be crucial for performance in busy production apps that don't want to pause the event loop waiting for the database to respond.

As discussed in the Diesel repo, moving blocking Diesel calls to a thread pool doesn't gain much in that regard as the size of the thread pool will limit the number of concurrent database queries.

That's why I looked into integrating async databases into Gotham today. Postgres was the only DBMS I could find an async crate for. My PoC middleware can be found here: https://github.com/ChristophWurst/gotham-middleware-postgres. Although it's working and usable, the obvious downside is that it creates a new connection for each query and my SELECT 1 benchmark (this time on a real db) showed quite bad performance results.

But I've also found a crate for async connection pooling that build on top of the tokio_postgres crate: bb8. Again, I've created a middleware for Gotham but I couldn't get it to run (yet): https://github.com/gotham-rs/gotham/issues/165#issuecomment-381664177.

smangelsdorf commented 6 years ago

As discussed in the Diesel repo, moving blocking Diesel calls to a thread pool doesn't gain much in that regard as the size of the thread pool will limit the number of concurrent database queries.

That's also true of any other configuration with a fixed database connection pool size. To the best of my knowledge, there's no need to make the thread pool as artificially low as num_cpus because the workers spend most of their time idle and waiting for I/O and won't be able to saturate a CPU.

The advantage of processing Diesel work on a thread pool is that the tokio core is freed up to process tasks that aren't waiting on the database. There's still a performance ceiling, but it's independently adjustable by changing the thread pool size and only impacts requests that need to access the database. It's definitely not the solution I'd expect Diesel to be mandating instead of async support (which I believe was their point), but for an experimental middleware it's better than blocking the event loop.

It will be important for users of our Diesel middleware to understand these performance trade-offs and make their own decision about whether it's appropriate for the amount of load they expect to see. We'll have to make sure it's clearly documented and highlight the fact that we're exposing/integrating a synchronous database library.

That's why I looked into integrating async databases into Gotham today.

I'm taking it as a foregone conclusion that proper async database support will perform better in Gotham. It's not even a question in my mind. We're just waiting for an ergonomic and type-safe async database library. 😁

Very interesting experiments there, thanks for sharing. I'm keen to see how these look when you're at a point where you're happier with them.

ChristophWurst commented 6 years ago

That's also true of any other configuration with a fixed database connection pool size. To the best of my knowledge, there's no need to make the thread pool as artificially low as num_cpus because the workers spend most of their time idle and waiting for I/O and won't be able to saturate a CPU.

I share that point of view. Offloading to dedicated threads gives the OS opportunity to schedule execution of the event loop while IO operations block.

On a related note, there's a new tokio API for those blocking tasks that could be useful in the future: https://github.com/tokio-rs/tokio/pull/317

msrd0 commented 4 years ago

The diesel middleware has been released to crates.io, so I believe this issue can be closed.