John-Nagle opened this issue 3 years ago
@John-Nagle I'm confused by your concerns.
That's a problem. This project is by async enthusiasts, who seem to think that all developers should want to use async. It's a short step from there to require all developers to use async.
For requiring all developers to use async, Rust 2.0 would have to be released, and that's not part of the Rust team's plans.
Async is really needed only for a specific class of programs - those that are both I/O bound and need to maintain a large number of network connections. Outside of that niche, you don't really need it. We already have threads, after all. Not everyone is writing a web service.
I think part of the problem with async right now is that it's targeted towards web services and nothing else. Async is great for all kinds of I/O even if it's a small amount, as shown by JavaScript.
In my case, I'm writing a viewer for a virtual world. It's talking to a GPU, talking to multiple servers, decompressing files, talking to a window, and is compute bound enough to keep 2 to 4 CPUs busy. It will have at most a dozen threads. For this class of problem, threads are essential and async has negative value.
You can always use both threads and async. The only "negative value" I can see from async is the bloat of a large runtime; otherwise it shouldn't have any negative effects. Also, the way your computer talks to the GPU is asynchronous in nature, as is any hardware communication. I feel strongly that code should be written the way the hardware works (that's why I prefer systems languages), and not through some abstraction like a blocking API. I honestly think an asynchronous model is the perfect model for your program.
Already, I've dropped the "hyper"/"reqwest" crate and switched to "ureq" because "reqwest" pulls in "tokio", and that, apparently, can no longer be turned off. I'm concerned about async contamination spreading to other crates.
"hyper"/"reqwest" should not depend on tokio, I agree with that. Libraries should never depend on runtimes, and should always leave the runtime choice to the user. I think the "async contamination" is only a problem because of libraries depending on huge runtimes designed for web services. Otherwise, you can just wrap an asynchronous API with a simple general-purpose executor (like the one in "pasts" or this), and essentially avoid async. Even though async is being used under the hood, it doesn't matter, because it's abstracted away in a zero-cost manner.
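For what it's worth, such a general-purpose single-future executor fits in a few dozen lines of std-only Rust. This is a sketch of the idea, not the actual "pasts" implementation: a `block_on` that polls the future and parks the current thread until the waker fires.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the thread blocked inside block_on.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drive a single future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Not ready yet: sleep until some waker unparks us.
            // (park() may wake spuriously; the loop just polls again.)
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let value = block_on(async { 21 * 2 });
    assert_eq!(value, 42);
    println!("{value}");
}
```

No reactor, no thread pool: the only runtime state is the calling thread itself, which is roughly what "abstracting async away" amounts to here.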
I'm concerned that this project may break Rust as a systems language by over-optimizing it for the software-as-a-service case.
I don't think Rust is ever going to stop being a systems language; there are a lot of use cases for async in embedded development, so I don't think any async advancements would push Rust in that direction.
random passerby comment: I am personally pretty far from being an "async" fan, I don't use it and recently started a project with mio as the backbone. however, despite my deep-seated skepticism, over the past few years I've come to trust the rust leadership more on this subject, after watching them repeatedly ditch designs that would have encroached on the "what you don't use, you don't pay for" principle. it's true that popular libraries are increasingly using async and that becomes something to deal with for us "outsiders." but I don't think it follows that non-async rust pays any big costs from it. you still have full control over your own program -- you don't have to hand it over to someone else's runtime.
Firstly, I think this is a great issue. We should keep in mind that for some workloads, the asynchronous programming paradigm might be overkill. For instance, if I'm creating a CLI that needs to make one network call, spinning up an entire async runtime, versus just opening a blocking socket provided by the operating system, is overkill.
That being said, I think there's a bit of conflation between two things:
This repository is not an attempt to force all users of Rust to use a runtime like tokio. It's about improving the Rust asynchronous programming model for those who wish to use it. Tokio is a part of the picture, and surely an important part for some workloads, but it is not meant to be a requirement for the use of async/await nor for all workloads that do any I/O.
The creator of hyper/reqwest is on the tokio core team so it's no surprise that it uses tokio under the hood, and while I would personally love to see parts of what are now the tokio ecosystem become more runtime independent (and we will surely have stories based on that in this repo), it's also a decision of the maintainers of hyper/reqwest how they want to implement their library. They certainly don't have to implement a blocking API on top of tokio.
As others have pointed out, your workload actually doesn't sound like a bad fit for an async model, but nothing is or should prevent you from using OS threads as the basic concurrency primitive. It's your choice.
Second, futures and async/await by themselves are basically nothing. Futures are stack allocated by default and generally remain quite small. They don't do anything unless actively polled. So having futures and async/await as an implementation detail of a library does not necessarily mean that that library is automatically more resource intensive than it needs to be. In fact, the reason that Rust has async/await (with the poll based model) is that it is the option that assumes the least.
The async model assumes absolutely nothing about a threading model. There's no reason to not manually spin up threads and mix this with the use of futures. Sure, some runtimes have particular opinions on threading, but the Rust async model itself does not.
We will certainly have stories about uses for async Rust for embedded devices with low memory and power footprints. These platforms may not even have threads! A poll based async model is essentially the only thing that can be supported by such platforms.
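A minimal illustration of the "futures do nothing unless polled" point above, using only the standard library: an async block is an inert, fixed-size value, and nothing in its body runs unless something polls it.

```rust
use std::cell::Cell;

fn main() {
    let ran = Cell::new(false);

    // Building the future does NOT execute its body.
    let fut = async {
        ran.set(true);
    };

    // The future is just a value with a compile-time-known size; nothing
    // has run yet, and nothing will run until it is polled.
    println!("future occupies {} bytes", std::mem::size_of_val(&fut));
    assert!(!ran.get());

    // Dropped without ever being polled: the body never executes at all.
    drop(fut);
    assert!(!ran.get());
}
```

This is what makes the model cheap by default: the cost of a future you never await is just the bytes of its state.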
Hey @John-Nagle -- I think you're on to something here. I'd like to encourage you to submit a user story about this! I think having some stories that represent users who don't want to think about async is a great idea. On the other hand, if you can't come up with a good story to tell (because async isn't yet impinging on your experience), maybe we can just add some projects that don't use async? (e.g., describe your GPU use case). Then we can add a FAQ to the "shiny future" that says "How does this future impact the non-async projects?" so that we ensure we think about and address that question.
I will be up-front, I am thinking a lot about whether async Rust should be "the default" or not when it comes to I/O. I definitely don't think it should be the only option, but I do think that we want to be able to give people a recommended story about how to write code and to maximize interop and the value of the crates.io ecosystem. I think the story is way too confusing right now.
There are real costs to async. You highlighted some, but I'll add some more. Using `async fn` and `.await` implies some amount of extra complexity that will take more time to learn, even if we do a good job of sanding the edges off. You have to think about which functions really want to be `async` and propagate those annotations around. You may encounter system functions or other things that don't work in an async fashion. Your binary has to carry around more of a user-space runtime than it would otherwise.
At the same time, if I am going to make a crate to implement some protocol, or to develop byte-stream adapters for compression or what have you, I need to make a choice now. Sync or async. If we have people start with sync I/O but then they quickly hit limits because the async ecosystem is much larger, that's unfortunate, and the same is true in reverse.
This is precisely why I opened #54, and @BurntSushi opened #49.
In my ideal world, we will be able to tell a convincing shiny future story about how sync-vs-async developers are both well supported and have access to a wide range of interoperable crates. (I note that Zig has an interesting approach here)
One of Rust's strengths is that, at last, after decades, we have a safe threading system for high performance code. That's a huge win on a hard problem. We finally have a good way to use all those CPUs you have today without bugs due to lack of proper locking.
Some problems really need a few CPUs working in coordination. This is standard in game development. PC AAA titles today use all the CPU power available. Usually in C++, with all the problems that implies. Everything in VR and AR needs massive CPU power, and single CPUs are not getting any faster. Rust looked like an exit strategy from the nightmare of multi-threaded C++.
Then came the push for "async" everywhere.
There are so many people now who came up from the JavaScript world and know only single-thread "async". They are used to that model and want to use it for everything.
JavaScript is purely single-threaded. (Yes, there are JavaScript "web workers".) In Rust you can mix threading and "async". A mix is more complex than either pure threading or pure async, and may lead to hard-to-find stall bugs. See this painful real world story. That's worth a read. The crates that developer was using slowly pressured him to convert his program to all-async. He didn't really need more than one CPU's worth of compute, so that worked for him. Most programs with concurrency will probably be all-thread or all-async.
There lies the problem. If crucial crates start to require async, the use of multiple CPUs is slowly choked off by the difficulties of mixing the two models.
@John-Nagle Thanks for the reference to the user story! Are there more stories like that you have to share?
Also, what do you think about adding a "non-async product" like an AAA game or something along those lines? It'd be great if you could talk about what it needs and what potential problems you foresee.
I will say that I don't immediately see the conflict between async and utilizing cores. Most of the various runtimes offer multithreaded runtimes with sophisticated schedulers, and I've thought about extending rayon (for example) to support async. This would permit rayon to support things like arbitrary DAGs of tasks, which it can't do now.
@nikomatsakis As @John-Nagle points out in Hacker News comments, while sync APIs can be built on async APIs by blocking on the root future, the overhead of the underlying runtime (be it tokio or async-std) is non-negligible, because the runtime needs to set up polling-based IO. Based on this, I'm wondering whether we can come up with a dummy future executor and runtime compatible with the async APIs provided by tokio, but which in practice runs everything in a blocking fashion using a thread pool. This would solve the problem because the dummy runtime is just a thin wrapper around the standard library IO APIs, and it can effectively "syncify" async programs.
Of course this "dummy" future executor is not for scenarios where concurrency is crucial, but it could reduce the overhead of spinning up and tearing down an epoll/kqueue file descriptor just for a few IO operations within a synchronous function call. It may also in theory be more energy efficient: by using blocking IO instead of polling when the IO load is low, the process can sleep instead of pointlessly looping for the next event.
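As a sketch of what such a dummy runtime could look like (all names here are invented for illustration, not an existing API): the "async" operations perform blocking IO and resolve on the first poll, so the "executor" degenerates to a single poll with a no-op waker. No reactor, thread pool, or epoll/kqueue instance is involved.

```rust
use std::future::Future;
use std::io::{self, Cursor, Read};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical adapter: an "async-looking" read that actually performs a
// blocking read, so it never returns Pending and never needs a reactor.
struct BlockingRead<'a, R> {
    reader: &'a mut R,
    buf: &'a mut [u8],
}

impl<R: Read + Unpin> Future for BlockingRead<'_, R> {
    type Output = io::Result<usize>;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        Poll::Ready(this.reader.read(this.buf))
    }
}

// A do-nothing waker: sufficient because these futures complete on the
// first poll, so nothing ever needs to be woken.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn main() -> io::Result<()> {
    // In-memory reader stands in for any blocking IO source.
    let mut source = Cursor::new(b"hello".to_vec());
    let mut buf = [0u8; 8];

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // The whole "runtime" is a single poll call.
    let mut fut = BlockingRead { reader: &mut source, buf: &mut buf };
    let n = match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(res) => res?,
        Poll::Pending => unreachable!("blocking adapter never returns Pending"),
    };
    assert_eq!(&buf[..n], b"hello");
    Ok(())
}
```

Whether such an adapter could remain API-compatible with tokio's traits is exactly the open question here; this only shows that the execution side can be made essentially free.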
At the same time, if I am going to make a crate to implement some protocol, or to develop byte-stream adapters for compression or what have you, I need to make a choice now. Sync or async.
I'm a big fan of the Sans IO approach championed by the Python community where we build protocol libraries without depending on the specific IO implementation, sync or async. They're implemented mostly as pure functions over state machines which make them easy to test - ideally even randomness and time are abstracted over to support deterministic simulation.
It is then possible to provide both a sync and an async implementation, but the entire endeavor means extra work with no direct language support currently, which leads developers to favour their own use case.
Tokio's Loom project and future simulation plans can also lift a lot of the work required when manually writing Sans IO protocols and testing them. The missing piece is being able to abstract over execution (blocking or not) at compile time, but that's a can of worms which feels out of scope for Rust.
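To make the Sans IO idea concrete, here is a toy sketch (all names invented for illustration): a line-delimited message parser written as a pure state machine. It only turns bytes into events and never touches a socket, so the same code can sit behind a blocking transport, an async one, or a deterministic test.

```rust
/// Toy Sans IO protocol: line-delimited messages. The state machine only
/// transforms bytes into events; the caller owns all IO, sync or async.
#[derive(Default)]
struct LineProtocol {
    buffer: Vec<u8>,
}

impl LineProtocol {
    /// Feed bytes received from *some* transport (blocking socket,
    /// async socket, a test vector...).
    fn receive_data(&mut self, data: &[u8]) {
        self.buffer.extend_from_slice(data);
    }

    /// Pull the next complete message out of the buffer, if any.
    fn next_event(&mut self) -> Option<String> {
        let pos = self.buffer.iter().position(|&b| b == b'\n')?;
        let line: Vec<u8> = self.buffer.drain(..=pos).collect();
        // Strip the trailing newline before handing the message out.
        Some(String::from_utf8_lossy(&line[..line.len() - 1]).into_owned())
    }
}

fn main() {
    let mut proto = LineProtocol::default();

    // Deterministic test with no IO at all: bytes arrive in arbitrary chunks.
    proto.receive_data(b"hel");
    assert_eq!(proto.next_event(), None); // incomplete message
    proto.receive_data(b"lo\nworld\n");
    assert_eq!(proto.next_event(), Some("hello".to_string()));
    assert_eq!(proto.next_event(), Some("world".to_string()));
    assert_eq!(proto.next_event(), None);
    println!("ok");
}
```

A sync wrapper would call `receive_data` after a blocking `read`, an async wrapper after an awaited read; the protocol logic is written once and tested without either.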
Edit: just saw #49 discussing this
Based on this, I'm wondering whether we can come up with a dummy future executor and runtime compatible with the async APIs provided by tokio, but which in practice runs everything in a blocking fashion using a thread pool. This would solve the problem because the dummy runtime is just a thin wrapper around the standard library IO APIs, and it can effectively "syncify" async programs.
The tokio current-thread runtime allows you to do this and will involve the minimum amount of overhead (because there are no context switches). Blocking on a thread-pool will have a higher amount of overhead and will negatively impact performance compared to "just doing blocking IO". And blocking on a threadpool where the threads there use an IO reactor ([e]poll instance) on yet another thread will cause even more overhead.
While the first version is pretty close to "just do a blocking system call" - especially if one would cache the runtime instance - the latter one is pretty far away and a lot less efficient. However people might use it, because it's the approach that is made easy by async libraries which try to be compatible with everything by delegating operations to external threads. Depending on which environment is chosen, a write to a socket could involve anything from staying on a single thread up to hopping between 3 threads.
That demonstrates that there is a spectrum of async usages between "highly efficient" and "rather inefficient", and it's probably inversely related to the complexity of using things and compatibility between libraries.
For the more efficient ways I definitely think that exposing the async functions in a synchronous manner for people who don't care about async is OK. It allows library authors to write their code only once.
But I generally agree with @John-Nagle that not everything should be async, and that for a lot of use-cases it might provide more pain than usefulness. The tricky part is to come up with a general recommendation for when it is useful. For all the projects which want to run a HTTP client as part of a desktop program there is no gain from using async. But if the same library is used inside a server which does 100k RPS it might be important to get the efficiency that is required there. And I don't think we would want to write 2 times the code to satisfy those scenarios. A lightweight blocking wrapper around async code seems ok.
I think it'd also be worthwhile for the user story to dig a little bit into what is meant by the "overhead" of bringing in tokio/async-std/smol, etc. What is the concern, and why? Longer compile times? More dependencies? Do features help at all here — from what I understand reqwest doesn't bring in the tokio executor, only traits like AsyncRead and AsyncWrite and utilities like tokio::sync, unless you explicitly opt into the blocking feature? And if you do, how do you quantify the resulting overhead?
When I used reqwest in blocking mode, and had the standard log module enabled, I got log entries such as these:
04:25:04 [TRACE] (1) reqwest::blocking::wait: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/wait.rs:43] (ThreadId(1)) park timeout 29.998282477s
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:341] signal: Want
04:25:04 [TRACE] (2) hyper::proto::h1::conn: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.9/src/proto/h1/conn.rs:650] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:200] poll_want: taker wants!
04:25:04 [TRACE] (2) hyper::client::pool: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.9/src/client/pool.rs:320] put; add idle connection for ("http", api.gridsurvey.com)
04:25:04 [DEBUG] (2) hyper::client::pool: pooling idle connection for ("http", api.gridsurvey.com)
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:341] signal: Want
04:25:04 [TRACE] (2) hyper::proto::h1::conn: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.9/src/proto/h1/conn.rs:650] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
04:25:04 [TRACE] (1) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:749] closing runtime thread (ThreadId(2))
04:25:04 [TRACE] (1) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:751] signaled close for runtime thread (ThreadId(2))
04:25:04 [TRACE] (2) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:799] (ThreadId(2)) Receiver is shutdown
04:25:04 [TRACE] (2) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:804] (ThreadId(2)) end runtime::block_on
04:25:04 [TRACE] (2) mio::poll: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.22/src/poll.rs:907] deregistering handle with poller
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:330] signal: Closed
04:25:04 [TRACE] (2) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:806] (ThreadId(2)) finished
04:25:04 [TRACE] (1) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:753] closed runtime thread (ThreadId(2))
The tokio current-thread runtime allows you to do this and will involve the minimum amount of overhead (because there are no context switches). Blocking on a thread-pool will have a higher amount of overhead and will negatively impact performance compared to "just doing blocking IO". And blocking on a threadpool where the threads there use an IO reactor ([e]poll instance) on yet another thread will cause even more overhead.
Whoops, I didn't mean to use epoll/kqueue for such a hypothetical executor/runtime... It's more like the former case: running a single-thread executor on the main thread and providing a number of on-the-surface async APIs. However, these APIs would actually perform blocking IO in a separate thread pool. But I do realize that using a thread pool might not be efficient, and now I think a better idea may be to just switch to non-blocking IO (e.g. setting O_NONBLOCK) for file descriptors, without also applying multiplexed/polling/completion-based IO to them.
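For reference, the standard library already exposes this flag without any reactor: on Unix, `TcpStream::set_nonblocking(true)` sets O_NONBLOCK on the descriptor, after which reads return `WouldBlock` immediately instead of parking the thread in the kernel. A small self-contained demo over a loopback socket:

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};

fn main() -> std::io::Result<()> {
    // Loopback socket pair so the example is self-contained.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut stream = TcpStream::connect(listener.local_addr()?)?;
    let (_peer, _) = listener.accept()?; // keep the peer end alive

    // Sets O_NONBLOCK on Unix: reads now return immediately instead of
    // blocking, and no epoll/kqueue registration is involved.
    stream.set_nonblocking(true)?;

    let mut buf = [0u8; 16];
    match stream.read(&mut buf) {
        // The peer has sent nothing, so a non-blocking read reports
        // WouldBlock rather than waiting for data.
        Err(e) if e.kind() == ErrorKind::WouldBlock => {
            println!("no data yet: WouldBlock");
        }
        other => println!("unexpected result: {other:?}"),
    }
    Ok(())
}
```

A runtime built on this would still have to decide what to do on `WouldBlock` (retry, yield, or sleep), which is where the trade-off against a full reactor reappears.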
However people might use it, because it's the approach that is made easy by async libraries which try to be compatible with everything by delegating operations to external threads.
That demonstrates that there is a spectrum of async usages between "highly efficient" and "rather inefficient", and it's probably inversely related to the complexity of using things and compatibility between libraries.
For the more efficient ways I definitely think that exposing the async functions in a synchronous manner for people who don't care about async is OK. It allows library authors to write their code only once.
I think the hypothetical executor/runtime I'm proposing is just for @John-Nagle 's problem right above: the simplest way for an author to support both sync and async code is to first write async code, and then create sync wrappers which `block_on` the futures created by the async API in an executor. However, as you can see, setting up and tearing down such executors can be costly: we need to set up a thread pool, a multiplexing/polling-based IO mechanism, register file descriptors, and after just a few IO operations we undo everything again to tear down the executor. This is overkill when the user of the sync APIs just wants to do something simple and doesn't care about performance. Having such a hypothetical "lightweight" executor/runtime means that we can set up and tear down executors instantly in the sync APIs, with minimal overhead, at the cost of lower IO performance, which is exactly what users want in this use case.
And I don't think we would want to write 2 times the code to satisfy those scenarios. A lightweight blocking wrapper around async code seems ok.
Yes, exactly the reason I'm imagining something like this.
The tokio current-thread runtime allows you to do this and will involve the minimum amount of overhead (because there are no context switches).
Is that really true? There was an article on Hacker News recently where someone benchmarked this. The switching cost was better for async, unless the switch was because of an I/O completion; then it was about the same. Actually, the big win for async was less stack space usage for tasks that don't do much, which matters when you have tens of thousands of threads, but not when you have tens or hundreds.
"Async" is just context switching in user space, after all.
The use case I have is needing higher CPU utilization across multiple CPUs while maintaining reasonably good I/O performance. The async system is designed for the special case of heavy network I/O load coupled with light CPU load. That shouldn't dominate Rust's architecture, even though there are a lot of people making web services.
I think @John-Nagle is onto something here.
Consider my case:
I've documented my experience with the issue here:
I believe, perhaps naively given the amount of research into scheduling out there, that this is a fixable problem.
The problem is not limited to my code. The overuse of CPU happens in warp and oha, and has been reported to happen in the Apollo Router too.
Limiting to 1 thread is not a workable solution and Tokio does not currently support increasing and decreasing the worker thread count dynamically.
Is it possible to improve this so that async does not have a hidden penalty that only some will notice?
For what it's worth: I originally upvoted OP, but Let futures be futures successfully convinced me that futures are about more than performance.
"Whatever they're using it for, we want all developers to love using Async Rust. " - from the manifesto of this project.
That's a problem. This project is by async enthusiasts, who seem to think that all developers should want to use async. It's a short step from there to require all developers to use async.
Async is really needed only for a specific class of programs - those that are both I/O bound and need to maintain a large number of network connections. Outside of that niche, you don't really need it. We already have threads, after all. Not everyone is writing a web service.
In my case, I'm writing a viewer for a virtual world. It's talking to a GPU, talking to multiple servers, decompressing files, talking to a window, and is compute bound enough to keep 2 to 4 CPUs busy. It will have at most a dozen threads. For this class of problem, threads are essential and async has negative value.
Already, I've dropped the "hyper"/"reqwest" crate and switched to "ureq" because "reqwest" pulls in "tokio", and that, apparently, can no longer be turned off. I'm concerned about async contamination spreading to other crates.
I'm concerned that this project may break Rust as a systems language by over-optimizing it for the software-as-a-service case.
Thanks.