Open estebank opened 1 month ago
How does this interact with RFC #3014? Isn't must_not_suspend
the same as diagnostic::blocking
?
Would like to see that mentioned in the RFC -- if it's similar enough, then this RFC can just subsume that one.
The must_not_suspend
lint seems tangentially related. That one is to ensure that a value doesn't go through a yield
point, while this is so that we don't call std::thread::sleep
in an async
function.
How does this interact with RFC #3014? Isn't
must_not_suspend
the same asdiagnostic::blocking
?
As Estaban said, they're kinda related. For example,tracing
's Span::enter()
returns a guard that is must_not_suspend
, but it's not actually blocking.
Yeah, I think the main distinction is that values must not suspend, but calls are blocking.
I am curious if these can be unified futher, but worst case must_not_suspend
should likely be moved to the diagnostic namespace if and before it is stabilized.
I am curious if these can be unified futher
It might be that this annotation on functions could influence the logic of the must_not_suspend
lint.
worst case
must_not_suspend
should likely be moved to the diagnostic namespace if and before it is stabilized.
I agree.
what functions besides std::thread::sleep()
can be reliably marked as #[diagnostic::blocking]
?
println!()
. Though in general the std::io can be put into non-blocking mode.to_socket_addrs()
is also mentioned, but the impl on (IpAddr, u16)
/ SocketAddr
/ &[SocketAddr]
is definitely purely non-blocking. And then there is an impl on generic &T
which may or may not be blocking depending on T
...tokio::sync::Mutex
's doc one should use std::sync::Mutex
when the critical section does not cross the .await
pointwhat functions besides
std::thread::sleep()
can be reliably marked as#[diagnostic::blocking]
?
Effectively most of std::io
and std::fs
. Even println!()
should likely be considered as a no-no from an async fn
, but marking std::io::_print
as blocking
might be a bridge too far for some. Anything that performs IO of any kind in the stdlib should not be used and instead use tokio
provided APIs.
As a semi related note: As far as I remember one of the main motivations for the specification of diagnostic attribute namespaces was to outline a set of common rules for all attributes in this namespace, so that adding a new attribute to this namespace does not need a new RFC anymore.
I'm a big fan of this proposal. Using synchronous I/O from inside async functions is a common footman. It's worth noting that the same attribute would be useful to rayon and likely other parallel execution environments, as they too use a fixed size threadpool.
On Fri, May 17, 2024, at 2:42 PM, Esteban Kuber wrote:
#[diagnostic::blocking]
is a marker attribute for functions that are considered (by their author) to be a blocking operation, and as such shouldn't be invoked from anasync
function.rustc
,clippy
and other tools can use this signal to lint against calling these functions inasync
contexts.It would allow
rustc
,clippy
and other tools to provide lints like
warning: async function
foo` can block --> $DIR/blocking-calls-in-async.rs:28:128 async fn foo() { 29 interesting(); ^^^^^^^^^^^^^ foo
is determined to block because it calls intointeresting
note: interesting
is considered to be blocking because it was explicitly marked as such --> $DIR/blocking-calls-in-async.rs:5:15 #[diagnostic::blocking] ^^^^^^^^^^^^^^^^^^^^^^^ 6 fn interesting() {} ` Rendered https://github.com/estebank/rfcs/blob/diagnostic-blocking/text/0000-diagnostic-blocking.md
You can view, comment on, or merge this pull request online at:
https://github.com/rust-lang/rfcs/pull/3639
Commit Summary
• a3d24bb https://github.com/rust-lang/rfcs/pull/3639/commits/a3d24bb0cbcebdcfa1108713a2e9cd09037b49bf Add
#[diagnostic::blocking]
attribute File Changes(1 file https://github.com/rust-lang/rfcs/pull/3639/files)
• A text/0000-diagnostic-blocking.md https://github.com/rust-lang/rfcs/pull/3639/files#diff-07c20d9f37ddd58360b6b9ed610fea949cb1ffb81093f7363f9a4dbe3c4982c3 (82) Patch Links:
• https://github.com/rust-lang/rfcs/pull/3639.patch • https://github.com/rust-lang/rfcs/pull/3639.diff
— Reply to this email directly, view it on GitHub https://github.com/rust-lang/rfcs/pull/3639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABF4ZWF5OZ2KTYKBOOV27LZCZFP7AVCNFSM6AAAAABH4TSV7OVHI2DSMVQWIX3LMV43ASLTON2WKOZSGMYDGMZWHEYDGMA. You are receiving this because you are subscribed to this thread.Message ID: @.***>
We could be more explicit with the definition of blocking.
thread::sleep
is obviously blockingtry_lock().expect("unlocked)
if they are expected not to)Note that CPU-bound operations and memory access cannot be warned using this lint.
what functions besides
std::thread::sleep()
can be reliably marked as#[diagnostic::blocking]
?Effectively most of
std::io
andstd::fs
. Evenprintln!()
should likely be considered as a no-no from anasync fn
, but markingstd::io::_print
asblocking
might be a bridge too far for some. Anything that performs IO of any kind in the stdlib should not be used and instead usetokio
provided APIs.
Using Read::read
is perfectly non-blocking if you're doing polled IO. It's even explicitly supported in tokio via AsyncFd
.
Using Read::read is perfectly non-blocking if you're doing polled IO. It's even explicitly supported in tokio via AsyncFd.
I don't think you'll run the poll loop directly inside async
context though. You wrap it inside a poll_fn() closure which the blocking lint shouldn't touch (??).
I like this idea, and I think it's valuable to add it even if it's used just for thread::sleep()
, since this is a particularly bad footgun that novice users run into when they explore how async
works.
However, there's a big gray area of what is blocking and isn't: https://blog.yoshuawuyts.com/what-is-blocking/
Maybe this diagnostic could support specifying a level of "severity", which could be configurable in clippy
?
thread::sleep
and thread.join()
are very likely to be blocking, but println!
may be too pedantic. io::Read
and io::Write
are "it depends", because they're non-blocking when used with Vec<u8>
.
File IO is definitely the category of functions that needs the most programmer education of "yes, this is actually blocking", due to the prevalence of both networked file systems and disk hardware errors leading to unbounded waits.
needs the most programmer education of "yes, this is actually blocking"
That's quite generational, depending on whether one has exp
[seek head noises]
erienced what came before SSDs.
I expect some uses of Mutex
and similar to not count as blocking, e.g. the following should not count imho:
pub struct AtomicId(Mutex<[u8; 32]>);
impl AtomicId {
pub const fn new(v: [u8; 32]) -> Self {
Self(Mutex::new(v))
}
pub fn load(&self) -> [u8; 32] {
*self.0.lock().unwrap()
}
pub fn store(&self, v: [u8; 32]) {
*self.0.lock().unwrap() = v;
}
}
Can we also put this information into rustdoc
output? That would have saved me some headache in the past from crates that poorly document whether a function blocks or returns a WouldBlock
error if there isn't anything yet.
I expect some uses of
Mutex
and similar to not count as blocking, e.g. the following should not count imho
In the way that I'm doing analysis today, that use would indeed be caught (I'm doing transitive analysis), but in my case I have a complementary annotation to mark an item as "assume non blocking" and in this proposal the only checks being made would be for direct usages, so your wrapper would fly under the radar (regardless of whether it is really blocking or not). The correct analysis for Mutex
would instead be if the value from .lock()
goes past a yield
point, but that's more complex than what's been proposed here.
The correct analysis for
Mutex
would instead be if the value from.lock()
goes past ayield
point, but that's more complex than what's been proposed here.
That's exactly what #[must_not_suspend]
is for. Maybe this proposal shouldn't touch locks then.
I'd love to have this annotation and any sort of analysis for it.
I expect some uses of
Mutex
and similar to not count as blocking,
IMO, the best approach to this problem is that Mutex::lock()
is considered blocking, and usage of it from async (or other should-not-block code) code would include an attribute (an #[allow]
or #[expect]
, perhaps) that effectively declares “I have made sure that this lock is held only momentarily” (much like an unsafe {}
block, but for blocking analysis: “you can't prove this is OK, but I tell you it is”).
Similarly, filesystem and stdio operations must in general be considered blocking, but a specific application might hold the position “I don't care about the near-zero amount of blocking that will happen due to these very occasional bits of text I write to stderr under normal circumstances”.
(All of this is out of scope of the RFC — unless it somehow leads to a conclusion that blocking is too hopelessly ambiguous to introduce the attribute at all.)
The correct analysis for
Mutex
would instead be if the value from.lock()
goes past a yield point
That would not catch all cases. If the mutex might be held by non-async code for a long time, then async code calling lock()
would block. A Mutex
is OK to lock from async code if it is only locked momentarily by all its users, regardless of whether those users are async or blocking code.
one place where blocking should be allowed is in things like genawaiter, which is a library for making generators built on top of the async/await mechanism, where it's like having an iterator that may block, which is fine. there should probably be a way to opt-out of all blocking diagnostics for an async fn/block but still produce lints if there's another async fn/block nested inside. (actually, i've wanted that kind of allow but not recursively for other lints too)
To me uncontended Mutex isn't "blocking". There are many usage patterns where the critical section is small and quick. I don't think it's feasible for Rust to know how long any mutex will take to lock. Warning about all of them may be too noisy.
one place where blocking should be allowed is in things like genawaiter, which is a library for making generators built on top of the async/await mechanism, where it's like having an iterator that may block, which is fine.
The genawaiter
crate is a hack to make up for the lack of gen
block, it shouldn't influence the decision made on this RFC.
And given that genawaiter
is used like gen!({ ... })
the expanded macro could easily be #[allow(blocking)] async { ... }
.
And given that
genawaiter
is used likegen!({ ... })
the expanded macro could easily be#[allow(blocking)] async { ... }
.
I'm saying there should be an attribute like #[allow(blocking)]
:
#[nonrecursive_allow(blocking)]
async {
blocking(); // won't warn
foo(async {
blocking(); // warns
});
}
@kornelski technically an uncontended mutex should be .try_lock().unwrap()
instead of .lock()
. If you expect try_lock
to fail, it means the mutex is not uncontended. It might be insignificantly blocking, but it is not uncontended.
Nah, try_lock
introduces unnecessary unreliability. I don't mean literally never blocked, but rather not having many threads trying to lock it, and having critical section quick enough, so that threads won't wait long for the lock.
In practice, blocking is only an issue when poll
takes longer than a maximum latency an application will tolerate.
This is why there's a big gray area in what counts as blocking, because many synchronous operations could finish quickly enough, depending on how they're used, and what latency is allowed (e.g. in a server for a realtime multiplayer game 1ms may be long enough to be blocking, but a media file server could have 100ms hiccups without causing problems).
To add some further fuel: I am very much against Mutex::lock being 'blocking'. A mutex with a very short critical section will always be faster than an async lock like tokio's because tokio uses a std::sync::Mutex for the wait list: https://docs.rs/tokio/1.37.0/src/tokio/sync/batch_semaphore.rs.html#36
If tokio considers a mutex to be fast enough, then so should we. It would be very annoying to have to annotate the many such uses in my codebases.
That being said, I do respect that many use cases of mutex in async code has the potential to be slow, and I am supportive of static analysis and lints to try and improve the status quo. However, even a hashmap realloc I have measured to be several ms. An async lock won't fix that if you do tokio::sync::Mutex<HashMap<K, V>>
for your concurrent map.
Onto a slightly related but different topic, I'd like to see some investment into tools for debugging actual blocking in code. For instance, a watcher thread that monitors runtime threads, periodically polling them and sampling stack traces. If it detects that a thread has taken a while to yield, it can dump the stacktrace so you can debug uses of loop {}
without any awaits.
what about we split them to different types of blocking functions (io
, net
, mutex
) and just expose them as separate (maybe somehow parameterized) lint's?
Those categories do not carve reality along its joints. File IO can be faster or slower than network IO. Mutexes depends on the critical section, contention, potential for priority inversion, ...
If you consider swap even a ptr::read
can be blocking.
If you want to know if library calls are fast and compose well, then you need them to be
Then you can add up all the constant latencies and see if they are below your responsiveness goal for a feature of your application.
If things are not constant-time then you can quickly get O(n²) behavior due to loops. If you do not know the expected latency of calls then all that nice constant-timeness is useless if the constants are huge. If they aren't wait-free then you run into starvation situations based on system state.
Making low-latency code legible to the compiler is hard. Perhaps there's some prior art in hard realtime programming that we can look at?
For me personally, I'd prefer if mutexes were listed as blocking. If a mutex guards a short enough critical section, so then (using the lint reasons feature that's about to become stable) there will be a place where I can document that assumption in the code:
async fn something_with_mutex() {
static SHARED_MUTEX: Mutex<_> = ..;
#[allow(
blocking_in_async,
reason = "This mutex only guards `do_something` calls, which finish quickly enough."
)]
SHARED_MUTEX.lock().do_something();
}
To me, a good part of the Rust language is the many ways in which it forces me to do my homework and show that what I'm doing is correct, and this feels like another way to do that.
Perhaps there's some prior art in hard realtime programming that we can look at?
i feel like once your latency requirements become strict enough the only solution is nonlocal control flow (signals, interrupts, or preemptive multitasking)
and to add some fuel to the "what does blocking even mean" fire, is std::thread::sleep(Duration::from_nanos(1))
a blocking operation?
i think trying to estimate performance using static analysis is outside the scope of the compiler, and would be better suited with dynamic analysis (measuring time between waits)
is there even a way to manually yield an async fn when doing long cpu bound processing like keygen? it seems like you would want to await a future that returns "Pending" exactly once.
is there even a way to manually yield an async fn when doing long cpu bound processing like keygen? it seems like you would want to await a future that returns \"Pending\" exactly once.
Yes. Tokio offers a yield_now function which is very easy to remake. It calls wake immediately and returns pending once. I have previously used that during password hashing, for instance: https://github.com/neondatabase/neon/blob/a5ecca976ec3abf97c8c3db4f78c230b4553b509/proxy/src/scram/exchange.rs#L98
Alternatively tokio has a concept of a "cooperative budget", which is not yet stable api https://github.com/tokio-rs/tokio/pull/6622, but it will only return pending if the budget is fully consumed. Budget is acquired by most tokio functions like reading data from a TcpStream or sending into a channel (both of these otherwise could always be immediately ready in a loop and never yield)
#[diagnostic::blocking]
is a marker attribute for functions that are considered (by their author) to be a blocking operation, and as such shouldn't be invoked from anasync
function.rustc
,clippy
and other tools can use this signal to lint against calling these functions inasync
contexts.It would allow
rustc
,clippy
and other tools to provide lints likeRendered