Open seanmonstar opened 3 years ago
Yeah, this is a great idea. It would be nice to have some fairly modular way of defining warnings, similar to clippy --- obviously the actual detection of warning conditions is generally complex enough that adding a warning will generally require a non-trivial amount of code, but it would be cool to be able to have stuff like toggling on and off specific warnings etc be taken care of out of the box when adding new ones.
For starters though we should probably just write some purely by hand to figure out what they end up looking like...
So I originally assumed that lints/warnings would be entirely in the console (nothing in the subscriber). The reasons were:
However, I'm now leaning the other way, that it should happen in the subscriber, and the warnings should just be emitted in protobuf and simply shown in the console. This fits more with the console only showing info, not processing and inventing new info. It also means that the code to determine if tasks are triggering warnings doesn't need to be duplicated in different viewing application (like a web view).
Now I'm starting to lean back the other way, that the warnings code should be entirely in the console
, and not in the subscriber. That'd reduce the amount of work the instrumentation library is causing to the user's process. It also happens to be simpler to to do, since we don't need code in both the subscriber and console to handle them, and a new way to send them over the connection.
FWIW, I'm personally strongly in favor of doing this work in the UI application. Eventually, if we want to have multiple UIs, I expect we would factor out the core data model part of the console
TUI into a crate that could be a dependency of the console TUI, a GUI app, web UI, etc. The code for detecting and generating warnings could be in that crate so that it doesn't have to be reimplemented.
Initial framework for warnings added to the console
crate in #113 and #123. Right now, we can associate warnings with tasks and display them!
However, we only have a single warning that we detect: self-wake percentages over 50%. We should now be able to start designing and implementing additional warnings.
Some additional things that would be nice to add:
"poll_time_percent"
and/or a number like [W0001]
)
The statistics of tasks shown in the console is already an awesome tool to help understand what you tasks are doing. With the info shown, it's possible to diagnose when some tasks are exhibiting a bad behavior. However, knowing what to look for is a skill in itself. Since we (the maintainers) already know some things to look for, we can codify them into warnings that the Console explicitly points out.
This proposes adding a "warnings" system that:
rustc
defines compiler lints.(Should these be "lints" instead of "warnings", since they may have different "levels"? For instance, some might be serious no matter what, while others may just be hints.)
Here's some examples of warnings that we can show with a smaller amount of effort. It would also be wonderful to eventually include more complex ones, like deadlock detection, but that will require better instrumented resources (see #39) and more developer time.
Poll time over some maximum
If the time to poll a task is over some max (10us?), flag task with a warning. The task is either hitting some blocking code (even if sporadically), or is otherwise doing too much per
poll
.This maximum should probably be configurable as a CLI arg to the console.
(This is only reliable if the app is compiled in
--release
mode. It'd help if the subscriber could somehow detect if it's been compiled with optimizations...)Lost Waker
If after a poll, the waker count is 0, and a wake isn't scheduled, then the task exhibits a "lost waker" problem.
It could be that the task is "part of" a bigger one, thanks to
select
orjoin
, and thus masking the issue, but it's still worth showing the warning label.Wake-to-Poll time over some maximum
This might be useful to diagnose the system in general. For instance, assuming the "poll time maximum" warning isn't being triggered, it could show a system that has too many tasks and is starting to degrade in performance.
Task Panicked
A task panicking is always exceptional, and should show a warning.