Meta: Warnings - Githubissues

seanmonstar commented 3 years ago

The statistics of tasks shown in the console is already an awesome tool to help understand what you tasks are doing. With the info shown, it's possible to diagnose when some tasks are exhibiting a bad behavior. However, knowing what to look for is a skill in itself. Since we (the maintainers) already know some things to look for, we can codify them into warnings that the Console explicitly points out.

This proposes adding a "warnings" system that:

Builds a framework of defining warnings, in spirit to the way rustc defines compiler lints.
When a warning is detected on a task, a symbol will appear in the new "warnings" column for the task's row.
When viewing the details of that task, a list of the warnings detected and words explaining them will be shown.
Perhaps an additional view of "only warnings" found by the Console.

(Should these be "lints" instead of "warnings", since they may have different "levels"? For instance, some might be serious no matter what, while others may just be hints.)

Here's some examples of warnings that we can show with a smaller amount of effort. It would also be wonderful to eventually include more complex ones, like deadlock detection, but that will require better instrumented resources (see #39) and more developer time.

Poll time over some maximum

If the time to poll a task is over some max (10us?), flag task with a warning. The task is either hitting some blocking code (even if sporadically), or is otherwise doing too much per poll.

This maximum should probably be configurable as a CLI arg to the console.

(This is only reliable if the app is compiled in --release mode. It'd help if the subscriber could somehow detect if it's been compiled with optimizations...)

Lost Waker

If after a poll, the waker count is 0, and a wake isn't scheduled, then the task exhibits a "lost waker" problem.

It could be that the task is "part of" a bigger one, thanks to select or join, and thus masking the issue, but it's still worth showing the warning label.

Wake-to-Poll time over some maximum

This might be useful to diagnose the system in general. For instance, assuming the "poll time maximum" warning isn't being triggered, it could show a system that has too many tasks and is starting to degrade in performance.

Task Panicked

A task panicking is always exceptional, and should show a warning.

hawkw commented 3 years ago

Yeah, this is a great idea. It would be nice to have some fairly modular way of defining warnings, similar to clippy --- obviously the actual detection of warning conditions is generally complex enough that adding a warning will generally require a non-trivial amount of code, but it would be cool to be able to have stuff like toggling on and off specific warnings etc be taken care of out of the box when adding new ones.

For starters though we should probably just write some purely by hand to figure out what they end up looking like...

seanmonstar commented 3 years ago

So I originally assumed that lints/warnings would be entirely in the console (nothing in the subscriber). The reasons were:

I didn't want to do too much computing in-process of the monitored application, so having that run in the console seemed better.
It seemed like one might want to modify some lint configuration when attaching a console (or web view or whatever). Maybe when you first connect, you realize the poll-time-max warning is too aggressive, so you want to restart the console with a higher maximum.

However, I'm now leaning the other way, that it should happen in the subscriber, and the warnings should just be emitted in protobuf and simply shown in the console. This fits more with the console only showing info, not processing and inventing new info. It also means that the code to determine if tasks are triggering warnings doesn't need to be duplicated in different viewing application (like a web view).

seanmonstar commented 3 years ago

Now I'm starting to lean back the other way, that the warnings code should be entirely in the console, and not in the subscriber. That'd reduce the amount of work the instrumentation library is causing to the user's process. It also happens to be simpler to to do, since we don't need code in both the subscriber and console to handle them, and a new way to send them over the connection.

hawkw commented 3 years ago

FWIW, I'm personally strongly in favor of doing this work in the UI application. Eventually, if we want to have multiple UIs, I expect we would factor out the core data model part of the console TUI into a crate that could be a dependency of the console TUI, a GUI app, web UI, etc. The code for detecting and generating warnings could be in that crate so that it doesn't have to be reimplemented.

hawkw commented 3 years ago

Initial framework for warnings added to the console crate in #113 and #123. Right now, we can associate warnings with tasks and display them!

However, we only have a single warning that we detect: self-wake percentages over 50%. We should now be able to start designing and implementing additional warnings.

hawkw commented 3 years ago

Some additional things that would be nice to add:

some kind of short identifier for warnings (either a short string name like "poll_time_percent" and/or a number like [W0001])
users can turn on or off a list of warnings with a CLI arg (using short identifiers)
some warnings (like the self-wake percent warning) could have configurable thresholds (the percentage of self-wakes). we could consider providing a way to configure this as well...

tokio-rs / console

Meta: Warnings #56

Poll time over some maximum

Lost Waker

Wake-to-Poll time over some maximum

Task Panicked