detect blocked tasks reliably

Most system calls can block for different reasons. Take mmap: I'm not sure whether it blocks every time. So maybe we only need to handle a subset of truly blocking syscalls, such as:

blocking on FDs: read/write/select/poll/epoll
blocking on PIDs: the wait4 syscall
blocking on futexes: the futex syscall
blocking on signals: the rt_sigtimedwait syscall
blocking on timers

For the rest (e.g. mmap), the blocking should be considered transient, and we shouldn't reschedule the task (that is, we shouldn't add it to the blocked queue).
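The split above could be sketched as a simple lookup table. This is only an illustration: the names `BLOCKING_SYSCALLS`, `blocking_kind` and `should_move_to_blocked_queue` are hypothetical, `epoll_wait` stands in for "epoll", and the timer entries (`nanosleep`, `clock_nanosleep`) are my assumption, since the list does not name specific timer syscalls.

```python
# Hypothetical classification table; the syscall sets are illustrative, not exhaustive.
BLOCKING_SYSCALLS = {
    "fd": {"read", "write", "select", "poll", "epoll_wait"},
    "pid": {"wait4"},
    "futex": {"futex"},
    "signal": {"rt_sigtimedwait"},
    # Assumption: the issue only says "timers"; these two are plausible examples.
    "timer": {"nanosleep", "clock_nanosleep"},
}


def blocking_kind(syscall_name):
    """Return the kind of resource a syscall may truly block on, or None if transient."""
    for kind, names in BLOCKING_SYSCALLS.items():
        if syscall_name in names:
            return kind
    return None


def should_move_to_blocked_queue(syscall_name):
    """Only truly blocking syscalls cause the task to be put on the blocked queue."""
    return blocking_kind(syscall_name) is not None
```

With this table, `should_move_to_blocked_queue("mmap")` is false, so a task stuck in mmap would be treated as transiently blocked and left alone.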

We can read /proc/<pid>/status to check the status of task <pid>. However, this is racy; it would be better to have an event-based system: when a task switch happens, a notification is sent to the tracer. This is pretty hard to implement. My thought was that when a tracee has a context switch, a signal should be sent to the tracer. This can be done, but in the tracer's signal handler there is no way to tell where the signal came from: the siginfo_t only has a valid _sigpoll struct, that is, only si_band and si_fd are valid. And we cannot use si_fd, because the fd belongs to the tracee. There's a crazy idea of setting the perf_event fd to tid+fd_offset, so that each tracee's si_fd would be enough to tell the tracer's signal handler the origin of the signal. But I think that would be too absurd to implement.
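For reference, the /proc polling approach could look roughly like the sketch below. The helper names `parse_state` and `looks_blocked` are hypothetical, and treating the states `S` and `D` as "blocked" is an assumption on my part. The result is only a snapshot, so the race described above still applies.

```python
def parse_state(status_text):
    """Extract the one-letter task state from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tS (sleeping)".
            return line.split()[1]
    return None


def looks_blocked(pid):
    """Racy snapshot: True if the task was sleeping (S) or in uninterruptible wait (D)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_state(f.read()) in ("S", "D")
```

The task may have changed state by the time `looks_blocked` returns, which is exactly why an event-based notification would be preferable.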

We could also do that with Linux trace events: enabling certain sched events through ftrace would give us the notifications, but it would be pretty hard to implement.
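A rough sketch of what the ftrace route might involve, assuming tracefs is mounted at /sys/kernel/tracing (older systems expose it under /sys/kernel/debug/tracing) and assuming the usual text format of `sched_switch` lines. The helper names are hypothetical, and the rule "switched out in state S or D means blocked" is my assumption, not something the tracer currently does.

```python
import re

# Assumed tracefs mount point; may be /sys/kernel/debug/tracing on older systems.
TRACEFS = "/sys/kernel/tracing"

# Assumed text layout of a sched_switch event line.
SCHED_SWITCH = re.compile(
    r"sched_switch: prev_comm=(?P<prev_comm>.*?) prev_pid=(?P<prev_pid>\d+) "
    r"prev_prio=\d+ prev_state=(?P<prev_state>\S+) ==> "
    r"next_comm=(?P<next_comm>.*?) next_pid=(?P<next_pid>\d+)"
)


def enable_sched_switch():
    """Turn on the sched_switch event (needs root)."""
    with open(f"{TRACEFS}/events/sched/sched_switch/enable", "w") as f:
        f.write("1")


def parse_sched_switch(line):
    """Return (prev_pid, prev_state, next_pid) for a sched_switch line, else None."""
    m = SCHED_SWITCH.search(line)
    if m is None:
        return None
    return int(m["prev_pid"]), m["prev_state"], int(m["next_pid"])


def blocked_pid(line, tracees):
    """Return the pid of a tracee that was switched out while sleeping, else None."""
    event = parse_sched_switch(line)
    if event is None:
        return None
    prev_pid, prev_state, _ = event
    # A state starting with R means the task was preempted, not blocked.
    if prev_pid in tracees and prev_state[0] in ("S", "D"):
        return prev_pid
    return None
```

After calling `enable_sched_switch()`, the tracer would read lines from `trace_pipe` under the same directory and feed each one to `blocked_pid`. The hard part the issue alludes to, such as filtering, buffering, and synchronizing with ptrace stops, is not covered here.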

Another way to do that is using bcc, which allows us to install kernel probes dynamically (as eBPF programs loaded into the kernel), with the following downsides:

hard to implement, at least as hard as using ftrace
requires real root privileges
has license limitations: you cannot use just a BSD/MIT license; the best compromise is a dual BSD/GPL license, and I have no idea what that would actually imply
not language agnostic: bcc supports C++/Python/Lua only; using other programming languages (such as Rust) relies on third-party bindings
