Status: Closed (kode54 closed this 9 months ago)
@kode54 can you check in the stdout whether the nr_page_faults counter is 0 or > 0?
Even with the custom allocator + locking all the memory, I think there are still cases where the user-space scheduler can page fault. And that's problematic, because if it faults we can hit a deadlock condition (e.g., a kthread needs to run to resolve the page fault, but it can't run because the scheduler is waiting for the kthread to resolve the page fault).
I'm thinking, for example, of ksm, kcompactd, remapping of huge pages... maybe other cases. If that's the case, I would say that it sounds like a kernel bug: if we mlockall() all the memory in a task, I would assume that the kernel never unmaps/remaps my pages, under any condition...
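As an independent cross-check of whether a process with locked memory still takes page faults, one option is to read the minflt and majflt fields from /proc/<pid>/stat. The following is only a minimal sketch under stated assumptions: the helper names (`parse_faults`, `read_faults`) are hypothetical, it is not part of scx_rustland, and it does not read the scheduler's own nr_page_faults counter.

```rust
use std::fs;

/// Minor and major page-fault counts for one process.
#[derive(Debug, PartialEq)]
struct Faults {
    minor: u64,
    major: u64,
}

/// Parse the contents of /proc/<pid>/stat (hypothetical helper).
fn parse_faults(stat: &str) -> Option<Faults> {
    // comm (field 2) may contain spaces or ')', so split after the last ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // fields[0] is field 3 (state); minflt is field 10, majflt is field 12.
    let minor: u64 = fields.get(7)?.parse().ok()?;
    let major: u64 = fields.get(9)?.parse().ok()?;
    Some(Faults { minor, major })
}

/// Read the fault counters of a running process (hypothetical helper).
fn read_faults(pid: u32) -> Option<Faults> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    parse_faults(&stat)
}

fn main() {
    // Inspect this process; pass the scheduler's PID instead to watch it.
    let pid = std::process::id();
    match read_faults(pid) {
        Some(f) => println!("pid {pid}: minflt={} majflt={}", f.minor, f.major),
        None => eprintln!("could not read /proc/{pid}/stat"),
    }
}
```

Sampling the counters twice and comparing them shows whether the process faulted in between; a major-fault count that keeps growing would suggest that pages are still being brought in from disk.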
I just replaced my 0.1.5 package with the -git package, which has the nr_page_faults counter. I'll report back here if/when it terminates again.
Edit: It hasn't paged out yet, but instead it's regularly pegging an entire core while a Plex transcoder is running in the background.
Oh ok, that's good news! The 0.1.5 version doesn't even have the custom allocator, so it is highly susceptible to page faults. The page fault issue was fixed (or rather mitigated) by 9708a80, which is not in 0.1.5.
BTW, I'm also stress testing this potential workaround (https://github.com/arighi/scx/commit/c5a69eec3b24d715f5c3300cc84473abeea8902b). With this applied, the scheduler seems to survive page faults. We still need the custom allocator (9708a80), because we want to prevent page faults from happening as much as possible in the user-space scheduler (they can still introduce global system lags), but with this one, even if they happen, at least the scheduler seems to survive.
Also, is it normal for rustland to use up to a full core of processing time most of the time it's running, or really any time anything is using the processor elsewhere?
When the system is mostly idle, it should also be idle. Right now, for example, I only have the browser, my email client and IRC client open, and it's using 0.3-1% of CPU:
339714 root 20 0 139628 73944 4608 R 0.3 0.5 0:00.12 scx_rustland
If I start a game, a build, etc. it can go up a lot, almost using a full core. If your system is mostly idle and rustland is using a full core, then there's a bug...
But I'm planning to do some tracing and see if we can optimize things a bit, because I have the feeling that sometimes we still have unnecessary wakeups.
Edit: It hasn't paged out yet, but instead it's regularly pegging an entire core while a Plex transcoder is running in the background.
Oh! I totally missed the edit, sorry. How much CPU % is the Plex transcoder using? Is it using multiple CPUs?
Moreover, about the cpu usage, can you try to apply this patch and see if it makes any difference? https://github.com/arighi/scx/commit/7bf70170693c2bbd53304e3436938af39a018652
It seems to go up a lot if system load average goes up significantly, even if it's not from pure CPU load. For instance, a lot of I/O from bcachefs on 7200 RPM drives. And this also causes my desktop compositor, Wayfire, to become quite stuttery. The stuttering goes away when the I/O goes back down. It also goes away if I terminate scx_rustland and let it revert to kernel scheduling.
ok, I'll do some tests with some I/O bound workloads. My guess is that with more I/O, tasks are releasing the CPU more often, so there's more work to do for the scheduler.
Incidentally, the Plex process ends up using between 500% and 800%, possibly even 1200%, brute force decoding videos to locate their intro and end credits timings. It kicks in the moment a full season has been copied or muxed into place.
Moreover, to clarify about the CPU load: we should expect to see a higher load / CPU usage with rustland, in particular the CPU % used by scx_rustland itself, because, unlike other schedulers, it's doing all the scheduling work in user-space. So the scheduling work that is usually done (transparently) by the kernel is now done in user-space and accounted to a particular task (scx_rustland).
This should be noticeable especially when lots of tasks are running or when certain tasks are acquiring and releasing the CPUs very frequently (i.e., I/O bound tasks): the scheduler has more work to do, therefore scx_rustland uses more CPU.
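To make the accounting concrete: the CPU % that tools like top report for a task can be derived from the utime and stime fields of /proc/<pid>/stat, sampled twice. The sketch below is illustrative only; the function names are hypothetical, and the 100 ticks-per-second value is an assumption (the real value comes from sysconf(_SC_CLK_TCK)).

```rust
use std::fs;
use std::thread;
use std::time::{Duration, Instant};

/// Total CPU time (user + system) in clock ticks, parsed from /proc/<pid>/stat.
fn cpu_ticks(stat: &str) -> Option<u64> {
    // comm (field 2) may contain spaces or ')', so split after the last ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // fields[0] is field 3 (state); utime is field 14, stime is field 15.
    let utime: u64 = fields.get(11)?.parse().ok()?;
    let stime: u64 = fields.get(12)?.parse().ok()?;
    Some(utime + stime)
}

/// CPU usage between two samples, as a percentage of one core.
fn cpu_percent(before: u64, after: u64, elapsed_secs: f64, ticks_per_sec: f64) -> f64 {
    let used_secs = after.saturating_sub(before) as f64 / ticks_per_sec;
    100.0 * used_secs / elapsed_secs
}

fn main() {
    // Assumption: 100 ticks per second, a common value on Linux.
    const TICKS_PER_SEC: f64 = 100.0;
    let read = || {
        fs::read_to_string("/proc/self/stat")
            .ok()
            .and_then(|s| cpu_ticks(&s))
    };

    let start = Instant::now();
    let before = read();
    thread::sleep(Duration::from_secs(1));
    let after = read();
    let elapsed = start.elapsed().as_secs_f64();

    if let (Some(b), Some(a)) = (before, after) {
        println!("cpu: {:.1}%", cpu_percent(b, a, elapsed, TICKS_PER_SEC));
    } else {
        eprintln!("could not read /proc/self/stat");
    }
}
```

Because every scheduling decision in rustland runs inside the scx_rustland task, the time spent on it accumulates in that task's utime/stime, which is why the work shows up against a single process rather than being spread invisibly across the kernel.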
We can do some optimizations for sure, such as avoiding unnecessary wakeups for the scheduler, but they would mainly affect the load when the system is mostly idle. When lots of tasks are competing with each other, acquiring and releasing CPUs, the scheduler needs to do a lot of work, otherwise the system becomes sluggish and unresponsive.
But that is about system load / cpu usage. If the system becomes unresponsive under certain conditions, then it's a potential scheduling problem and we should focus on that.
I can probably close this, and open a different issue regarding I/O, because that seems to affect most, if not all schedulers that I've tried.
I've tried running rustland twice with linux-cachyos 6.7.0-4, and it times out within a few minutes to an hour of running. The first time, I ran it in a TTY with exec, so I had no output after it died. The second time, I ran it under tmux and captured what happened: it stopped updating its status output for 3 seconds, updated the status again, then immediately printed WARN output because the 5s watchdog had timed out, and was terminated.