opensvc / multipath-tools


Multipathd should not try to acquire realtime scheduling, just run with high priority. #82

Closed. sushmbha closed this issue 4 months ago.

sushmbha commented 8 months ago

Multipathd tries to acquire realtime scheduling priority using the sched_setscheduler function. On RHEL7, when CPU accounting by systemd is enabled, it cannot acquire realtime scheduling unless a realtime budget is allocated to the cgroup that multipathd runs in. This causes problems because realtime budget allocations are absolute: if more than one application needs to run in realtime, they have to negotiate their budgets beforehand. Consequently, on a system where multipathd is enabled and CPU accounting by systemd is on, any third-party application that needs realtime scheduling and tries to allocate a budget for it can potentially fail.

We think multipathd should do away with the realtime scheduling requirement and just run with high priority (negative nice). With modern schedulers, that should be adequate and we do not need realtime scheduling for multipathd.
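For illustration, here is a minimal sketch (not multipathd's actual code) of the kind of call at issue: requesting SCHED_RR at the maximum priority, which is what multipathd currently does.

```c
#include <sched.h>
#include <stdio.h>

/* Minimal sketch, not multipathd's actual code: request SCHED_RR at the
 * highest available priority. */
int set_realtime(void)
{
	struct sched_param sp = {
		.sched_priority = sched_get_priority_max(SCHED_RR), /* 99 on Linux */
	};

	/* On kernels with CONFIG_RT_GROUP_SCHED and systemd CPU accounting
	 * enabled, this fails with EPERM unless the cgroup has a nonzero
	 * cpu.rt_runtime_us budget. */
	if (sched_setscheduler(0, SCHED_RR, &sp) < 0) {
		perror("sched_setscheduler");
		return -1;
	}
	return 0;
}
```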

bmarzins commented 8 months ago

I agree that multipathd doesn't need realtime scheduling. However, multipathd is currently written as if it had realtime scheduling. For instance, the directio path checker uses a one-microsecond timeout to wait for events. If multipathd is not a realtime process (even with a nice value of -20 on a system with spare CPU cycles), this may end up waiting much longer. I don't think that these are likely to cause significant problems, but they will change the timing of multipathd's actions.
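To make the timing concern concrete, here is a simplified sketch of such a short event wait, assuming libaio (which the directio checker is built on); this is not the checker's actual code:

```c
#include <libaio.h>
#include <time.h>

/* Simplified sketch: wait up to 1 µs for the completion of an async read
 * that was submitted earlier with io_submit(). */
int wait_for_completion(io_context_t ctx)
{
	struct io_event ev;
	struct timespec timeout = { .tv_sec = 0, .tv_nsec = 1000 }; /* 1 µs */

	/* Without realtime scheduling, the thread can be preempted here and
	 * wake up much later than 1 µs; the checker then reports the path
	 * as "pending" and picks up the result on a later tick. */
	return io_getevents(ctx, 1L, 1L, &ev, &timeout);
}
```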

mwilck commented 8 months ago

I don't think the directio checker will be an issue. 1 µs is extremely low anyway, and the checker initially sees paths as "pending" most of the time.

I guess that, in systemd environments, we'd actually be better off using systemd directives to configure this at unit startup time, e.g. CPUSchedulingPolicy= CPUSchedulingPriority=.

To begin with, we should probably stop using RR priority 99, which is the highest available prio. This choice (which has been in place forever) looks a bit over-zealous anyway. Maybe we should just set the minimum available priority (1) for the time being, until we have sorted this out for good?

sushmbha commented 8 months ago

Hi @mwilck, actually lowering the priority (while still keeping it realtime) will not help with this; we still need to allocate a budget to the cgroup for that to work. Do you foresee any issue if multipathd runs with normal scheduling at a raised priority?

mwilck commented 8 months ago

I've talked to our cgroup expert and he said: "the RHEL issue is due to kernel CONFIG_RT_GROUP_SCHED that is: a) feature with partial implementation in kernel, b) systemd officially discourages running on such kernels". So the problem you're seeing with RT scheduling is specific to RHEL. (Not sure if only RHEL7 or also later versions). I'm also told that Nice= would not have the expected effect, because it'd increase the priority only inside multipathd's own cgroup.

I think indeed that the way to go (in systemd environments) is to use unit file settings for this purpose rather than hard-coded sched_setscheduler(2) calls. In non-systemd environments, we'd fall back to hard-coded settings. I wouldn't want to make this a run-time configuration option, I think compile-time should be fine. But this needs to be implemented and tested properly, which will take some time. It's definitely not material for the upcoming 0.9.8 release.
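A rough sketch of what such a compile-time switch might look like; the macro name is made up for illustration, and the fallback uses the minimum RT priority (1) suggested above:

```c
#include <sched.h>

/* Hypothetical compile-time switch; SCHEDULING_VIA_UNIT_FILE is an
 * invented macro name, not an existing build option. */
static void setup_scheduling(void)
{
#ifdef SCHEDULING_VIA_UNIT_FILE
	/* systemd build: CPUSchedulingPolicy= / CPUSchedulingPriority= in
	 * multipathd.service configure scheduling at unit startup. */
#else
	/* non-systemd fallback: hard-coded RT request */
	struct sched_param sp = { .sched_priority = 1 };
	sched_setscheduler(0, SCHED_RR, &sp);
#endif
}
```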

In the meantime, you should be able to work around the issues you're seeing by using systemd's LimitRTPRIO= directive. That should prevent multipathd from acquiring RR prio. You'll see a warning, but multipathd will still work. Some testing with these settings would be appreciated.

bmarzins commented 8 months ago

FWIW, rhel-7 and rhel-8 both set CONFIG_RT_GROUP_SCHED; rhel-9 does not. But nothing sets a CPU budget for multipathd. Are you doing this manually? I thought the two options were: 1. all realtime processes share the same realtime CPU budget and things don't need to be explicitly set up, or 2. no realtime process can run without explicitly setting up the budget. So if you are doing this manually, can't you just stop, and have multipathd fail to become a realtime process? And if you aren't doing this manually, have you really seen a situation where multipathd stopped another process from being able to set up realtime scheduling?

bmarzins commented 8 months ago

Hrm... Despite RHEL7 and RHEL8 both having CONFIG_RT_GROUP_SCHED, and both having

# systemctl show | grep CPUAccounting
DefaultCPUAccounting=no

# systemctl show multipathd.service | grep CPUAccounting
CPUAccounting=no

On RHEL-7 multipathd does set up RT scheduling, while on RHEL-8 it fails (I've got a patch to fix the sched_setscheduler() condlog() error message to not use LOG_WARNING, which gets converted into LOG_DEBUG). It looks like this is because on RHEL-7 multipathd gets its budget directly from cpu,cpuacct, where there is a budget for it:

# grep `pidof multipathd` /sys/fs/cgroup/cpu,cpuacct/tasks
7913
# cat /sys/fs/cgroup/cpu,cpuacct/cpu.rt_runtime_us
950000

And on RHEL-8, it's accounted as part of system.slice/multipathd.service which has no rt budget:

# grep `pidof multipathd` /sys/fs/cgroup/cpu,cpuacct/tasks
# grep `pidof multipathd` /sys/fs/cgroup/cpu,cpuacct/system.slice/multipathd.service/tasks 
795578
# cat /sys/fs/cgroup/cpu,cpuacct/system.slice/multipathd.service/cpu.rt_runtime_us 
0

bmarzins commented 8 months ago

So clearly multipathd has been running as a non-realtime process on RHEL-8 without any complaints. I am fine with making this configurable in systemd.

sushmbha commented 8 months ago

Hi @bmarzins @mwilck, making it configurable in systemd does not solve the problem as such. When multipathd is configured (through a systemd environment variable, let's say) to run in realtime, we still need to allocate an explicit realtime budget for it, which is not possible without knowing the requirements of the other realtime applications on the system.

Are you suggesting that the sched_setscheduler call in multipathd will remain as it is, but multipathd will run in realtime on a "best-effort" basis? I.e., if a budget is allocated for it explicitly by the admin, it will run in realtime; otherwise it will continue with the normal scheduling policy? This may be okay, but it may surprise the system admin, who may see that the process was running in realtime earlier but, after CPU accounting is turned on, no longer is. Maybe an info/debug-level log is needed if the sched_setscheduler call fails, at least?

bmarzins commented 8 months ago

@sushmbha if it's configurable in systemd, then it is easy to stop multipathd from running in realtime if that is causing problems. Like I mentioned, on RHEL8 it appears that multipathd is frequently not running as a realtime process, with no complaints. I agree we need to increase logging there. I'll post a patch today to bump the error logging there to the notice level. It currently logs at the debug level (which means that multipathd won't log it unless you bump the verbosity all the way up).

The idea is that this is a first step towards making multipathd run as a regular SCHED_OTHER process by default, once we verify that it's not causing problems.

bmarzins commented 8 months ago

Actually, thinking about this more, I'm not sure that the systemd CPUSchedulingPolicy and CPUSchedulingPriority directives are the right way to go. If these are set and multipathd can't get the desired policy and priority, it will fail to start. The way things currently are, multipathd will continue to work even if it can't set the scheduling policy or priority it wants. This goes back to my earlier question, @sushmbha: if CPU accounting is on, then multipathd shouldn't be running as a realtime process unless you explicitly gave it a budget, but it should still run. If CPU accounting isn't turned on, then running multipathd as a realtime process shouldn't stop other realtime processes from running. So are you explicitly giving multipathd a CPU budget, and couldn't you just stop?

bmarzins commented 8 months ago

FWIW, it turns out that on my RHEL-8 system another service was requesting CPU accounting, which turned it on and kept multipathd from running as a realtime process. This was a standard install, so I assume this happens often. But it just underscores the problem with making multipathd fail to run if it can't get the scheduler policy it wants: installing an unrelated piece of software could cause multipathd to stop working. Obviously, this wouldn't actually be a problem for RHEL, since these changes would only go into RHEL-9 and Fedora, which don't set CONFIG_RT_GROUP_SCHED, but I'm not sure whether other distributions would see problems.

sushmbha commented 8 months ago

Hi @bmarzins, currently I am explicitly allocating a budget for multipathd to run in realtime. However, this has the problem that it can interfere with other third-party applications which also require a realtime budget. I can stop allocating a budget to multipathd, in which case it will run with normal scheduling. As I understand it, this is okay for multipathd, as we do not expect to see issues when it runs without realtime. It just seems that the design leaves this behavior to chance; that's the reason for filing this issue. If the realtime requirement is not hard, then removing the sched_setscheduler call will make the behavior more consistent across platforms.

mwilck commented 8 months ago

Actually, thinking about this more, I'm not sure that the systemd CPUSchedulingPolicy and CPUSchedulingPriority directives are the right way to go. If these are set and multipathd can't get the desired policy and priority, it will fail to start.

IIUC this happens only on distributions that set CONFIG_RT_GROUP_SCHED, right? Or can it happen without it as well?

mwilck commented 8 months ago

If the realtime requirement is not hard, then removing the sched_setscheduler call will make the behavior more consistent across platforms.

No, multipathd doesn't have hard realtime requirements in the strict sense. It's probably sufficient just to have it run with default scheduling policy at high priority.

bmarzins commented 8 months ago

IIUC this happens only on distributions that set CONFIG_RT_GROUP_SCHED, right? Or can it happen without it as well?

Yes, and like I said, CONFIG_RT_GROUP_SCHED is not set for RHEL-9 or Fedora, where these changes would land, but I'm not sure that we can say that about every distribution. If that kernel option is enabled, and CPU accounting is turned on in systemd (which can happen if any service requests it), then multipathd won't be able to run as a realtime service without an explicit budget. The way things currently are, multipathd will simply run as a normal SCHED_OTHER service in this case. But if we add CPUSchedulingPolicy and CPUSchedulingPriority to multipathd.service and it can't run as a realtime process, it will fail to start, and we don't get any control over the failure message.

I'm fine with the idea of making whether/how we call sched_setscheduler() a compile time setting. I just don't think we should add options to multipathd.service that mean it either runs as a realtime process or not at all, when I really can't come up with a justification for why it should be a realtime process.

mwilck commented 8 months ago

Fine with me. People who want to use systemd for this kind of thing can still add the systemd directives by themselves.

sushmbha commented 8 months ago

Hi @mwilck @bmarzins I think this is the best approach : https://github.com/opensvc/multipath-tools/issues/82#issuecomment-1971522479 .

To summarize my understanding, are you suggesting that in the long term multipathd will do away with the sched_setscheduler call? If this is not something that can be changed right now, then do you recommend just letting multipathd run with realtime scheduling wherever it can (due to either CONFIG_RT_GROUP_SCHED not being enabled, or an explicit realtime budget allocated to multipathd), and otherwise run with normal scheduling?

mwilck commented 8 months ago

To summarize my understanding, are you suggesting that in the long term multipathd will do away with the sched_setscheduler call?

Yes, probably. As @bmarzins said, we should collect some more practical evidence, and perhaps do some targeted testing. On RHEL8, if I read the above correctly, multipathd won't be able to enable RT scheduling as soon as any other service has enabled CPU Accounting. So, assuming that a certain percentage of RHEL or CentOS customers use both multipath and other services running under RT policy, we already have some evidence that multipathd running with just normal priority works ok-ish.

OTOH, a distinctive property of multipathd is that it runs quietly most of the time, but becomes a crucial part of the system in certain rare situations when path failovers / failbacks are happening. It's particularly important that multipathd reacts in a timely fashion if paths come back online or new paths are added. This doesn't require true real-time behavior, but it would obviously be bad if multipathd was delayed because higher-prio processes take all CPU time. A worst-case scenario is roughly this: all paths of a multipath map that is configured to queue I/O go down, queued writes accumulate as dirty memory, and the system comes under memory pressure and starts thrashing.

In this situation it's important that, if a path gets back online or is added/rediscovered, multipathd quickly notices and activates this path in the multipath map(s) that are queueing. If that happens too late or not at all, then depending on the configuration the map will either stop queueing (causing IO failure at the file system level), or the OOM killer will kill some crucial service, or the system will stall, or all of the above.

multipathd itself is more or less immune against thrashing because it uses mlockall() to prevent its memory from being swapped or paged out. But it could encounter priority inversion: some higher-prio task (RT, or just a normal task with higher prio) might occupy the CPU, while that higher-prio task isn't making progress because of the thrashing situation. For example, the RT process might be busy-waiting on some pipe which would normally deliver data very quickly, but the other end of the pipe might be blocked by swap-in.

Setting multipathd to max RT priority is the only way I can think of to be certain that this situation can't occur. With max RT/RR prio, multipathd would be scheduled sooner or later, even with the most evil concurrent RT processes around.

With this in mind, it's very hard to say with confidence that we've gathered enough evidence to conclude that running multipathd at normal prio is safe, unless we've tested really bad situations like the one described. That scenario is obviously extreme, but it's one of the scenarios that multipathd was created for.

At the end of the day, I suppose it's the user's decision. A well-written RT process shouldn't behave like I described above, which means that multipathd, running at high priority with standard scheduling, should have a chance to run, reinstate paths, and save the system even in very bad situations. Also, it's not a proven fact that RT scheduling actually makes a difference in practice: our test coverage for situations like this is not what it should be, and we don't completely avoid multipathd accessing the file system.

@bmarzins, please double-check what I just wrote; perhaps I'm getting something wrong here. It's not a simple matter.

If this is not something that can be changed right now, then do you recommend just letting multipathd run with realtime scheduling wherever it can (due to either CONFIG_RT_GROUP_SCHED not being enabled, or an explicit realtime budget allocated to multipathd), and otherwise run with normal scheduling?

Yes, this would be the current recommendation.

mwilck commented 7 months ago

Here's a new idea: multipathd could call getrlimit(RLIMIT_RTPRIO) and raise its prio to the highest allowed value. Then admins could use systemd's LimitRTPRIO= directive to determine which prio multipathd may obtain. That would avoid the previously discussed issues with CPUSchedulingPolicy=, while making it possible for administrators to adjust the value to suit their environment.

bmarzins commented 7 months ago

@mwilck, I think your analysis is correct, although for what it's worth, RT processes constantly running on all the CPUs of a machine, even in an error case, seems pretty unlikely. In general, hung IO is more likely to keep things from running than to cause them to run constantly. But bugs exist, especially in corner cases, so yeah, it's possible.

Your LimitRTPRIO idea seems fine. I believe those limits aren't binding for root processes. For instance, by default systemd sets LimitRTPRIO to 0, but that doesn't stop sched_setscheduler from setting the prio to 99. So I assume you are suggesting that multipathd just looks at the limit, and if it's 0, it does nothing. Otherwise it calls sched_setscheduler(0, SCHED_RR, prio), where prio is the smaller of rlim.rlim_max and sched_get_priority_max(SCHED_RR), since people (including us, for now) could be setting LimitRTPRIO=infinity in multipathd.service.
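A sketch of that algorithm, assuming the semantics described above (this is not necessarily identical to the patch that was eventually merged):

```c
#include <sched.h>
#include <stdio.h>
#include <sys/resource.h>

/* Honor RLIMIT_RTPRIO "voluntarily", although the limit is not binding
 * for root processes. */
static void set_rt_prio_within_rlimit(void)
{
	struct rlimit rlim;
	struct sched_param sp;
	int max_prio;

	if (getrlimit(RLIMIT_RTPRIO, &rlim) != 0)
		return;
	if (rlim.rlim_max == 0)
		return;	/* LimitRTPRIO=0: stay with SCHED_OTHER */

	/* rlim_max may be RLIM_INFINITY (LimitRTPRIO=infinity), so clamp
	 * it to the scheduler's maximum. */
	max_prio = sched_get_priority_max(SCHED_RR);
	sp.sched_priority = rlim.rlim_max < (rlim_t)max_prio ?
			    (int)rlim.rlim_max : max_prio;

	if (sched_setscheduler(0, SCHED_RR, &sp) != 0)
		perror("sched_setscheduler");	/* non-fatal: keep running */
}
```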

mwilck commented 7 months ago

I believe those limits aren't binding for root processes.

I wasn't aware of that, but yes, multipathd could just comply "voluntarily".

sushmbha commented 7 months ago

Hi @mwilck @bmarzins, shouldn't multipathd at least run at high priority when it is not acquiring realtime scheduling?

mwilck commented 7 months ago

It depends... what @bmarzins said about RHEL8 suggests that multipathd would work just fine, most of the time, if it's running at regular priority. My worst-case scenario above can't be avoided by using a negative nice level. I'm sure there is some grey zone in which running at higher prio might help systems survive critical situations, but I can't tell whether it matters in practice.

mwilck commented 6 months ago

@sushmbha , have you seen Ben's latest patch? Are you ok with this solution?

mwilck commented 6 months ago

@sushmbha: ping!

sushmbha commented 6 months ago

Hi @mwilck, the patch seems reasonable as a way to fine-tune the realtime priority of multipathd, but the problem with it is that it's not dynamic. LimitRTPRIO=0 will make multipathd run as a normal process, so for a distro this behavior is defined up front. The setting is fine for controlling the realtime priority, or for disabling realtime entirely; however, the same thing is currently achieved by default if there is no realtime budget allocated for the multipathd cgroup and CPU accounting by systemd is turned on. The behavior I was looking for is: let multipathd try to acquire realtime scheduling if possible (for this to work, a realtime budget must be allocated externally, or CPU accounting by systemd must be off); if it cannot acquire realtime, then let it at least run with high priority, i.e. with a negative nice value of maybe -10 (less than kernel threads but higher than normal userspace processes). Does this seem reasonable?
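A sketch of the behavior requested here (hypothetical, with made-up priority values; as the later comments explain, the negative nice value alone may not have the intended effect under autogrouping or cgroup CPU control):

```c
#include <sched.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical fallback: try RT scheduling first; if that is denied
 * (no budget, or no permission), raise priority via nice instead. */
static void acquire_best_priority(void)
{
	struct sched_param sp = { .sched_priority = 10 }; /* modest RT prio */

	if (sched_setscheduler(0, SCHED_RR, &sp) == 0)
		return;	/* got realtime scheduling */

	/* Fall back to a raised (negative nice) priority. */
	setpriority(PRIO_PROCESS, 0, -10);
}
```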

mwilck commented 6 months ago

@sushmbha, thanks.

the patch seems reasonable as a way to fine-tune the realtime priority of multipathd,

I'll take this as an ACK from your side.

The behavior I was looking for is: let multipathd try to acquire realtime scheduling if possible [...] if it cannot acquire realtime, then let it at least run with high priority

I am not sure which problem this dynamic behavior would solve. AFAIU, it matters only for kernels with CONFIG_RT_GROUP_SCHED, which seems to be a concept that's slowly fading away. Admins who use this together with systemd CPU accounting, like yourself, will need to apply manual fine-tuning anyway, so can't they take care of multipathd as well? I don't understand the algorithm well enough, but I am assuming that by setting LimitRTPRIO lower than the prio of other realtime tasks, multipathd would at least no longer make those other tasks fail, which was the original intention of this issue. Is this assumption wrong?

This said, we could of course add code on top of Ben's patch that tries to increase the normal priority if setting RT priority fails. I am just not sure if it will be an actual improvement.

sushmbha commented 6 months ago

Hi @mwilck, the problem is that we need to ship a solution which is generic enough to work both for customers who have CPU accounting on and for others who don't enable CPU accounting (relatively uncommon). We do not want to penalize customers who don't even enable CPU accounting because of others who do. If we change the service file to set this to 0, then even when accounting is not enabled, a customer will not be able to use realtime scheduling for multipathd. That is why, to keep the solution generic and not affect the performance of multipathd, I think it is better to let it acquire RT scheduling where possible and otherwise run at high priority (negative nice). So I definitely think it's an improvement over hardcoding the RT prio value.

mwilck commented 6 months ago

@sushmbha, thanks again.

Reading between the lines of your response, I figure that you develop a product or appliance that is based on RHEL8 or some other distro which enables CONFIG_RT_GROUP_SCHED. Out of curiosity, what is it?

I understand what you're aiming for.

I just checked on a few of my systems. It's quite obvious that none of the vital system processes uses anything close to multipathd's priority. The only processes at RT prio 99 that I observe are the kernel's migration/$N threads. Everything else, including kernel threads like idle_inject or threads that are vital for hardware functioning, uses no more than RT prio 50. Lots of kernel threads, like work queues related to file systems, block I/O, or RCU, run at regular prio with nice level -20.

Thus, as observed before, multipathd's default prio is monstrously exaggerated. If we talk about priority inversion like I did above, it's much more likely that multipathd blocks other processes than vice-versa.

However, I don't run actual RT systems. You do, apparently. In order to assess which prio level would be appropriate for multipathd, could you give us some examples about typical RT processes and the priorities they use?

Also, can you answer my previous question? If multipathd used a lower RT prio than the other RT processes in the system, could it still cause the other processes to fail to start?

mwilck commented 6 months ago

New working hypothesis (to be discussed):

1. By default, multipathd tries to acquire RT scheduling at a moderate priority.
2. If that is unavailable or unwanted, it runs as a normal process with a negative nice value (e.g. -18).

The LimitRTPRIO technique could be used to select between these two.

While it's generally impossible to find a solution that fits every use case, this should fit most of the time, and would be a huge improvement over the current policy.

Comments? @hreinecke, what is your take on this?

sushmbha commented 6 months ago

Hi @mwilck, the proposal in https://github.com/opensvc/multipath-tools/issues/82#issuecomment-2051530832 looks good to me. Using LimitRTPRIO in the service file, multipathd's behavior can be tweaked as below:

1. If LimitRTPRIO is zero, do not acquire realtime; multipathd will run with a negative nice value.
2. If LimitRTPRIO is non-zero, try to acquire realtime scheduling with the configured priority value. If that fails, run with a negative nice value.

I think the above covers all scenarios and also offers flexibility for different systems running different sets of RT processes. Is my understanding of your proposal correct?

Regarding your question: I work on the Oracle Linux distribution. Unfortunately, I do not have an answer about the RT processes running on an Oracle Linux system; it really depends on a lot of factors and on the specific system (DB/cloud), etc.

bmarzins commented 6 months ago

I'm pretty sure setting nice to -18 will do nothing. At least in Red Hat based distributions, I believe there are multiple things keeping that from having any effect. The first is autogroups. For kernels configured with

CONFIG_SCHED_AUTOGROUP=y
CONFIG_FAIR_GROUP_SCHED=y

nice values only affect the relative priority of processes within an autogroup. Different autogroups are scheduled based on the value in /proc/<pid>/autogroup (see "The autogroup feature" and "The nice value and group scheduling" in sched(7) for details).

But I'm not sure that this matters either, since systemd puts multipathd in its own cgroup within a slice, and the cgroup resources can trump the autogroup settings. CPUWeight is used to control the relative scheduling priority of different units in a slice. Assuming I understand systemd.resource-control(5) correctly, if CPUWeight is undefined then the autogroup priority is used. But that still leaves me with questions. Do the autogroup prio values work if CPUWeight is undefined for just that unit, or only if it's undefined for all the units in a slice? If some units in a slice define CPUWeight and some don't, I have no idea how those get weighted against each other. I can putz around a little and see if I can figure out how this all works, but setpriority() isn't going to do what we want.

For another reference to all this, see: https://www.reddit.com/r/Fedora/comments/t14ojh/nice_became_a_noop_again_and_how_to_work_around/
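For context, sched(7) documents that on CONFIG_SCHED_AUTOGROUP kernels the autogroup's own nice value can be set by writing to /proc/<pid>/autogroup; a minimal sketch of that knob (requires suitable privileges):

```c
#include <stdio.h>
#include <sys/types.h>

/* Set the nice value of a process's autogroup by writing to
 * /proc/<pid>/autogroup, as described in sched(7). */
static int set_autogroup_nice(pid_t pid, int nice_val)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/autogroup", (int)pid);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", nice_val);
	return fclose(f);
}
```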

mwilck commented 6 months ago

Sorry for being ignorant; (open)SUSE doesn't use CONFIG_SCHED_AUTOGROUP. But it uses cgroups, like probably every modern Linux distro.

Back to start – I will try to summarize below what I think I've understood. Correct me if I'm wrong.

1) Traditional priority management with nice levels just doesn't work.
2) Without RT, we could use CPUWeight= to increase the priority of multipathd.service. This can only be done in the unit file.
3) According to sched(7), a difference of 1 in the nice level corresponds to a factor of ~1.25 in "weight". This means that CPUWeight=1000 would roughly be a nice level of -10 relative to other services in system.slice (see the arithmetic below).
4) Using CPUWeight implies use of the CPU controller.
5) RT priority is mutually exclusive with using the CPU controller, unless we use Slice=-.slice to place multipathd.service into the root cgroup.
6) The details are complicated and depend on cgroups v1 vs. v2, the use of CONFIG_RT_GROUP_SCHED, and possibly other parameters.
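A quick check of the arithmetic in 3), assuming systemd's default CPUWeight of 100 and the ~1.25 weight ratio per nice step from sched(7):

$$1.25^{\,n} = \frac{1000}{100} = 10 \quad\Longrightarrow\quad n = \frac{\ln 10}{\ln 1.25} \approx 10.3$$

That is, CPUWeight=1000 puts a unit roughly ten nice steps above its peers, hence the "~ -10".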

This is so complex that I don't think a generic, "dynamic" solution as requested by @sushmbha is feasible. The unit files for the RT case and for the non-RT case will necessarily look different, even if we restrict ourselves to cgroups v2 without CONFIG_RT_GROUP_SCHED.

multipathd itself can't do more than it does with Ben's current patch – try to acquire RT prio within the configured limits, and do nothing if this fails. Wrt the unit file, we (upstream) can only provide configuration examples and documentation for running multipathd with or without RT. Distributions will have to decide what default policy they want to ship, realizing that the configuration will probably not suit every use case.

I assume that in the long term, distributions will opt for using non-RT by default, because neither disabling the CPU controller nor running multipathd in the root cgroup are attractive options.

bmarzins commented 6 months ago

I played around with this stuff, and at least for RHEL-based distributions, CPUWeight seems to work fine for limiting process run times when there is contention. I'll look a little more to verify that CPUWeight doesn't do something bad when multipathd switches itself to RT after it has started. But assuming that having CPUWeight in the multipathd.service file doesn't hurt things if multipathd becomes a realtime process, it should be possible to set both

LimitRTPRIO=infinity
CPUWeight=1000

and have multipathd either be real time or have what amounts to a negative "nice" value. Distributions and individual users can mess with these numbers depending on what they want and how things are configured in the distro.

bmarzins commented 6 months ago

Everything looks sensible when I set CPUWeight. If LimitRTPRIO=infinity is also set, multipathd becomes a realtime process, and CPUWeight doesn't appear to have any effect. If LimitRTPRIO=0 is also set, multipathd stays a regular process, and CPUWeight controls how much processing time it gets if there is contention for it. I'm sending a patch to set CPUWeight=1000.
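For anyone reproducing this test, a small helper along these lines (hypothetical, not part of multipath-tools) can confirm which policy multipathd ended up with:

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Print the scheduling policy and RT priority of a process. */
static void print_sched_info(pid_t pid)
{
	struct sched_param sp;
	int policy = sched_getscheduler(pid);

	if (policy < 0 || sched_getparam(pid, &sp) < 0) {
		perror("sched_get");
		return;
	}
	printf("pid %d: policy %s, rtprio %d\n", (int)pid,
	       policy == SCHED_RR   ? "SCHED_RR" :
	       policy == SCHED_FIFO ? "SCHED_FIFO" : "SCHED_OTHER",
	       sp.sched_priority);
}
```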

mwilck commented 6 months ago

Thanks for testing that. In the meantime I checked point 5) in my comment above (the link to the cgroups-v2 documentation, where it says "the cpu controller can only be enabled when all RT processes are in the root cgroup"). IMO this part of the kernel documentation is misleading. On SLE 15, where CONFIG_RT_GROUP_SCHED is not set, I can run multipathd with RT prio under cgroups v2, in its own non-root cgroup (as usual), and still have the cpu controller active in system.slice and -.slice. This is in line with what our cgroups expert told me.

I'm sending a patch to set CPUWeight=1000.

Ok. Please also send one to decrease the default RT priority to something sane, like 10 or 20.

sushmbha commented 6 months ago

Hi, is there any new patch available for this change?

bmarzins commented 6 months ago

Updated patches already went into https://github.com/openSUSE/multipath-tools/tree/queue

The dm-devel posts and commits are:

post: https://lore.kernel.org/dm-devel/ZhaimLF4MYCPH7NF@bmarzins-01.fast.eng.rdu2.dc.redhat.com/T/#m8a356614ac5fd8cd480f91126060ba0922d4e647
commit: https://github.com/openSUSE/multipath-tools/commit/f4578a3d7b2c59f28ab936ef17c8dd456c07ef25

post: https://lore.kernel.org/dm-devel/79596769d7ebe048cbfe95020f5c492a12bb1673.camel@suse.com/T/#mf17da76962841d25a243bb01b15e6cf37364b4ac
commit: https://github.com/openSUSE/multipath-tools/commit/3457c1e31ed091ff119e0d1559746cc84635f78b

sushmbha commented 5 months ago

Hi @bmarzins, @mwilck, the solution looks good. One question I have: this solution works with cgroups v2, because the CPUWeight= directive is only effective there. On a system using cgroups v1 (e.g. RHEL7), CPUWeight is not effective in making multipathd run with increased priority similar to a negative nice. For that situation, do you recommend using a negative nice value, or do you have another suggestion?

bmarzins commented 5 months ago

@sushmbha The way RHEL7 is set up by default, using a negative nice value should work fine.