Separate CPU-priority for master

hont512 commented 1 year ago

Your Feature Request

For my config reload takes like 15-60 seconds of 100%-loaded CPU-core and it will take much more in the future. Such reload process could easily make load\latency pike. This point is even more critical because: for some cases it's reasonable to make high CPU-priority for worker-processess and now this priority is shared by reload. While "reload" itself mostly has no critical need to be as-fast-as-possible and shouldn't have even default priority.

For long time I've lived on 2.0 version with nbproc-way of multithreading. It was easy to keep workers with top CPU-priority and master-process with lowerst priority, making reload a less problem.

On 2.6 version with nbthread - I see no good ways to do it anymore. Because:

New worker-process inherits niceness from master
LUA-initialization stage (where it could be possible to adjust) is processed inside master-process, before forking new one
It's still possible to fix in LUA-runtime stage, but that would take an extra check each request

I think it could be:

Just separate config-option for master and\or workers, or for reload-stage itself
Some LUA-stage that is processed once after forking (e.g. "one-run-task" for same use as "register_init")

What are you trying to do?

Separate niceness for master-process or reload task

Output of `haproxy -vv`

wlallemand commented 1 year ago

That's an interesting point, but I don't see how this is related to nbproc vs nbthread, in master-worker mode the process that is loading the configuration was always the master, this did not change.

How were you achieving the niceness change before?

hont512 commented 1 year ago

If lua.init always happened before new process was created, may be I was just mistaken about success (it looks like 2.0 took less CPU for reload and didn't motivate for investigation)

But the mentioned way was raw and simple: LUA-posix to get own PID and LUA-io extension for ability to make shell-exec calls, including renice-commands

wlallemand commented 1 year ago

There was a PoC patch for applying a nice which originated from issue #1634 but was never merged. I'm not sure it will give you the result you want but you could give it a try.

   global
        tune.priority.startup 15
        tune.priority.runtime 0

Here is a rebase of this patch on top of the current master: 0001-WIP-set-runtime-nice.patch.txt

wlallemand commented 1 year ago

I just thought about something else, since you were using 2.0 and were doing fork/exec from lua, you could be affected by the limitation on fork that we added. Are you sure that the exec that you are doing from lua are still working? a strace could help you check that. You can also try to add the insecure-fork-wanted directive to global section if it wasn't working anymore.

wtarreau commented 1 year ago

By the way if the patch above would be of any help, at least we'd have to rename the options because nobody ever knows how "priority" is ordered on any system, given that 50% consider that lower is first and 50% consider that higher is stronger. We could rename that to "niceness" or something like this (provided it's still valid English though :-)).

wlallemand commented 1 year ago

@hont512 were you able to give it a try?

hont512 commented 1 year ago

@wlallemand

About forking - don't waste your time anymore please. I've used just execs to adjust current process. And it working now successfully, but also affects master-process.
About patch. I've used it with latest 2.6. not sure about compability, but it looks good

2.1) The problem:

If we set value of tune.priority.startup > tune.priority.runtime, than all processes have same niceness (as in tune.priority.startup)
If we set "tune.priority.startup < tune.priority.runtime" - everything works as it should be. But since it is niceness - such setup have no sense )

Logs tells that the reason - not enough permissions to increase own priority for process, that was already forked from master and owned by haproxy-user .

I've tried "AmbientCapabilities=CAP_SYS_NICE" in systemd-service (Debian 11), but that doesn't help (however, according to getpcaps - permission was successfully propagated to child)

2.2) Making renice via shell_exec in LUA is working OK together with it

2.3) I've make few tries about this part from patch

if (global.tune.prio_runtime) {
                if (setpriority(PRIO_PROCESS, 0, global.tune.prio_runtime - 100) == -1)
                        ha_warning("[%s.main()] couldn't set the runtime nice value to %d: %s\n",
                                   argv[0], global.tune.prio_runtime - 100, strerror(errno));
        }

If we move it upper in code - just before line:

set_identity(argv[0]);

Than everything works perfect by itself With settings like this I always have CPU-intesive process during restart\reload with nice 19 and running workers with -19

        tune.priority.startup 19
        tune.priority.runtime -19

wtarreau commented 1 year ago

This makes sense, of course. Thanks for the feedback. If we merge this we should also mention in the doc that it's only possible to lower the nice value if sufficiently privileged and that the feature must not be used as a plain user. As long as we keep the warning, users might notice it, read the doc and figure the problem by themselves.

By the way, regarding the terminology, why wouldn't call it "renice" instead of "priority". At least, "renice" is well-known and used only in one context. "tune.renice.startup" would be totally unambiguous. Also it's particularly obvious that you can't renice down if not privileged. Thoughts ?

hont512 commented 1 year ago

By the way. It's possible to reduce CPU-problem more for some cases. If with exact same way as niceness we'll add separate "startup cpu-map"

For example, to map reload-process with one core that is used especially for secondary tasks (or just not used for workers), forcing it to share resources more and\or maybe reduce CPU-context-switches a little for workers and other significant stuff

hont512 commented 1 year ago

@wtarreau

About naming - your "renice" way looks perfect
About docs\users\privs. If there are no reasons against just moving prio_runtime in the upper place, that I've decribed above in 2.3, so there will be no need for this at all. With that fix now I have totally working version

wtarreau commented 1 year ago

That's that's what I meant, sorry for the confusion. Yes your change is necessary. But it works because you start as a service, as a root user and are then dropping the uid later. Some users closer to application than infrastructure tend instead to start unprivileged instances running under the application's account. They could be tempted to do the same but for them it wouldn't work because the UID the they start with doesn't permit to recover the runtime priority anymore. That's how doc will help.

wlallemand commented 1 year ago

@hont512 are you still running with the patch above? We didn't merge it but since it could be useful I can push it for 2.8

haproxy / haproxy