omec-project / upf

4G/5G Mobile Core User Plane

Processor affinity is not re-entrant safe #780

Closed. markbeierl closed this issue 1 month ago.

markbeierl commented 6 months ago

The up4.bess script uses utils.py to set processor affinity for every process running on the system. An issue arises if the script is run a second time, because the parent process that invoked the script has already had its own affinity mask modified by the first run.

For example, suppose the system has a CPU set of [0-7] cores and I am in an interactive shell from which I can execute bessctl.

On the first run, it gets the current process's affinity list: https://github.com/omec-project/upf/blob/master/conf/utils.py#L131. This would result in [0-7] cores being available.

It then uses that list to carve out as many CPUs as there are workers. Given 1 worker, the remaining CPU affinity set is [1-7]. It then sets that mask on every process, including itself.

On the second run, when it asks for the current process's affinity list, it will receive [1-7], and will reduce that by one, leaving [2-7] as the new mask to set for every process it can.
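A minimal sketch of that behaviour (illustrative only, not the actual utils.py code; the use of psutil and the helper name here are assumptions):

```python
import psutil

def reserve_worker_cores(num_workers):
    # Start from the *current* process's affinity, which already reflects
    # any earlier run of the script.
    current = psutil.Process().cpu_affinity()   # run 1: [0-7], run 2: [1-7]
    workers = current[:num_workers]             # run 1: [0],   run 2: [1]
    non_workers = current[num_workers:]         # run 1: [1-7], run 2: [2-7]
    # Pin every process we can touch, including the calling shell, to the
    # non-worker set; this is what makes the next run start from a smaller mask.
    for proc in psutil.process_iter():
        try:
            proc.cpu_affinity(non_workers)
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            pass
    return workers
```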

markbeierl commented 6 months ago

I believe a better approach would be to use the full count of CPUs on the system as the initial cores value, as this will remain constant across runs.
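A rough sketch of that idea (the helper name is illustrative only): derive the baseline from the machine's total CPU count rather than from the caller's current affinity.

```python
import os

def initial_cores():
    # Always returns [0..N-1] on an N-core system, no matter how many times
    # a previous run has already narrowed the parent shell's affinity.
    return list(range(os.cpu_count()))
```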

gab-arrobo commented 6 months ago

It would be great if you could provide a patch for your proposed solution.

markbeierl commented 6 months ago

Certainly, thank you.

github-actions[bot] commented 2 months ago

This issue has been stale for 120 days and will be closed in 15 days. Comment to keep it open.

sureshmarikkannu commented 2 months ago

@markbeierl, I went through the code to understand the reason for this design. The intention behind the existing design (CPU allocation mechanism) is as follows.

UPF worker threads are designed so that each worker thread runs exclusively on a separate CPU, and those CPUs should not be scheduled for any other processes/threads in the system. Otherwise, packet processing performance will be impacted significantly.

Based on the above logic, the following mechanism is working as expected:

"For example, given a CPU set of [0-7] cores system,

On the first run (first UPF instance with single worker), it gets the current process's affinity list: https://github.com/omec-project/upf/blob/master/conf/utils.py#L131. This would result in [0-7] cores being available.

It then uses that to carve out as many CPUs as workers needed. Given 1 worker, this means the CPU affinity set (for non-workers) is [1-7]. It then sets that mask on every process, including itself.

This is expected, because we don't want any other processes/threads (including the parent that starts the worker) to use CPU 0, which is already allocated to UPF1-WORKER1.

On the second run, when it asks for the current process's affinity list, it will receive [1-7], and will reduce that by one, leaving [2-7] as the new mask to set for every process it can.

Yes, CPU 0 was already allocated to UPF1-WORKER1, so we should use the remaining cores for the new UPF. In this case UPF2-WORKER2 will be allocated CPU 1. It then needs to set CPUs [2-7] as the affinity for all other processes/threads in the system, leaving CPUs 0 and 1 exclusively for the UPF workers."

So this logic looks correct and should not create any issues. If required, we can discuss this logic/design in more detail.
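For illustration, a small sketch of the intended allocation across successive runs (names follow the example above; this is not project code):

```python
# Each run carves its worker CPU(s) out of whatever is still unreserved,
# so successive UPF instances end up on disjoint cores.
def allocate(available, num_workers):
    return available[:num_workers], available[num_workers:]

available = list(range(8))                       # fresh system: CPUs 0-7
upf1_workers, available = allocate(available, 1) # UPF1-WORKER1 -> CPU 0, others pinned to [1-7]
upf2_workers, available = allocate(available, 1) # UPF2-WORKER2 -> CPU 1, others pinned to [2-7]
```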

gab-arrobo commented 1 month ago

Hi @markbeierl, any thoughts/comments about @sureshmarikkannu's analysis/response?

gab-arrobo commented 1 month ago

Closing this issue because, as explained by @sureshmarikkannu, it is not an issue and is part of the UPF's design.