Another question is the ease of use. I understand the idea of setting the allowed CPUs during the Ansible deployment and not having to worry about it afterwards, but I actually find it more confusing than the "isolated" VM feature.
I think that in any case we need a common way to configure RT capabilities for VMs: a VM should be deployable on both Debian and Yocto without any change. @insatomcat @dupremathieu could you please share your opinion?
Best,
Slices/cgroups (cpuset actually) are the recommended way to do CPU isolation (isolcpus is deprecated), so I think it's nice that SEAPATH already proposes something with cpusets. The vcpupin feature of libvirt is complementary. If you really want to isolate a core and dedicate it to a vCPU for low-latency purposes, I think you need both: vcpupin ensures all the work of the vCPU is done by a specific physical core, while slices (libvirt partitions) make sure no other workload lands on that physical core. The only other way is to use isolcpus, but as mentioned before it is supposed to be deprecated.
Anyway, all those configurations are optional, so in the end I feel we can't really choose (because a SEAPATH user may need both), but it's not really an issue because we don't have to.
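To make the combination concrete, here is a minimal sketch (illustrative core numbers only, not SEAPATH's actual generated configuration; I am assuming the /machine/rt libvirt partition maps to machine-rt.slice):
# /etc/systemd/system/machine-rt.slice -- bounds the physical cores available to RT guests
[Slice]
AllowedCPUs=2-4
<!-- guest XML -- join the partition and pin each vCPU to one core inside it -->
<resource>
  <partition>/machine/rt</partition>
</resource>
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <emulatorpin cpuset='4'/>
</cputune>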
I think you have missed the issue @insatomcat. All processes spawned by libvirt for the virtualization will be in the same cgroup and will share the same cpuset. Here is an example of processes spawned by libvirtd:
qemu-system-x86
qemu-system-x86
log
msgr-worker-0
msgr-worker-1
msgr-worker-2
service
io_context_pool
io_context_pool
ceph_timer
ms_dispatch
ms_local
safe_timer
safe_timer
safe_timer
safe_timer
taskfin_librbd
vhost-16369
IOmon_iothread
CPU0/KVM
kvm
kvm-nx-lpage-recovery-16369
kvm-pit/16369
We can pin some of these threads by tweaking the libvirt XML (vcpupin, emulatorpin and iothreadpin), but we can only pin them inside the cgroup cpuset, and not all of them can be pinned. The unpinned threads are free to run on any CPU inside the cpuset, including the pinned ones.
It is usually not an issue, but if you have RT KVM tasks pinned on all available CPUs, all the other non-RT tasks will never be scheduled and the VM will never boot.
So to avoid this, in our implementation we have to reserve an extra CPU core just for these threads.
There are two ways to solve that: remove all cpusets and use the isolcpus domain flag, or keep the VM in the machine slice, remove vcpupin from the XML, and create a qemu hook that moves the KVM threads to the machine-rt slice and applies the pinning and RT priority (a rough sketch below).
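For the record, a very rough and untested sketch of what such a hook could look like (guest name, cores, priority and the pid-file path are assumptions; the move into the machine-rt slice is only hinted at in a comment):
#!/bin/sh
# /etc/libvirt/hooks/qemu -- called by libvirtd as: qemu <guest> <operation> <sub-op>
GUEST="$1"; OP="$2"
RT_CPUS="2 3"        # hypothetical target cores, one per vCPU
RT_PRIO=5            # hypothetical SCHED_FIFO priority
if [ "$GUEST" = "myrtguest" ] && [ "$OP" = "started" ]; then
    PID=$(cat "/run/libvirt/qemu/${GUEST}.pid")   # assumed pid-file location
    i=0
    for cpu in $RT_CPUS; do
        # locate the "CPU <i>/KVM" vCPU thread of this qemu process
        TID=$(ps -T -p "$PID" -o tid=,comm= | awk -v n="CPU $i/KVM" '$0 ~ n {print $1}')
        taskset -pc "$cpu" "$TID"     # pin the vCPU thread to its core
        chrt -f -p "$RT_PRIO" "$TID"  # give it RT priority
        # writing $TID into the machine-rt slice's cgroup would go here as well
        i=$((i+1))
    done
fi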
Regarding isolcpus being deprecated: it is just the recommended way that has changed. I don't know whether the kernel PREEMPT_RT patch modifies anything in this area.
@eroussy if you do not want to use cpusets, just do not set them in the Ansible inventory and add the isolcpus domain kernel parameter.
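For example, something like this on the kernel command line (the core list is only an example; nohz_full and rcu_nocbs are optional companions often used alongside it, not required by isolcpus itself):
# appended to the kernel command line
isolcpus=domain,managed_irq,2-3,14-15 nohz_full=2-3,14-15 rcu_nocbs=2-3,14-15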
All processes spawned by libvirt for the virtualization will be in the same cgroup and will share the same cpuset.
I do not notice this on my setup. Of course I use isolcpus since this is still something done by SEAPATH. Is your setup running only the slices isolation and no isolcpus? On our example inventory, cpumachinesrt is the same as isolcpus.
I have an RT VM with 2 vCPUs:
# virsh dumpxml debian | grep cpu
<vcpu placement='static'>2</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='2'/>
<vcpupin vcpu='1' cpuset='14'/>
<emulatorpin cpuset='4'/>
<vcpusched vcpus='0' scheduler='fifo' priority='2'/>
<vcpusched vcpus='1' scheduler='fifo' priority='2'/>
</cputune>
<cpu mode='custom' match='exact' check='full'>
</cpu>
And if I ignore the core for "emulation" this is what I see on the 2 dedicated cores:
# ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr | grep " 2$"
cpuhp/2 [cpuhp/2] 37 37 - TS 2
idle_inject/2 [idle_inject/2] 38 38 50 FF 2
irq_work/2 [irq_work/2] 39 39 1 FF 2
migration/2 [migration/2] 40 40 99 FF 2
rcuc/2 [rcuc/2] 41 41 10 FF 2
ktimers/2 [ktimers/2] 42 42 1 FF 2
ksoftirqd/2 [ksoftirqd/2] 43 43 - TS 2
kworker/2:0-eve [kworker/2:0-events] 44 44 - TS 2
irq/125-PCIe PM [irq/125-PCIe PME] 325 325 50 FF 2
kworker/2:1 [kworker/2:1] 334 334 - TS 2
irq/151-megasas [irq/151-megasas0-msix3] 488 488 50 FF 2
CPU 0/KVM /usr/bin/qemu-system-x86_64 4739 4775 2 FF 2
# ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr | grep " 14$"
cpuhp/14 [cpuhp/14] 160 160 - TS 14
idle_inject/14 [idle_inject/14] 161 161 50 FF 14
irq_work/14 [irq_work/14] 162 162 1 FF 14
migration/14 [migration/14] 163 163 99 FF 14
rcuc/14 [rcuc/14] 164 164 10 FF 14
ktimers/14 [ktimers/14] 165 165 1 FF 14
ksoftirqd/14 [ksoftirqd/14] 166 166 - TS 14
kworker/14:0-ev [kworker/14:0-events] 167 167 - TS 14
kworker/14:1 [kworker/14:1] 339 339 - TS 14
irq/163-megasas [irq/163-megasas0-msix15] 500 500 50 FF 14
irq/194-eno8403 [irq/194-eno8403-tx-0] 3226 3226 50 FF 14
CPU 1/KVM /usr/bin/qemu-system-x86_64 4739 4777 2 FF 14
Basically nothing besides the vCPU threads and the bound kthreads... So really I don't know what those unpinned libvirt processes are about.
Basically nothing besides the vCPU threads and the bound kthreads... So really I don't know what those unpinned libvirt processes are about.
Here you are only looking at the processes running exactly on the two cores you chose for the RT VM. You have to look at all the cores in the machine-rt.slice allowed CPUs.
For example, on my setup, the machine-rt slice allowed CPUs are 4-7:
root@seapath:/home/virtu# cat /etc/systemd/system/machine-rt.slice | grep AllowedCPUs
AllowedCPUs=4-7
And the processes on cores 4 to 7 (only part of the output is shown):
root@seapath:/home/virtu# ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr | grep " [4-7]$"
[...]
qemu-system-x86 /usr/bin/qemu-system-x86_64 158603 158603 - TS 4
call_rcu /usr/bin/qemu-system-x86_64 158603 158607 - TS 4
log /usr/bin/qemu-system-x86_64 158603 158608 - TS 4
msgr-worker-0 /usr/bin/qemu-system-x86_64 158603 158609 - TS 4
msgr-worker-1 /usr/bin/qemu-system-x86_64 158603 158610 - TS 4
msgr-worker-2 /usr/bin/qemu-system-x86_64 158603 158611 - TS 4
service /usr/bin/qemu-system-x86_64 158603 158615 - TS 4
io_context_pool /usr/bin/qemu-system-x86_64 158603 158616 - TS 4
io_context_pool /usr/bin/qemu-system-x86_64 158603 158617 - TS 4
ceph_timer /usr/bin/qemu-system-x86_64 158603 158618 - TS 4
ms_dispatch /usr/bin/qemu-system-x86_64 158603 158619 - TS 4
ms_local /usr/bin/qemu-system-x86_64 158603 158620 - TS 4
safe_timer /usr/bin/qemu-system-x86_64 158603 158621 - TS 4
safe_timer /usr/bin/qemu-system-x86_64 158603 158622 - TS 4
safe_timer /usr/bin/qemu-system-x86_64 158603 158623 - TS 4
safe_timer /usr/bin/qemu-system-x86_64 158603 158624 - TS 4
taskfin_librbd /usr/bin/qemu-system-x86_64 158603 158625 - TS 4
vhost-158603 /usr/bin/qemu-system-x86_64 158603 158648 - TS 4
vhost-158603 /usr/bin/qemu-system-x86_64 158603 158649 - TS 4
IO mon_iothread /usr/bin/qemu-system-x86_64 158603 158650 - TS 4
CPU 0/KVM /usr/bin/qemu-system-x86_64 158603 158651 1 FF 5
CPU 1/KVM /usr/bin/qemu-system-x86_64 158603 158652 1 FF 6
kvm [kvm] 158626 158626 - TS 4
kvm-nx-lpage-re [kvm-nx-lpage-recovery-1586 158627 158627 - TS 4
kvm-pit/158603 [kvm-pit/158603] 158654 158654 - TS 4
kworker/4:0-kdm [kworker/4:0-kdmflush/254:0 2099770 2099770 - TS 4
The question is: should all these threads run on these CPUs? And if so, how can we make them run on CPUs other than core 4?
(These questions are also related to issue #438.)
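For what it's worth, the affinity the slice actually leaves to those unpinned threads can be checked directly (TID reused from the listing above); trying to move one of them outside the slice's AllowedCPUs is expected to be refused, since the cgroup cpuset takes precedence:
# current affinity of one unpinned qemu thread
taskset -cp 158608
# attempt to move it to the non-RT cores (expected to fail with "Invalid argument")
taskset -pc 0-1 158608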
@insatomcat you didn't notice because we have a large cpuset range.
Reduce your machine-rt cpuset so that the number of CPU cores equals the number of your virtual CPUs.
I have a slice with cpuset 2-6,14-18 and a GUEST with 4 vcpus:
# cat /etc/systemd/system/machine-rt.slice
[Unit]
Description=VM rt slice
Before=slices.target
Wants=machine.slice
[Slice]
AllowedCPUs=2-6,14-18
# virsh dumpxml XXX | grep cpu
<vcpu placement='static'>4</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='5'/>
<vcpupin vcpu='1' cpuset='6'/>
<vcpupin vcpu='2' cpuset='17'/>
<vcpupin vcpu='3' cpuset='18'/>
<emulatorpin cpuset='16'/>
<vcpusched vcpus='0' scheduler='fifo' priority='5'/>
<vcpusched vcpus='1' scheduler='fifo' priority='5'/>
<vcpusched vcpus='2' scheduler='fifo' priority='5'/>
<vcpusched vcpus='3' scheduler='fifo' priority='5'/>
</cputune>
If I look at all the cores, then I see the processes you are talking about, but all on core 16, which is the emulatorpin chosen core:
# for i in 2 3 4 5 6 14 15 16 17 18; do ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr | grep " $i$"; done
cpuhp/2 [cpuhp/2] 38 38 - TS 2
idle_inject/2 [idle_inject/2] 39 39 50 FF 2
irq_work/2 [irq_work/2] 40 40 1 FF 2
migration/2 [migration/2] 41 41 99 FF 2
rcuc/2 [rcuc/2] 42 42 10 FF 2
ktimers/2 [ktimers/2] 43 43 1 FF 2
ksoftirqd/2 [ksoftirqd/2] 44 44 - TS 2
kworker/2:0-eve [kworker/2:0-events] 45 45 - TS 2
irq/125-PCIe PM [irq/125-PCIe PME] 326 326 50 FF 2
kworker/2:1 [kworker/2:1] 335 335 - TS 2
irq/134-megasas [irq/134-megasas0-msix3] 491 491 50 FF 2
irq/281-iavf-en [irq/281-iavf-ens1f1v1-TxRx 3485 3485 50 FF 2
irq/247-i40e-en [irq/247-i40e-ens1f1-TxRx-1 3810 3810 50 FF 2
irq/174-i40e-en [irq/174-i40e-ens1f0-TxRx-8 3963 3963 50 FF 2
irq/264-vfio-ms [irq/264-vfio-msix[0](0000: 7314 7314 50 FF 2
cpuhp/3 [cpuhp/3] 48 48 - TS 3
idle_inject/3 [idle_inject/3] 49 49 50 FF 3
irq_work/3 [irq_work/3] 50 50 1 FF 3
migration/3 [migration/3] 51 51 99 FF 3
rcuc/3 [rcuc/3] 52 52 10 FF 3
ktimers/3 [ktimers/3] 53 53 1 FF 3
ksoftirqd/3 [ksoftirqd/3] 54 54 - TS 3
kworker/3:0-eve [kworker/3:0-events] 55 55 - TS 3
irq/126-PCIe PM [irq/126-PCIe PME] 327 327 50 FF 3
kworker/3:1 [kworker/3:1] 336 336 - TS 3
irq/135-megasas [irq/135-megasas0-msix4] 492 492 50 FF 3
irq/282-iavf-en [irq/282-iavf-ens1f1v1-TxRx 3486 3486 50 FF 3
irq/248-i40e-en [irq/248-i40e-ens1f1-TxRx-1 3811 3811 50 FF 3
irq/175-i40e-en [irq/175-i40e-ens1f0-TxRx-9 3964 3964 50 FF 3
irq/266-vfio-ms [irq/266-vfio-msix[1](0000: 7293 7293 50 FF 3
cpuhp/4 [cpuhp/4] 58 58 - TS 4
idle_inject/4 [idle_inject/4] 59 59 50 FF 4
irq_work/4 [irq_work/4] 60 60 1 FF 4
migration/4 [migration/4] 61 61 99 FF 4
rcuc/4 [rcuc/4] 62 62 10 FF 4
ktimers/4 [ktimers/4] 63 63 1 FF 4
ksoftirqd/4 [ksoftirqd/4] 64 64 - TS 4
kworker/4:0-eve [kworker/4:0-events] 65 65 - TS 4
irq/127-PCIe PM [irq/127-PCIe PME] 328 328 50 FF 4
kworker/4:1 [kworker/4:1] 337 337 - TS 4
irq/136-megasas [irq/136-megasas0-msix5] 493 493 50 FF 4
irq/249-i40e-en [irq/249-i40e-ens1f1-TxRx-1 3812 3812 50 FF 4
irq/176-i40e-en [irq/176-i40e-ens1f0-TxRx-1 3965 3965 50 FF 4
irq/268-vfio-ms [irq/268-vfio-msix[2](0000: 7294 7294 50 FF 4
cpuhp/5 [cpuhp/5] 68 68 - TS 5
idle_inject/5 [idle_inject/5] 69 69 50 FF 5
irq_work/5 [irq_work/5] 70 70 1 FF 5
migration/5 [migration/5] 71 71 99 FF 5
rcuc/5 [rcuc/5] 72 72 10 FF 5
ktimers/5 [ktimers/5] 73 73 1 FF 5
ksoftirqd/5 [ksoftirqd/5] 74 74 - TS 5
kworker/5:0-eve [kworker/5:0-events] 75 75 - TS 5
irq/128-PCIe PM [irq/128-PCIe PME] 329 329 50 FF 5
kworker/5:1 [kworker/5:1] 338 338 - TS 5
irq/137-megasas [irq/137-megasas0-msix6] 494 494 50 FF 5
irq/250-i40e-en [irq/250-i40e-ens1f1-TxRx-2 3813 3813 50 FF 5
irq/177-i40e-en [irq/177-i40e-ens1f0-TxRx-1 3966 3966 50 FF 5
CPU 0/KVM /usr/bin/qemu-system-x86_64 5899 5954 5 FF 5
irq/270-vfio-ms [irq/270-vfio-msix[3](0000: 7295 7295 50 FF 5
cpuhp/6 [cpuhp/6] 78 78 - TS 6
idle_inject/6 [idle_inject/6] 79 79 50 FF 6
irq_work/6 [irq_work/6] 80 80 1 FF 6
migration/6 [migration/6] 81 81 99 FF 6
rcuc/6 [rcuc/6] 82 82 10 FF 6
ktimers/6 [ktimers/6] 83 83 1 FF 6
ksoftirqd/6 [ksoftirqd/6] 84 84 - TS 6
kworker/6:0-eve [kworker/6:0-events] 85 85 - TS 6
kworker/6:1-mm_ [kworker/6:1-mm_percpu_wq] 316 316 - TS 6
irq/129-PCIe PM [irq/129-PCIe PME] 330 330 50 FF 6
irq/138-megasas [irq/138-megasas0-msix7] 495 495 50 FF 6
irq/251-i40e-en [irq/251-i40e-ens1f1-TxRx-2 3814 3814 50 FF 6
irq/178-i40e-en [irq/178-i40e-ens1f0-TxRx-1 3967 3967 50 FF 6
CPU 1/KVM /usr/bin/qemu-system-x86_64 5899 5956 5 FF 6
irq/273-vfio-ms [irq/273-vfio-msix[4](0000: 7298 7298 50 FF 6
cpuhp/14 [cpuhp/14] 161 161 - TS 14
idle_inject/14 [idle_inject/14] 162 162 50 FF 14
irq_work/14 [irq_work/14] 163 163 1 FF 14
migration/14 [migration/14] 164 164 99 FF 14
rcuc/14 [rcuc/14] 165 165 10 FF 14
ktimers/14 [ktimers/14] 166 166 1 FF 14
ksoftirqd/14 [ksoftirqd/14] 167 167 - TS 14
kworker/14:0-ev [kworker/14:0-events] 168 168 - TS 14
kworker/14:1 [kworker/14:1] 339 339 - TS 14
irq/210-ahci[00 [irq/210-ahci[0000:00:17.0] 471 471 50 FF 14
irq/147-megasas [irq/147-megasas0-msix15] 503 503 50 FF 14
irq/236-i40e-en [irq/236-i40e-ens1f1-TxRx-6 3798 3798 50 FF 14
irq/186-i40e-en [irq/186-i40e-ens1f0-TxRx-2 3975 3975 50 FF 14
cpuhp/15 [cpuhp/15] 171 171 - TS 15
idle_inject/15 [idle_inject/15] 172 172 50 FF 15
irq_work/15 [irq_work/15] 173 173 1 FF 15
migration/15 [migration/15] 174 174 99 FF 15
rcuc/15 [rcuc/15] 175 175 10 FF 15
ktimers/15 [ktimers/15] 176 176 1 FF 15
ksoftirqd/15 [ksoftirqd/15] 177 177 - TS 15
kworker/15:0-ev [kworker/15:0-events] 178 178 - TS 15
kworker/15:1 [kworker/15:1] 340 340 - TS 15
irq/146-megasas [irq/146-megasas0-msix16] 504 504 50 FF 15
irq/237-i40e-en [irq/237-i40e-ens1f1-TxRx-7 3799 3799 50 FF 15
irq/187-i40e-en [irq/187-i40e-ens1f0-TxRx-2 3976 3976 50 FF 15
cpuhp/16 [cpuhp/16] 181 181 - TS 16
idle_inject/16 [idle_inject/16] 182 182 50 FF 16
irq_work/16 [irq_work/16] 183 183 1 FF 16
migration/16 [migration/16] 184 184 99 FF 16
rcuc/16 [rcuc/16] 185 185 10 FF 16
ktimers/16 [ktimers/16] 186 186 1 FF 16
ksoftirqd/16 [ksoftirqd/16] 187 187 - TS 16
kworker/16:0-ev [kworker/16:0-events] 188 188 - TS 16
kworker/16:0H-e [kworker/16:0H-events_highp 189 189 - TS 16
kworker/16:1 [kworker/16:1] 341 341 - TS 16
irq/148-megasas [irq/148-megasas0-msix17] 505 505 50 FF 16
irq/254-i40e-00 [irq/254-i40e-0000:18:00.1: 1597 1597 50 FF 16
irq/238-i40e-en [irq/238-i40e-ens1f1-TxRx-8 3800 3800 50 FF 16
irq/188-i40e-en [irq/188-i40e-ens1f0-TxRx-2 3977 3977 50 FF 16
qemu-system-x86 /usr/bin/qemu-system-x86_64 5899 5899 - TS 16
qemu-system-x86 /usr/bin/qemu-system-x86_64 5899 5907 - TS 16
log /usr/bin/qemu-system-x86_64 5899 5927 - TS 16
msgr-worker-0 /usr/bin/qemu-system-x86_64 5899 5928 - TS 16
msgr-worker-1 /usr/bin/qemu-system-x86_64 5899 5929 - TS 16
msgr-worker-2 /usr/bin/qemu-system-x86_64 5899 5930 - TS 16
service /usr/bin/qemu-system-x86_64 5899 5934 - TS 16
io_context_pool /usr/bin/qemu-system-x86_64 5899 5935 - TS 16
io_context_pool /usr/bin/qemu-system-x86_64 5899 5936 - TS 16
ceph_timer /usr/bin/qemu-system-x86_64 5899 5937 - TS 16
ms_dispatch /usr/bin/qemu-system-x86_64 5899 5938 - TS 16
ms_local /usr/bin/qemu-system-x86_64 5899 5939 - TS 16
safe_timer /usr/bin/qemu-system-x86_64 5899 5940 - TS 16
safe_timer /usr/bin/qemu-system-x86_64 5899 5941 - TS 16
safe_timer /usr/bin/qemu-system-x86_64 5899 5942 - TS 16
safe_timer /usr/bin/qemu-system-x86_64 5899 5943 - TS 16
taskfin_librbd /usr/bin/qemu-system-x86_64 5899 5944 - TS 16
vhost-5899 /usr/bin/qemu-system-x86_64 5899 5952 - TS 16
IO mon_iothread /usr/bin/qemu-system-x86_64 5899 5953 - TS 16
SPICE Worker /usr/bin/qemu-system-x86_64 5899 5982 - TS 16
vhost-5899 /usr/bin/qemu-system-x86_64 5899 6003 - TS 16
kworker/16:1H-k [kworker/16:1H-kblockd] 5906 5906 - TS 16
kvm-nx-lpage-re [kvm-nx-lpage-recovery-5899 5945 5945 - TS 16
cpuhp/17 [cpuhp/17] 191 191 - TS 17
idle_inject/17 [idle_inject/17] 192 192 50 FF 17
irq_work/17 [irq_work/17] 193 193 1 FF 17
migration/17 [migration/17] 194 194 99 FF 17
rcuc/17 [rcuc/17] 195 195 10 FF 17
ktimers/17 [ktimers/17] 196 196 1 FF 17
ksoftirqd/17 [ksoftirqd/17] 197 197 - TS 17
kworker/17:0-ev [kworker/17:0-events] 198 198 - TS 17
kworker/17:1 [kworker/17:1] 342 342 - TS 17
irq/149-megasas [irq/149-megasas0-msix18] 506 506 50 FF 17
irq/239-i40e-en [irq/239-i40e-ens1f1-TxRx-9 3801 3801 50 FF 17
irq/166-i40e-en [irq/166-i40e-ens1f0-TxRx-0 3955 3955 50 FF 17
irq/189-i40e-en [irq/189-i40e-ens1f0-TxRx-2 3978 3978 50 FF 17
CPU 2/KVM /usr/bin/qemu-system-x86_64 5899 5957 5 FF 17
cpuhp/18 [cpuhp/18] 201 201 - TS 18
idle_inject/18 [idle_inject/18] 202 202 50 FF 18
irq_work/18 [irq_work/18] 203 203 1 FF 18
migration/18 [migration/18] 204 204 99 FF 18
rcuc/18 [rcuc/18] 205 205 10 FF 18
ktimers/18 [ktimers/18] 206 206 1 FF 18
ksoftirqd/18 [ksoftirqd/18] 207 207 - TS 18
kworker/18:0-ev [kworker/18:0-events] 208 208 - TS 18
kworker/18:1 [kworker/18:1] 343 343 - TS 18
irq/151-megasas [irq/151-megasas0-msix19] 507 507 50 FF 18
irq/240-i40e-en [irq/240-i40e-ens1f1-TxRx-1 3802 3802 50 FF 18
irq/167-i40e-en [irq/167-i40e-ens1f0-TxRx-1 3956 3956 50 FF 18
CPU 3/KVM /usr/bin/qemu-system-x86_64 5899 5958 5 FF 18
What's the emulatorpin setting in your setup?
What we want for emulatorpin is to use the non-RT cpuset (0-1,7-13,19-N in your case), to avoid reserving and losing a core for it.
I don't think you can do that while at the same time asking libvirt to use the machine-rt slice for the same guest.
It should be possible using a qemu hook, but in my opinion we should just mention it in the documentation.
I suggest indicating in the documentation:
If you use RT-privileged KVM threads, the minimal cpuset size has to be: total number of vCPUs + 1
--> total number of "isolated" vCPUs + 1
You don't have to isolate all the vCPUs. You may even need a "non-isolated" vCPU for the housekeeping inside the VM. So basically, if you need N isolated cores for your real-time workload, your guest may need N+1 cores. But this "+1" can be shared with other RT guests and with the emulator.
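For instance (core numbers and priority are arbitrary), a guest that needs 2 isolated RT cores could be given 3 vCPUs, with vCPU 0 left non-isolated and sharing its core with the emulator:
<vcpu placement='static'>3</vcpu>
<cputune>
  <!-- housekeeping vCPU, shares core 4 with the emulator (and possibly other RT guests) -->
  <vcpupin vcpu='0' cpuset='4'/>
  <emulatorpin cpuset='4'/>
  <!-- isolated RT vCPUs, one dedicated core each -->
  <vcpupin vcpu='1' cpuset='5'/>
  <vcpupin vcpu='2' cpuset='6'/>
  <vcpusched vcpus='1' scheduler='fifo' priority='5'/>
  <vcpusched vcpus='2' scheduler='fifo' priority='5'/>
</cputune>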
I suggest indicating in the documentation:
- The pinned processes have to be pinned inside the cgroup cpuset.
- If you use RT-privileged KVM threads, the minimal cpuset size has to be: total number of vCPUs + 1.
- If that is not what you want, do not define any cgroup cpuset and use the isolcpus domain flag instead.
I don't think it is a good idea to propose both. We should choose one isolation method and argue why we chose it.
Personally, I am not in favor of the cgroup approach:
- The emulatorpin cpuset has to be inside the machine-rt.slice allowed CPUs. We cannot pin it where we want.
- We have to give number_of_isolated_vcpu + 1 cores to the slice. I find this confusing. It also needs one more core, which is difficult on machines with few cores.
The only argument I see (for now) in favor of using cgroups is that it protects against hardware processor attacks like Meltdown and Spectre. Do we really want to prevent these types of attacks? Is there any other argument I'm missing?
Hi all,
We need to close this question. I discussed with Mathieu and we concluded that: [...]
So, regarding the work to do:
- Document the number_of_isolated_vcpu + 1 rule (..., etc ...).
- The cpuset variables (cpusystem, cpuuser, ...) must be removed from the inventory examples and described in the inventories README as an advanced feature.
@insatomcat @dupremathieu what do you think of that? Did I miss something?
I'm ok with all that.
Great. @eroussy maybe it is worth documenting it on the LFEnergy Wiki?
The topic is now covered in this wiki page: https://wiki.lfenergy.org/display/SEAP/Scheduling+and+priorization
Feel free to reopen if you have questions or remarks.
Context
There are currently two ways to handle the CPUs to which a VM has access:
- The "isolated" VM feature in the inventory: this pins the KVM threads running the VM's vCPUs on the CPUs described in the cpuset list. It only pins the KVM threads, not the qemu thread responsible for managing the VM.
- The machine-rt or machine-nort slice: these cgroups are configured during the Ansible setup. They come with allowed CPUs defined in the cpumachinesrt and cpumachinesnort Ansible variables. Both the KVM and qemu threads of the VM execute on the allowed CPUs.
These two configurations have the same purpose but not the same philosophy. They duplicate a feature and do not interact easily with each other. Plus, the cgroup configuration exists only on Debian for now. We have to clarify the isolation feature we want on SEAPATH.
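Whichever mechanism is used, the effective result can be checked on the hypervisor (the guest name myrtguest is made up):
# CPUs actually allowed to the RT slice
systemctl show machine-rt.slice -p AllowedCPUs
# pinning applied by libvirt to the vCPU and emulator threads
virsh vcpupin myrtguest
virsh emulatorpin myrtguest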
Concerns regarding cgroups
I see two problems with these cgroups today:
- [...]
- All the qemu threads (qemu-system-x86) are run on the allowed CPUs of the slice. We may want to run only the KVM threads here.
The second point can cause a problem. For example, if the RT vCPU threads are pinned on all the CPUs of the machine-rt.slice, the other qemu threads are never scheduled and the VM cannot boot; an extra CPU has to be kept in the machine-rt.slice in order to make it work.
Isolation of non-RT VMs
The use of the machine-nort cgroup allows isolating the threads of non-RT VMs. Is it relevant to isolate them if we do not have special RT needs? Wouldn't it be better to let the Linux scheduler handle these VMs on the system's CPUs?
We now need to choose the isolation method we want and use it on both the Debian and Yocto versions. I leave this question open, feel free to add your remarks below.