seapath / ansible

This repo contains all the Ansible playbooks used to deploy or manage a cluster, as well as inventory examples.
https://lfenergy.org/projects/seapath/
Apache License 2.0

Questioning our use of cgroups #439

Closed eroussy closed 7 months ago

eroussy commented 7 months ago

Context

There are currently two ways to handle the CPUs to which a VM has access:

  • The machine-rt and machine-nort cgroups (systemd slices), whose AllowedCPUs are set through the Ansible inventory at deployment time.
  • The vcpupin (and related cputune) directives in the libvirt VM configuration.

These two configurations have the same purpose but not the same philosophy. They duplicate a feature and do not interact easily with each other. Moreover, the cgroup configuration is currently only available on Debian. We have to clarify the isolation feature we want on SEAPATH.
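For reference, these are roughly the two mechanisms (the CPU numbers are only illustrative, taken from the examples further down in this thread). On the host side, a systemd slice restricts the cgroup:

# /etc/systemd/system/machine-rt.slice
[Slice]
AllowedCPUs=4-7

On the libvirt side, the cputune section of the VM XML pins vcpus to specific cores:

  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='6'/>
  </cputune>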

Concerns regarding cgroups

I see two problems with these cgroups today:

The second point can cause a problem, for example:

Isolation of non-RT VMs

The machine-nort cgroup allows isolating the threads of non-RT VMs. Is it relevant to isolate them if they have no special RT needs? Wouldn't it be better to let the Linux scheduler handle these VMs on the system's CPUs?

We now need to choose the isolation method we want and use it on both the Debian and Yocto versions. I leave this question open; feel free to add your remarks below.

eroussy commented 7 months ago

Another question to ask is about ease of use. I understand the idea of setting the allowed CPUs during the Ansible deployment and not having to worry about it later. But in practice, I find it more confusing than the "isolated" VM feature.

ebail commented 7 months ago

I think that in any case we need a common way to configure RT capabilities for VMs. A VM should be deployable on both Debian and Yocto without any change. @insatomcat @dupremathieu could you please share your opinion?

Best,

insatomcat commented 7 months ago

Slices/cgroups (cpuset actually) are the recommended way to do CPU isolation (isolcpus is deprecated), so I think it's nice that SEAPATH is already proposing something with cpusets. The vcpupin feature of libvirt is complementary. If you really want to isolate a core and dedicate it to a vcpu for low-latency purposes, I think you need both: vcpupin will ensure all the work of the vcpu is done by a specific physical core, while slices (libvirt partitions) will make sure no other workload lands on this physical core. The only other way is to use isolcpus, but as mentioned before it is supposed to be deprecated.
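For illustration, combining the two in a domain XML would look roughly like this (the partition path and CPU numbers are only an example; as far as I know, libvirt maps a partition such as /machine/rt to the machine-rt.slice cgroup):

  <resource>
    <partition>/machine/rt</partition>
  </resource>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='14'/>
  </cputune>

The partition keeps every thread of the VM inside the slice's AllowedCPUs, and vcpupin dedicates one physical core to each vcpu within that set.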

Anyway, all those configurations are optional, so in the end I feel we can't really choose (because a seapath user may need both), but it's not really an issue because we don't have to.

dupremathieu commented 7 months ago

I think you have missed the issue, @insatomcat. All the processes spawned by libvirt for the virtualization will be in the same cgroup and will share the same cpuset. Here is an example of the processes spawned by libvirtd:

qemu-system-x86
qemu-system-x86
log
msgr-worker-0
msgr-worker-1
msgr-worker-2
service
io_context_pool
io_context_pool
ceph_timer
ms_dispatch
ms_local
safe_timer
safe_timer
safe_timer
safe_timer
taskfin_librbd
vhost-16369
IOmon_iothread
CPU0/KVM
kvm
kvm-nx-lpage-recovery-16369
kvm-pit/16369

We can pin some of these processes by tweaking the libvirt XML (vcpupin, emulatorpin and iothreadpin), but we can only pin them within the cgroup cpuset, and not all processes can be pinned. The unpinned processes are free to run on any CPU inside the cpuset, including the pinned ones.

It is usually not an issue, but if you have a KVM RT task pinned on every available CPU, all the other non-RT tasks will never be scheduled and the VM will never boot.

So to avoid this in our implementation, we have to reserve an extra CPU core only for these processes.

There are two ways to solve that: remove all cpusets and use the isolcpus domain flag, or keep the VM in the machine slice, remove vcpupin from the XML, and create a qemu hook that moves the KVM threads to the machine-rt slice and applies pinning and RT priority.
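For illustration, the pinning and RT-priority part of such a hook could look like the sketch below (the guest name, CPU number and priority are placeholders, the default libvirt pid file location is assumed, and the move of the threads into the machine-rt slice itself is not shown):

#!/bin/sh
# /etc/libvirt/hooks/qemu : libvirt calls this hook with <guest name> <operation> ...
GUEST="$1"
PHASE="$2"
if [ "$GUEST" = "guest0" ] && [ "$PHASE" = "started" ]; then
    PID=$(cat "/run/libvirt/qemu/${GUEST}.pid")
    # The vcpu threads are named "CPU <n>/KVM": pin them and give them an RT priority.
    for TID in $(ps -T -o tid=,comm= -p "$PID" | awk '/KVM/ {print $1}'); do
        taskset -pc 2 "$TID"      # placeholder CPU
        chrt -f -p 2 "$TID"       # SCHED_FIFO, priority 2
    done
fi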

Regarding the deprecation of isolcpus: it is only the recommended way that has changed. I don't know whether the PREEMPT_RT kernel patch modifies anything in this area.

@eroussy if you do not want to use cpusets, simply do not set them in the Ansible inventory and add the isolcpus domain kernel parameter instead.
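For reference, that would be a kernel command line entry along these lines (the CPU list is only an example):

isolcpus=domain,4-7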

insatomcat commented 7 months ago

All processes spawned by libvirt for the virtualization will be in the same cgroup and will share the same cpuset.

I do not observe this on my setup. Of course I use isolcpus, since this is still something SEAPATH does. Is your setup running only the slice isolation, without isolcpus? In our example inventory, cpumachinesrt is the same as isolcpus.

I have an RT VM with 2 vcpus:

# virsh dumpxml debian | grep cpu
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='14'/>
    <emulatorpin cpuset='4'/>
    <vcpusched vcpus='0' scheduler='fifo' priority='2'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='2'/>
  </cputune>
  <cpu mode='custom' match='exact' check='full'>
  </cpu>

And if I ignore the core used for "emulation", this is what I see on the 2 dedicated cores:

# ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr |  grep " 2$"
cpuhp/2         [cpuhp/2]                        37      37      - TS    2
idle_inject/2   [idle_inject/2]                  38      38     50 FF    2
irq_work/2      [irq_work/2]                     39      39      1 FF    2
migration/2     [migration/2]                    40      40     99 FF    2
rcuc/2          [rcuc/2]                         41      41     10 FF    2
ktimers/2       [ktimers/2]                      42      42      1 FF    2
ksoftirqd/2     [ksoftirqd/2]                    43      43      - TS    2
kworker/2:0-eve [kworker/2:0-events]             44      44      - TS    2
irq/125-PCIe PM [irq/125-PCIe PME]              325     325     50 FF    2
kworker/2:1     [kworker/2:1]                   334     334      - TS    2
irq/151-megasas [irq/151-megasas0-msix3]        488     488     50 FF    2
CPU 0/KVM       /usr/bin/qemu-system-x86_64    4739    4775      2 FF    2

# ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr |  grep " 14$"
cpuhp/14        [cpuhp/14]                      160     160      - TS   14
idle_inject/14  [idle_inject/14]                161     161     50 FF   14
irq_work/14     [irq_work/14]                   162     162      1 FF   14
migration/14    [migration/14]                  163     163     99 FF   14
rcuc/14         [rcuc/14]                       164     164     10 FF   14
ktimers/14      [ktimers/14]                    165     165      1 FF   14
ksoftirqd/14    [ksoftirqd/14]                  166     166      - TS   14
kworker/14:0-ev [kworker/14:0-events]           167     167      - TS   14
kworker/14:1    [kworker/14:1]                  339     339      - TS   14
irq/163-megasas [irq/163-megasas0-msix15]       500     500     50 FF   14
irq/194-eno8403 [irq/194-eno8403-tx-0]         3226    3226     50 FF   14
CPU 1/KVM       /usr/bin/qemu-system-x86_64    4739    4777      2 FF   14

Basically nothing besides the vcpu threads and the bound kthreads... So really I don't know what those unpinned libvirt processes are about.

eroussy commented 7 months ago

Basically nothing besides the vcpu threads and the bound kthreads... So really I don't know what those unpinned libvirt processes are about.

Here you are only looking at the processes running on the two cores you chose for the RT VM. You have to look at all the cores in the machine-rt.slice AllowedCPUs.

For example, on my setup, the machine-rt slice AllowedCPUs are 4-7:

root@seapath:/home/virtu# cat /etc/systemd/system/machine-rt.slice | grep AllowedCPUs
AllowedCPUs=4-7

And the processes on cores 4 to 7 (only part of the output is shown):

root@seapath:/home/virtu# ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr |  grep " [4-7]$"
[...]
qemu-system-x86 /usr/bin/qemu-system-x86_64  158603  158603      - TS    4
call_rcu        /usr/bin/qemu-system-x86_64  158603  158607      - TS    4
log             /usr/bin/qemu-system-x86_64  158603  158608      - TS    4
msgr-worker-0   /usr/bin/qemu-system-x86_64  158603  158609      - TS    4
msgr-worker-1   /usr/bin/qemu-system-x86_64  158603  158610      - TS    4
msgr-worker-2   /usr/bin/qemu-system-x86_64  158603  158611      - TS    4
service         /usr/bin/qemu-system-x86_64  158603  158615      - TS    4
io_context_pool /usr/bin/qemu-system-x86_64  158603  158616      - TS    4
io_context_pool /usr/bin/qemu-system-x86_64  158603  158617      - TS    4
ceph_timer      /usr/bin/qemu-system-x86_64  158603  158618      - TS    4
ms_dispatch     /usr/bin/qemu-system-x86_64  158603  158619      - TS    4
ms_local        /usr/bin/qemu-system-x86_64  158603  158620      - TS    4
safe_timer      /usr/bin/qemu-system-x86_64  158603  158621      - TS    4
safe_timer      /usr/bin/qemu-system-x86_64  158603  158622      - TS    4
safe_timer      /usr/bin/qemu-system-x86_64  158603  158623      - TS    4
safe_timer      /usr/bin/qemu-system-x86_64  158603  158624      - TS    4
taskfin_librbd  /usr/bin/qemu-system-x86_64  158603  158625      - TS    4
vhost-158603    /usr/bin/qemu-system-x86_64  158603  158648      - TS    4
vhost-158603    /usr/bin/qemu-system-x86_64  158603  158649      - TS    4
IO mon_iothread /usr/bin/qemu-system-x86_64  158603  158650      - TS    4
CPU 0/KVM       /usr/bin/qemu-system-x86_64  158603  158651      1 FF    5
CPU 1/KVM       /usr/bin/qemu-system-x86_64  158603  158652      1 FF    6
kvm             [kvm]                        158626  158626      - TS    4
kvm-nx-lpage-re [kvm-nx-lpage-recovery-1586  158627  158627      - TS    4
kvm-pit/158603  [kvm-pit/158603]             158654  158654      - TS    4
kworker/4:0-kdm [kworker/4:0-kdmflush/254:0 2099770 2099770      - TS    4

The question is: should all these threads run on these CPUs? And if so, how can we run them on CPUs other than core 4?

(These questions are also related to issue #438.)
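For instance, is something like the following enough (moving the emulator threads with emulatorpin; the CPU number is only illustrative, and it still has to stay inside the slice's AllowedCPUs)?

# virsh emulatorpin debian 7 --live --config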

dupremathieu commented 7 months ago

@insatomcat you didn't notice it because we have a large cpuset range.

Reduce your machine-rt cpuset so that the number of CPU cores equals the number of your virtual CPUs.
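For illustration, with the 2-vcpu guest you showed above (vcpus pinned to CPUs 2 and 14), that would mean shrinking the slice to something like:

[Slice]
AllowedCPUs=2,14

With that, the RT vcpu threads can occupy both allowed cores and the unpinned qemu threads have nowhere left to run.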

insatomcat commented 7 months ago

I have a slice with cpuset 2-6,14-18 and a GUEST with 4 vcpus:

# cat /etc/systemd/system/machine-rt.slice
[Unit]
Description=VM rt slice
Before=slices.target
Wants=machine.slice

[Slice]
AllowedCPUs=2-6,14-18

# virsh dumpxml XXX | grep cpu
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='6'/>
    <vcpupin vcpu='2' cpuset='17'/>
    <vcpupin vcpu='3' cpuset='18'/>
    <emulatorpin cpuset='16'/>
    <vcpusched vcpus='0' scheduler='fifo' priority='5'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='5'/>
    <vcpusched vcpus='2' scheduler='fifo' priority='5'/>
    <vcpusched vcpus='3' scheduler='fifo' priority='5'/>
  </cputune>

If I look at all the cores, then I see the processes you are talking about, but they are all on core 16, which is the core chosen by emulatorpin:

# for i in 2 3 4 5 6 14 15 16 17 18; do ps -eT -o comm,cmd,pid,tid,rtprio,policy,psr | grep " $i$"; done
cpuhp/2         [cpuhp/2]                        38      38      - TS    2
idle_inject/2   [idle_inject/2]                  39      39     50 FF    2
irq_work/2      [irq_work/2]                     40      40      1 FF    2
migration/2     [migration/2]                    41      41     99 FF    2
rcuc/2          [rcuc/2]                         42      42     10 FF    2
ktimers/2       [ktimers/2]                      43      43      1 FF    2
ksoftirqd/2     [ksoftirqd/2]                    44      44      - TS    2
kworker/2:0-eve [kworker/2:0-events]             45      45      - TS    2
irq/125-PCIe PM [irq/125-PCIe PME]              326     326     50 FF    2
kworker/2:1     [kworker/2:1]                   335     335      - TS    2
irq/134-megasas [irq/134-megasas0-msix3]        491     491     50 FF    2
irq/281-iavf-en [irq/281-iavf-ens1f1v1-TxRx    3485    3485     50 FF    2
irq/247-i40e-en [irq/247-i40e-ens1f1-TxRx-1    3810    3810     50 FF    2
irq/174-i40e-en [irq/174-i40e-ens1f0-TxRx-8    3963    3963     50 FF    2
irq/264-vfio-ms [irq/264-vfio-msix[0](0000:    7314    7314     50 FF    2
cpuhp/3         [cpuhp/3]                        48      48      - TS    3
idle_inject/3   [idle_inject/3]                  49      49     50 FF    3
irq_work/3      [irq_work/3]                     50      50      1 FF    3
migration/3     [migration/3]                    51      51     99 FF    3
rcuc/3          [rcuc/3]                         52      52     10 FF    3
ktimers/3       [ktimers/3]                      53      53      1 FF    3
ksoftirqd/3     [ksoftirqd/3]                    54      54      - TS    3
kworker/3:0-eve [kworker/3:0-events]             55      55      - TS    3
irq/126-PCIe PM [irq/126-PCIe PME]              327     327     50 FF    3
kworker/3:1     [kworker/3:1]                   336     336      - TS    3
irq/135-megasas [irq/135-megasas0-msix4]        492     492     50 FF    3
irq/282-iavf-en [irq/282-iavf-ens1f1v1-TxRx    3486    3486     50 FF    3
irq/248-i40e-en [irq/248-i40e-ens1f1-TxRx-1    3811    3811     50 FF    3
irq/175-i40e-en [irq/175-i40e-ens1f0-TxRx-9    3964    3964     50 FF    3
irq/266-vfio-ms [irq/266-vfio-msix[1](0000:    7293    7293     50 FF    3
cpuhp/4         [cpuhp/4]                        58      58      - TS    4
idle_inject/4   [idle_inject/4]                  59      59     50 FF    4
irq_work/4      [irq_work/4]                     60      60      1 FF    4
migration/4     [migration/4]                    61      61     99 FF    4
rcuc/4          [rcuc/4]                         62      62     10 FF    4
ktimers/4       [ktimers/4]                      63      63      1 FF    4
ksoftirqd/4     [ksoftirqd/4]                    64      64      - TS    4
kworker/4:0-eve [kworker/4:0-events]             65      65      - TS    4
irq/127-PCIe PM [irq/127-PCIe PME]              328     328     50 FF    4
kworker/4:1     [kworker/4:1]                   337     337      - TS    4
irq/136-megasas [irq/136-megasas0-msix5]        493     493     50 FF    4
irq/249-i40e-en [irq/249-i40e-ens1f1-TxRx-1    3812    3812     50 FF    4
irq/176-i40e-en [irq/176-i40e-ens1f0-TxRx-1    3965    3965     50 FF    4
irq/268-vfio-ms [irq/268-vfio-msix[2](0000:    7294    7294     50 FF    4
cpuhp/5         [cpuhp/5]                        68      68      - TS    5
idle_inject/5   [idle_inject/5]                  69      69     50 FF    5
irq_work/5      [irq_work/5]                     70      70      1 FF    5
migration/5     [migration/5]                    71      71     99 FF    5
rcuc/5          [rcuc/5]                         72      72     10 FF    5
ktimers/5       [ktimers/5]                      73      73      1 FF    5
ksoftirqd/5     [ksoftirqd/5]                    74      74      - TS    5
kworker/5:0-eve [kworker/5:0-events]             75      75      - TS    5
irq/128-PCIe PM [irq/128-PCIe PME]              329     329     50 FF    5
kworker/5:1     [kworker/5:1]                   338     338      - TS    5
irq/137-megasas [irq/137-megasas0-msix6]        494     494     50 FF    5
irq/250-i40e-en [irq/250-i40e-ens1f1-TxRx-2    3813    3813     50 FF    5
irq/177-i40e-en [irq/177-i40e-ens1f0-TxRx-1    3966    3966     50 FF    5
CPU 0/KVM       /usr/bin/qemu-system-x86_64    5899    5954      5 FF    5
irq/270-vfio-ms [irq/270-vfio-msix[3](0000:    7295    7295     50 FF    5
cpuhp/6         [cpuhp/6]                        78      78      - TS    6
idle_inject/6   [idle_inject/6]                  79      79     50 FF    6
irq_work/6      [irq_work/6]                     80      80      1 FF    6
migration/6     [migration/6]                    81      81     99 FF    6
rcuc/6          [rcuc/6]                         82      82     10 FF    6
ktimers/6       [ktimers/6]                      83      83      1 FF    6
ksoftirqd/6     [ksoftirqd/6]                    84      84      - TS    6
kworker/6:0-eve [kworker/6:0-events]             85      85      - TS    6
kworker/6:1-mm_ [kworker/6:1-mm_percpu_wq]      316     316      - TS    6
irq/129-PCIe PM [irq/129-PCIe PME]              330     330     50 FF    6
irq/138-megasas [irq/138-megasas0-msix7]        495     495     50 FF    6
irq/251-i40e-en [irq/251-i40e-ens1f1-TxRx-2    3814    3814     50 FF    6
irq/178-i40e-en [irq/178-i40e-ens1f0-TxRx-1    3967    3967     50 FF    6
CPU 1/KVM       /usr/bin/qemu-system-x86_64    5899    5956      5 FF    6
irq/273-vfio-ms [irq/273-vfio-msix[4](0000:    7298    7298     50 FF    6
cpuhp/14        [cpuhp/14]                      161     161      - TS   14
idle_inject/14  [idle_inject/14]                162     162     50 FF   14
irq_work/14     [irq_work/14]                   163     163      1 FF   14
migration/14    [migration/14]                  164     164     99 FF   14
rcuc/14         [rcuc/14]                       165     165     10 FF   14
ktimers/14      [ktimers/14]                    166     166      1 FF   14
ksoftirqd/14    [ksoftirqd/14]                  167     167      - TS   14
kworker/14:0-ev [kworker/14:0-events]           168     168      - TS   14
kworker/14:1    [kworker/14:1]                  339     339      - TS   14
irq/210-ahci[00 [irq/210-ahci[0000:00:17.0]     471     471     50 FF   14
irq/147-megasas [irq/147-megasas0-msix15]       503     503     50 FF   14
irq/236-i40e-en [irq/236-i40e-ens1f1-TxRx-6    3798    3798     50 FF   14
irq/186-i40e-en [irq/186-i40e-ens1f0-TxRx-2    3975    3975     50 FF   14
cpuhp/15        [cpuhp/15]                      171     171      - TS   15
idle_inject/15  [idle_inject/15]                172     172     50 FF   15
irq_work/15     [irq_work/15]                   173     173      1 FF   15
migration/15    [migration/15]                  174     174     99 FF   15
rcuc/15         [rcuc/15]                       175     175     10 FF   15
ktimers/15      [ktimers/15]                    176     176      1 FF   15
ksoftirqd/15    [ksoftirqd/15]                  177     177      - TS   15
kworker/15:0-ev [kworker/15:0-events]           178     178      - TS   15
kworker/15:1    [kworker/15:1]                  340     340      - TS   15
irq/146-megasas [irq/146-megasas0-msix16]       504     504     50 FF   15
irq/237-i40e-en [irq/237-i40e-ens1f1-TxRx-7    3799    3799     50 FF   15
irq/187-i40e-en [irq/187-i40e-ens1f0-TxRx-2    3976    3976     50 FF   15
cpuhp/16        [cpuhp/16]                      181     181      - TS   16
idle_inject/16  [idle_inject/16]                182     182     50 FF   16
irq_work/16     [irq_work/16]                   183     183      1 FF   16
migration/16    [migration/16]                  184     184     99 FF   16
rcuc/16         [rcuc/16]                       185     185     10 FF   16
ktimers/16      [ktimers/16]                    186     186      1 FF   16
ksoftirqd/16    [ksoftirqd/16]                  187     187      - TS   16
kworker/16:0-ev [kworker/16:0-events]           188     188      - TS   16
kworker/16:0H-e [kworker/16:0H-events_highp     189     189      - TS   16
kworker/16:1    [kworker/16:1]                  341     341      - TS   16
irq/148-megasas [irq/148-megasas0-msix17]       505     505     50 FF   16
irq/254-i40e-00 [irq/254-i40e-0000:18:00.1:    1597    1597     50 FF   16
irq/238-i40e-en [irq/238-i40e-ens1f1-TxRx-8    3800    3800     50 FF   16
irq/188-i40e-en [irq/188-i40e-ens1f0-TxRx-2    3977    3977     50 FF   16
qemu-system-x86 /usr/bin/qemu-system-x86_64    5899    5899      - TS   16
qemu-system-x86 /usr/bin/qemu-system-x86_64    5899    5907      - TS   16
log             /usr/bin/qemu-system-x86_64    5899    5927      - TS   16
msgr-worker-0   /usr/bin/qemu-system-x86_64    5899    5928      - TS   16
msgr-worker-1   /usr/bin/qemu-system-x86_64    5899    5929      - TS   16
msgr-worker-2   /usr/bin/qemu-system-x86_64    5899    5930      - TS   16
service         /usr/bin/qemu-system-x86_64    5899    5934      - TS   16
io_context_pool /usr/bin/qemu-system-x86_64    5899    5935      - TS   16
io_context_pool /usr/bin/qemu-system-x86_64    5899    5936      - TS   16
ceph_timer      /usr/bin/qemu-system-x86_64    5899    5937      - TS   16
ms_dispatch     /usr/bin/qemu-system-x86_64    5899    5938      - TS   16
ms_local        /usr/bin/qemu-system-x86_64    5899    5939      - TS   16
safe_timer      /usr/bin/qemu-system-x86_64    5899    5940      - TS   16
safe_timer      /usr/bin/qemu-system-x86_64    5899    5941      - TS   16
safe_timer      /usr/bin/qemu-system-x86_64    5899    5942      - TS   16
safe_timer      /usr/bin/qemu-system-x86_64    5899    5943      - TS   16
taskfin_librbd  /usr/bin/qemu-system-x86_64    5899    5944      - TS   16
vhost-5899      /usr/bin/qemu-system-x86_64    5899    5952      - TS   16
IO mon_iothread /usr/bin/qemu-system-x86_64    5899    5953      - TS   16
SPICE Worker    /usr/bin/qemu-system-x86_64    5899    5982      - TS   16
vhost-5899      /usr/bin/qemu-system-x86_64    5899    6003      - TS   16
kworker/16:1H-k [kworker/16:1H-kblockd]        5906    5906      - TS   16
kvm-nx-lpage-re [kvm-nx-lpage-recovery-5899    5945    5945      - TS   16
cpuhp/17        [cpuhp/17]                      191     191      - TS   17
idle_inject/17  [idle_inject/17]                192     192     50 FF   17
irq_work/17     [irq_work/17]                   193     193      1 FF   17
migration/17    [migration/17]                  194     194     99 FF   17
rcuc/17         [rcuc/17]                       195     195     10 FF   17
ktimers/17      [ktimers/17]                    196     196      1 FF   17
ksoftirqd/17    [ksoftirqd/17]                  197     197      - TS   17
kworker/17:0-ev [kworker/17:0-events]           198     198      - TS   17
kworker/17:1    [kworker/17:1]                  342     342      - TS   17
irq/149-megasas [irq/149-megasas0-msix18]       506     506     50 FF   17
irq/239-i40e-en [irq/239-i40e-ens1f1-TxRx-9    3801    3801     50 FF   17
irq/166-i40e-en [irq/166-i40e-ens1f0-TxRx-0    3955    3955     50 FF   17
irq/189-i40e-en [irq/189-i40e-ens1f0-TxRx-2    3978    3978     50 FF   17
CPU 2/KVM       /usr/bin/qemu-system-x86_64    5899    5957      5 FF   17
cpuhp/18        [cpuhp/18]                      201     201      - TS   18
idle_inject/18  [idle_inject/18]                202     202     50 FF   18
irq_work/18     [irq_work/18]                   203     203      1 FF   18
migration/18    [migration/18]                  204     204     99 FF   18
rcuc/18         [rcuc/18]                       205     205     10 FF   18
ktimers/18      [ktimers/18]                    206     206      1 FF   18
ksoftirqd/18    [ksoftirqd/18]                  207     207      - TS   18
kworker/18:0-ev [kworker/18:0-events]           208     208      - TS   18
kworker/18:1    [kworker/18:1]                  343     343      - TS   18
irq/151-megasas [irq/151-megasas0-msix19]       507     507     50 FF   18
irq/240-i40e-en [irq/240-i40e-ens1f1-TxRx-1    3802    3802     50 FF   18
irq/167-i40e-en [irq/167-i40e-ens1f0-TxRx-1    3956    3956     50 FF   18
CPU 3/KVM       /usr/bin/qemu-system-x86_64    5899    5958      5 FF   18

What's the emulatorpin setting in your setup?

dupremathieu commented 7 months ago

What we want for emulatorpin is to use the non-RT cpuset (0-1,7-13,19-N in your case) to avoid reserving and losing a core for it.

insatomcat commented 7 months ago

I don't think you can do that while at the same time asking libvirt to use the machine-rt slice for the same guest.

dupremathieu commented 7 months ago

It should be possible using a qemu hook, but in my opinion we should just mention it in the documentation.

I suggest indicating in the documentation:

  • The pinned processes have to be pinned inside the cgroup cpuset.
  • If you use RT privilege KVM threads, the minimal cpuset size has to be: total of number of vcpus + 1.
  • If it is not what you want, do not define any cgroup cpuset and use isolcpus domain instead.

insatomcat commented 7 months ago

If you use RT privilege KVM threads, the minimal cpuset size has to be: total of number of vcpus + 1

--> total of number of "isolated" vcpus + 1

You don't have to isolate all the vcpus. You may even need a "non-isolated" vcpu for the housekeeping inside the VM. So basically, if you need N isolated cores for your real-time workload, your guest may need N+1 cores. But this "+1" can be shared with other RT guests and with the emulator.
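As a sketch (CPU numbers and priorities are only placeholders), a 3-vcpu guest where only vcpus 1 and 2 are isolated could look like this, with vcpu 0 doing the housekeeping and sharing its core with the emulator:

  <vcpu placement='static'>3</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <emulatorpin cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='2'/>
    <vcpusched vcpus='2' scheduler='fifo' priority='2'/>
  </cputune>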

eroussy commented 7 months ago

I suggest indicating in the documentation:

  • The pinned processes have to be pinned inside the cgroup cpuset.
  • If you use RT privilege KVM threads, the minimal cpuset size has to be: total of number of vcpus + 1.
  • If it is not what you want, do not define any cgroup cpuset and use isolcpus domain instead.

I don't think it is a good idea to propose both. We should choose one isolation method and justify why we chose it.

Personally,

The only argument I see (for now) in favor of using cgroups is that they help protect against hardware processor attacks like Meltdown and Spectre. Do we really want to protect against these types of attacks? Is there any other argument I'm missing?

eroussy commented 7 months ago

Hi all,

We need to close this question. I discussed it with Mathieu and we concluded that:

So, regarding the work to do:

@insatomcat @dupremathieu what do you think of that? Did I miss something?

insatomcat commented 7 months ago

I'm ok with all that.

ebail commented 7 months ago

Great. @eroussy maybe it is worth documenting on the LFEnergy Wiki?

eroussy commented 7 months ago

The topic is now covered in this wiki page: https://wiki.lfenergy.org/display/SEAP/Scheduling+and+priorization
Feel free to reopen if you have questions or remarks.