pinggit / dpdk-contrail-book

contrail dpdk day one book

regarding --socket-mem 1024,1024 #3

Open · pinggit opened this issue 4 years ago

pinggit commented 4 years ago

Isn't --socket-mem meant to allocate hugepages for the vrouter only (not for the VMs)? Or is it actually the same "global", system-wide parameter that applies to both the vrouter and the VMs, just like the kernel hugepagesz=1G hugepages=40 parameter?

pinggit commented 4 years ago

answer from Laurent:

Answer: This is also something we have to explain clearly. Hugepage use in DPDK is really badly explained, and, once again, it is not as complex as it seems.

You just have to keep in mind that:

(only descriptors move from one queue to another)

Then, you have the DPDK setup:

So, in short, you have instances running on both NUMA nodes. They have to be able to access packets that are referenced by descriptors (that the vrouter has put into the vNIC RX queue).

This is why, by default, we spread the hugepage memory allocation across both NUMA nodes.

[image7: diagram showing how huge pages are used]

So, first, you are allocating hugepages at the system level (at startup, for 1G huge pages):

default_hugepagesz=1GB hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=40
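(Not from the thread, just a hint: on a RHEL/CentOS style host these boot parameters are typically made persistent through GRUB; the exact file path and regeneration command depend on the distribution.)

# append the hugepage parameters to the kernel command line in /etc/default/grub
GRUB_CMDLINE_LINUX="... default_hugepagesz=1G hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=40"

# regenerate the GRUB configuration and reboot for the reservation to take effect
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot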

I guess that huge pages are equally balanced across both NUMA nodes (to be checked).
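One way to check this (a small sketch, assuming a two-socket host with nodes node0 and node1) is to read the per-node hugepage counters exposed in sysfs:

# 1G hugepages reserved on each NUMA node
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

# 2M hugepages reserved on each NUMA node
cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages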

Then you are requesting, at the vrouter level, a part of them for the vrouter DPDK application's needs (to store both underlay and VM packets):

--socket-mem <value>,<value>
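As an illustration only (this is not the exact contrail-vrouter-dpdk command line), the option is a standard DPDK EAL argument; here dpdk-testpmd is used as a stand-in application and 1 GB of hugepage memory is requested on each of the two NUMA nodes:

# hypothetical EAL invocation: cores 1-3, 4 memory channels, 1 GB per NUMA node
dpdk-testpmd -l 1,2,3 -n 4 --socket-mem 1024,1024 -- --nb-cores=2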
pinggit commented 4 years ago

@ldurandadomia , with this explained, it also implies that for those VMs spawned on a different NUMA node (NUMA1) from the one where the vrouter is running (say NUMA0 in this example), performance will be lower due to QPI slowness? So for time-sensitive VMs we have to spawn them on NUMA0 only?

ldurandadomia commented 4 years ago

@pinggit Yes, this is a matter of cost versus performance. If you put everything on a single NUMA node you'll very quickly exhaust all available cores on that NUMA. Let's take an example: a CPU with 18 cores per NUMA node (36 with siblings).

If you are starting DPDK VMs with at least 8 vCPUs (at least the same number of CPUs as the vRouter, to get the same number of queues on the VM side), you can only spawn 3 VMs (roughly: 36 hardware threads minus the 8 used by the vRouter leaves 28, enough for only three 8-vCPU VMs).

This is really poor and not realistic for most customers...

ldurandadomia commented 4 years ago

@pinggit The idea is more the following:

The idea is to avoid having part of a VM's (or the vrouter's) traffic processed on a first NUMA node and the other part on the second one (it would create internal delays, reordering, ...).

Next, if on a given compute host you have one VNF that requires more performance than the others, it is probably clever to pin it on the same NUMA node as the vrouter.
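As a hedged sketch of such pinning at the libvirt level (the domain name vnf1 and the core numbers are made up, and the vrouter is assumed to run on NUMA0):

# restrict the guest's memory allocation to NUMA node 0
virsh numatune vnf1 --nodeset 0 --config

# pin the guest's vCPUs to host cores belonging to NUMA node 0
virsh vcpupin vnf1 0 4 --config
virsh vcpupin vnf1 1 5 --config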

You also have to pay attention to use physical NICs that are attached to the same NUMA node as the vrouter (this is hard-wired by the PCI slot, so if a NIC is not on the appropriate NUMA node you have to use another NIC or move the NIC to another slot).
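The NUMA locality of a NIC can be read from sysfs (eth0 and the PCI address below are placeholders; a value of -1 means the platform did not report any locality):

cat /sys/class/net/eth0/device/numa_node
# or, for a NIC already bound to DPDK, query it by PCI address
cat /sys/bus/pci/devices/0000:5e:00.0/numa_node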

pinggit commented 4 years ago

thanks @ldurandadomia . one more thing:

If you are starting DPDK VMs with at least 8 vCPUs (at least the same number of CPUs as the vRouter, to get the same number of queues on the VM side), you can only spawn 3 VMs.

  1. In practice, I believe VMs share cores among each other, so you can spawn 10 VMs each using the same 4 cores, like 3,4,5,6 - if performance is not a big deal for these VMs?

  2. VM queues are determined by the vCPUs assigned to the VM, not the other way around. So if you assign 2 vCPUs to a VM, the VM will have 2 queues per vNIC; you don't have to assign 8 vCPUs. Correct me if I'm wrong - I can open a new issue on this topic.

ldurandadomia commented 4 years ago

1 - In practice, I believe VMs share cores among each other, so you can spawn 10 VMs each using the same 4 cores, like 3,4,5,6 - if performance is not a big deal for these VMs?

Not really... If you are building a VNF (virtual network function) DPDK application, you have the same concern on the VNF as on the vrouter DPDK application: you want performance and you do not want to share the allocated CPUs!!!

This is why, when Contrail DPDK is used:
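As a hedged example (not quoted from the thread) of how dedicated, unshared CPUs are usually enforced for such VMs with OpenStack: a DPDK VM can be given a flavor with a dedicated CPU policy and hugepage-backed memory (the flavor name m1.dpdk is made up):

openstack flavor set m1.dpdk --property hw:cpu_policy=dedicated --property hw:mem_page_size=1GB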

ldurandadomia commented 4 years ago

2 - VM queues are determined by the vCPUs assigned to the VM, not the other way around. So if you assign 2 vCPUs to a VM, the VM will have 2 queues per vNIC; you don't have to assign 8 vCPUs. Correct me if I'm wrong - I can open a new issue on this topic.

This is defined by the VM setup (libvirt XML configuration file). But with OpenStack, an implicit rule is used to configure the NIC queues: the same number of queues is configured on each vNIC as the number of vCPUs defined for the VM.

There is no way to configure it differently. It is nevertheless possible to reduce this number of queues after VM startup using the ethtool command.
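For example, from inside the guest (eth0 is a placeholder for the vNIC name), something like the following can check and then reduce the number of combined queue pairs:

# show how many queue pairs the virtio vNIC currently exposes
ethtool -l eth0

# reduce the number of combined (RX/TX) queue pairs to 2
ethtool -L eth0 combined 2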