tbarbette / fastclick

FastClick - A faster version of the Click Modular Router featuring batching, advanced multi-processing and improved Netmap and DPDK support (ANCS'15). Check the metron branch for Metron specificities (NSDI'18). PacketMill modifications (ASPLOS'21) as well as MiddleClick(ToN, 2021) are merged in main.

Fastclick + LXD/LXC + DPDK #47

Closed: mrb1090 closed this issue 6 years ago

mrb1090 commented 7 years ago

Greetings.

Been running plain vanilla click elements within LXD/LXC containers for a bit now. Curious if anyone has successfully run FastClick within an LXC container w/DPDK installed on the host. If so, any pointers with respect to configuration would be greatly appreciated.

Thanks,

Mike B.

tbarbette commented 7 years ago

Hi! I know my colleague @cffs has tried some things with DPDK in containers, and he may be able to help you after he returns from his holidays. In itself, Click has no problem with containers; DPDK, however, does. You may want to search for DPDK and containers in general, as there is nothing specific to (Fast)Click on this subject. If you're looking into containers, I guess you want proper isolation, and the only way to achieve fast isolation while staying secure (that is, not sharing memory between containers) is to use SR-IOV and virtual functions, so that each container gets its own virtual device.
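For reference, a rough sketch of the SR-IOV route on the host side. This is host configuration only; the interface name, PCI addresses, VFIO group number, and container name below are all placeholders, and the exact steps depend on your NIC, kernel, and DPDK version:

```shell
# Create 2 virtual functions on the physical NIC (placeholder interface name):
echo 2 > /sys/class/net/enp1s0f0/device/sriov_numvfs

# Bind one VF to vfio-pci so DPDK can drive it (placeholder PCI address):
modprobe vfio-pci
dpdk-devbind.py --bind=vfio-pci 0000:01:02.0

# Pass the VFIO device nodes through to the LXD container
# (replace mycontainer and the VFIO group number NN with your own):
lxc config device add mycontainer vfio-grp unix-char path=/dev/vfio/NN
lxc config device add mycontainer vfio-ctl unix-char path=/dev/vfio/vfio
```

With the VF visible inside the container, FastClick's DPDK elements can then be pointed at it as if it were a dedicated NIC.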

tbarbette commented 7 years ago

Poke @cffs ?

mrb1090 commented 7 years ago

Thanks Tom. Still working through this.

BTW, what kind of performance are you seeing these days w/FastClick using DPDK?

Mike

tbarbette commented 7 years ago

I would say the overhead over DPDK is removed. In some situations we can even do better than pure DPDK apps, as the Click Packet object pool recycles annotation space, leading to better cache locality than using DPDK's buffers, which cycle through at least the ring size. This is discussed in the main README.
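As a concrete point of comparison, the minimal FastClick I/O path over DPDK is just a couple of elements, along the lines of the README example (the port number is a placeholder for a DPDK-bound device):

```
FromDPDKDevice(0) -> EtherMirror -> ToDPDKDevice(0);
```

Everything between the input and output elements is regular Click processing, which is where the batching and pool recycling mentioned above pay off.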

For minimal-size packets (64-byte UDP) the FastClick results haven't changed much (that's great news :p). However, using newer NICs such as the X710/XL710 chipsets, a single core can do ~23G of routing with 64-byte packets, and those cards can actually achieve 40G throughput as they use PCIe Gen 3, unlike the spikes you can see in the FastClick paper at small packet sizes with dual-port 10G cards that bottleneck on PCIe Gen 2 x8.

With a real trace we can run a 40G full-duplex stateful function with a single Xeon core and an Intel XL710. Those cards are very efficient in terms of CPU overhead, as I said. In classification capacity, however, Intel is clearly moving in the wrong direction: it is worse than before, while NFV requires more. I recently played with a 100G NIC that can be fully maxed out to its hardware limit (which seemed to be ~160G full duplex) with 2 cores and a stateful function, again with real traces (so a mix of packet sizes).

I would say the current bottleneck is the number of PCIe lanes per CPU (and hence the number of NICs), so in the end an I/O bottleneck, except for some very heavy functions like DPI that remain CPU-bound. There is still research to do in that area, though, and we have good early results (pending publication) that push the bottleneck back to I/O in most cases.

kthkaya commented 6 years ago

I think the discussion going on in #55 may be helpful.

mrb1090 commented 6 years ago

Thanks for the reference!

Mike B.

tbarbette commented 6 years ago

I think we can close this, as #55 made it run. If you still have problems, please reopen.