A Flexible and Efficient Container-based NFV Platform for Middlebox Networking (ACM SAC 2018)

Two weeks from now will be the sixth in our series of live paper discussion groups. This will be the last one of these that I organize for a while.

Sunday, 2023-09-17 13:00–14:00

"A Flexible and Efficient Container-based NFV Platform for Middlebox Networking" 郑超 (Zheng Chao), 陆秋文 (Lu Qiuwen), 刘庆云 (Liu Qingyun), 李佳 (Li Jia), 方滨兴 (Fang Binxing) PDF

The NFV in the title means network function virtualization, which refers to virtualizing multiple network tasks (e.g. DPI, firewall, IDS) on one piece of general-purpose hardware, rather than having specialized hardware for each separate task. This paper proposes an NFV platform called MVMP (Multiple Virtual Middlebox Platform) based on Docker containers, Linux, and DPDK.

The paper is not explicitly about censorship, except for one place that talks about TCP RST injection:

The IPS (NF3 [network function 3]) builds a TCP reset packet with a buffer in shared memory and injects the packet through VD3 [virtual device 3] to stop malicious connections.

This paper is cited in the "Trampled Orchid" report (PDF), which is about changes in Internet restrictions in Hong Kong following the passage of the national security law in 2020. Section 5.7.1 is about network function virtualization, and mentions Fang Binxing's lab Pengcheng Laboratory (Baike article (Chinese)).

To avoid overreliance on hardware specifications and the accompanying slowdowns, network researchers from around the world have increasingly pushed for the virtualization of network functions (or VNFs, virtual network functions).⁶³¹ Network Function Virtualization (NFV) separates network functions from physical hardware, using software-based networking components—the VNFs—that can be moved, instantiated, or updated without changing the existing hardware.⁶³² The goal of NFV is to move beyond the physical hardware necessary to comprise a network—the message router, CDN, session border controller, DPI, Firewall, and more—to base networks on only the hardware of servers, storage, and switches, which will then run virtual versions of network components.⁶³³

The transition from hardware-based censorship to software-based censorship has required significant research investment within China. China's censorship system appears to currently rely on hardware - specifically, the use of IDSes and DPIs at IXPs, on the "Internet backbone," and on edge routers.⁶³⁴ The transition to NFV requires ways to virtualize DPIs and other "middleboxes"—intermediary devices that perform functions other than moving traffic from host to destination, like inspecting, filtering, or altering traffic.⁶³⁵

Inspecting and filtering traffic in a virtualized network has been an area of significant research for Chinese Internet scholars, including Fang Binxing, the "Father of the Great Firewall."^{636 [this paper]} Fang Binxing has worked on incorporating middleboxes into virtualized networks, describing the creation a traffic filtering system that mimics the way that Deep Packet Inspection currently works on China's Internet. China's filtering system relies on an "on-path" system rather than "in-path barriers;" filtering routers send copies of traffic to out-of-band inspection devices, while allowing the packets to continue directly to the user. The copies are then inspected, and the content is compared to a government keyword and URL blacklist. If the inspection technology finds blacklisted content, the router will inject forged TCP resets, severing the connection and blocking the user from reconnecting to the same IP address.⁶³⁷ Fang Binxing's proposed virtualized network system replicates the out-of-band inspection and subsequent TCP reset, making explicit that the government-sponsored research on proposed middleboxes is intended to virtualize existing censorship systems.^{638 [this paper]} Other researchers have published on the same topic, including from elite research institutions like Tsinghua University.⁶³⁹

Fang Binxing's new lab, the Pengcheng Internet Laboratory, specializes in NFV censorship and content inspection tools, among other areas.⁶⁴⁰ The lab was founded by the Shenzhen city government to meet national innovation needs, and represents an experiment with a new kind of state-backed laboratory.⁶⁴¹ In a presentation at the 8th National Internet and Information Security Defense Summit (XDef, 第八届全国网络与信息安全防护峰会), the Pengcheng Lab presented their new "Cyber Range," which could be used to test out new technologies against various cyber attacks. This included tools for testing virtualized content inspection engines based on DPIs.⁶⁴² The Pengcheng lab designs its own virtualized content inspection systems, complete with the out-of-band filtering system.⁶⁴³ This research focus, coming out of a state-backed lab, indicates a national interest in virtualized censorship tools.

A Flexible and Efficient Container-based NFV Platform for Middlebox Networking 郑超 (Zheng Chao), 陆秋文 (Lu Qiuwen), 刘庆云 (Liu Qingyun), 李佳 (Li Jia), 方滨兴 (Fang Binxing) https://dl.acm.org/doi/abs/10.1145/3167132.3167240

This paper describes a platform for Network Function Virtualization (NFV) called MVMP (Multiple Virtual Middlebox Platform) that is based on Linux, DPDK, and Docker. Multiple network functions, such as routing, load balancing, IDS, firewall, and DPI can all be implemented in software, rather than hardware, and can coexist on the same general-purpose device. The authors say that MVMP "particularly addresses the needs of DPIs and IDSes."

The platform has three main goals:

Flexibility: Current-day data centers are often virtualized, with raw packets being encapsulated in an additional protocol layer, such as VXLAN or NVGRE. This imposes restrictions on where traditional hardware middleboxes such as DPIs may be placed: since they expect to see raw unencapsulated packets, they can only be installed at a place where the additional encapsulation layer is not present. MVMP makes it possible to install such network functions anywhere in the topology, by installing another virtual function just upstream of them to do decapsulation.
Efficiency: MVMP aims for zero-copy packet transmission between virtual network functions. Communication is accomplished using shared memory and ring buffers.
Robustness: One network function can crash without taking down the whole system, and memory protection can prevent one network function from overwriting memory used by another.
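
To make the decapsulation step under "Flexibility" concrete, here is a minimal sketch of what an upstream decapsulating function might do for VXLAN. It is not code from the paper. It assumes the outer Ethernet, IP, and UDP headers have already been removed, and the function names `vxlan_decap` and `vxlan_encap` are my own. The 8-byte header layout follows RFC 7348.

```python
import struct

VXLAN_HEADER_LEN = 8
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def vxlan_decap(udp_payload: bytes):
    """Strip a VXLAN header from a UDP payload; return (vni, inner_frame)."""
    if len(udp_payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload too short for a VXLAN header")
    # Word 0: 8 bits of flags followed by 24 reserved bits.
    # Word 1: 24-bit VNI followed by 8 reserved bits.
    word0, word1 = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])
    flags = word0 >> 24
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    vni = word1 >> 8
    return vni, udp_payload[VXLAN_HEADER_LEN:]


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Helper for the usage example only: prepend a VXLAN header."""
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8) + inner_frame


# Usage: a fake inner Ethernet frame survives an encap/decap round trip.
inner = b"\x00" * 14 + b"hello"
vni, frame = vxlan_decap(vxlan_encap(42, inner))
```

A DPI placed downstream of such a function would see `frame`, the raw inner packet, regardless of where in the encapsulated topology it sits.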

The basic motivation for a platform like MVMP is that while high-performance packet processing technologies like DPDK exist, they are too low-level to be convenient for sharing packets directly between multiple cooperating virtual network functions. Just as an operating system abstracts and mediates access to global resources like RAM and I/O, MVMP and similar platforms act as a hypervisor that abstracts the access to hardware network interfaces provided by DPDK. A basic building block in MVMP is the virtual network interface. Each virtual network function acts as if it has its own network interface for receiving and sending packets. Virtual network interfaces may correspond to one or more hardware network interfaces, or they may exist as purely in-memory abstractions. The MVMP hypervisor dispatches incoming packets to virtual network devices according to a table of dispatching policies (see an example in Table 1). Inside the virtualized topology, network functions can be wired up arbitrarily by connecting the send and receive queues of their virtual network interfaces. Section 3 gives an example application: a DPI function receives packets from a hardware network interface, then forwards a subset of them to an IPS function, which "builds a TCP reset packet with a buffer in shared memory and injects the packet to stop malicious connections."
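
Here is a toy model of the dispatch-and-wiring idea, to show how I understand it. Everything in it is an assumption for illustration: the class names, the use of predicates as policies, and the dictionary packets are mine, and the real format of the paper's Table 1 is not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualDevice:
    name: str
    rx: deque = field(default_factory=deque)   # packets waiting for the NF
    peer: Optional["VirtualDevice"] = None      # where send() delivers, if wired

    def send(self, pkt):
        if self.peer is not None:
            self.peer.rx.append(pkt)


def wire(src: VirtualDevice, dst: VirtualDevice):
    """Connect the send side of one virtual device to the receive side of another."""
    src.peer = dst


class Dispatcher:
    """Hypothetical stand-in for the hypervisor's dispatching policy table."""

    def __init__(self):
        self.policies = []  # ordered list of (predicate, VirtualDevice)

    def add_policy(self, predicate, vdev):
        self.policies.append((predicate, vdev))

    def dispatch(self, pkt):
        for predicate, vdev in self.policies:
            if predicate(pkt):
                vdev.rx.append(pkt)
                return vdev
        return None  # no policy matched; the packet is dropped in this sketch


# Usage: hardware traffic goes to a DPI, which forwards a subset to an IPS.
dpi_in, dpi_out, ips_in = VirtualDevice("vd1"), VirtualDevice("vd2"), VirtualDevice("vd3")
wire(dpi_out, ips_in)

d = Dispatcher()
d.add_policy(lambda p: p["proto"] == "tcp", dpi_in)
for pkt in [{"proto": "tcp", "dport": 80},
            {"proto": "udp", "dport": 53},
            {"proto": "tcp", "dport": 22}]:
    d.dispatch(pkt)

while dpi_in.rx:                      # toy DPI: flag port-80 traffic for the IPS
    pkt = dpi_in.rx.popleft()
    if pkt["dport"] == 80:
        dpi_out.send(pkt)
```

The point of the sketch is that the chain is defined by which devices are wired together, not by a central flow table.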

Communication between network functions is based heavily on shared memory and ring buffers pointing into it. In a simpler system, the platform would allocate space for incoming packets in shared memory and index them in a ring buffer; a network function would then read packets from the ring buffer and return the packet references when finished, so that the allocated memory can be reclaimed. But this model has a problem when a network function crashes: the memory it was referencing at the time will not be reclaimed, or worse, the global memory pool may be corrupted and the whole system will crash. MVMP adds an additional ring buffer per network function. Each network function uses its own private ring buffer, rather than operating on the global pool directly. That way, crashes and out-of-bounds writes are isolated. The additional ring buffers add a small amount of overhead but greatly increase robustness. Network functions also have a small amount of writable private memory that they can use to craft new packets (as in the RST injection example above).
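
A rough sketch of why a per-function ring helps with reclamation after a crash. This is a simplification under my own assumptions, not the paper's mechanism: `PacketPool`, `NFSlot`, and the `in_flight` bookkeeping are hypothetical names, and real memory protection is not modeled at all.

```python
from collections import deque


class PacketPool:
    """Global pool of packet buffers, owned by the platform."""

    def __init__(self, size):
        self.free = deque(range(size))
        self.data = [None] * size

    def alloc(self, payload):
        idx = self.free.popleft()   # raises IndexError when the pool is exhausted
        self.data[idx] = payload
        return idx

    def release(self, idx):
        self.data[idx] = None
        self.free.append(idx)


class NFSlot:
    """Platform-side bookkeeping for one network function."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = pool
        self.ring = deque()      # private ring: descriptors waiting for the NF
        self.in_flight = set()   # descriptors the NF dequeued but has not returned

    def deliver(self, payload):
        self.ring.append(self.pool.alloc(payload))

    def dequeue(self):
        idx = self.ring.popleft()
        self.in_flight.add(idx)
        return idx

    def done(self, idx):
        self.in_flight.remove(idx)
        self.pool.release(idx)

    def reclaim_after_crash(self):
        """Return every buffer this NF held to the global pool."""
        leaked = list(self.ring) + list(self.in_flight)
        self.ring.clear()
        self.in_flight.clear()
        for idx in leaked:
            self.pool.release(idx)
        return len(leaked)


# Usage: the NF finishes one packet, then "crashes" holding two more.
pool = PacketPool(4)
nf = NFSlot("ips", pool)
for payload in (b"a", b"b", b"c"):
    nf.deliver(payload)
first = nf.dequeue()
second = nf.dequeue()
nf.done(first)
free_before_crash = len(pool.free)
reclaimed = nf.reclaim_after_crash()
```

Because the platform knows exactly which descriptors went into the private ring, it can recover them without trusting any state inside the crashed function.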

The evaluation compares MVMP to SR-IOV and OpenNetVM. (OpenNetVM is also based on Docker and DPDK.) According to experiments done by the authors, the performance of SR-IOV and OpenNetVM decreases as more network functions are added in series, while MVMP remains more or less constant (Figure 7). The authors claim a maximum performance increase of 7× versus SR-IOV and 3× versus OpenNetVM.

I want to draw a connection to two other publications: the OpenNetVM paper from 2016, and patent CN103514053, which is about efficient communication using shared memory ring buffers.

OpenNetVM has a lot in common with MVMP. Both are based on Linux, DPDK, and Docker. See how similar OpenNetVM's Figure 1 and MVMP's Figure 2 are:

Figure 1: The NF Manager creates a shared memory region to store packets and meta data such as the flow table and service chain lists. Packets are moved between NFs by RX and TX threads that copy packet descriptors into an NF's receive (R) and transmit (T) ring buffers. NFs run in isolated containers that encapsulate all dependencies.

Figure 2: An overview of MVMP.

Just from the texts, it's hard to say what exactly makes MVMP different from OpenNetVM. The MVMP authors explain it this way:

OpenNetVM [14] was designed for high-performance service chains and deploys NFs in containers. The difference between OpenNetVM and MVMP lies in how they steer traffic. OpenNetVM uses a flow table and an additional thread to move packets through NFs. In contrast, MVMP provides a virtual device abstraction that allows NFs to define their own input and output and thus connect with each other. In a service chain that combines inline NFs and bypass NFs, the virtual device abstraction is more concise than a flow table.

MVMP claims their virtual network interfaces as an innovation, but to me it's not obvious how they are any different from OpenNetVM's own RX and TX ring buffers. MVMP claims that the flow table in OpenNetVM slows things down, but the flow table in OpenNetVM is an optional convenience feature, and direct wiring of RX and TX buffers is still possible. The MVMP authors know this because in the evaluation of Section 4.1 they say, "to make a fair comparison, we disabled its flow table." One advantage that really seems to be specific to MVMP is traffic duplication; i.e., having multiple network functions read from the same network interface, each reader getting a copy of all the packets:

MVMP facilitates traffic sharing by allowing multiple NFs to open the same virtual device, with a packet descriptor distributed to each NF’s lockless rings. Since OpenNetVM does not support traffic duplication, only MVMP was evaluated for this case.
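
Here is how I picture traffic duplication by descriptor: the payload is stored once, and only a descriptor is pushed to each reader's ring. The reference count that decides when the buffer can be freed is my own assumption (the quoted text does not say how buffer lifetime is managed), and `SharedVirtualDevice` is a hypothetical name.

```python
from collections import deque


class SharedVirtualDevice:
    """One virtual device that several NFs can open at the same time."""

    def __init__(self):
        self.rings = {}      # NF name -> that NF's ring of descriptors
        self.buffers = {}    # descriptor -> payload, stored only once
        self.refcount = {}   # descriptor -> number of NFs still holding it
        self.next_desc = 0

    def open(self, nf_name):
        self.rings[nf_name] = deque()
        return self.rings[nf_name]

    def receive(self, payload):
        if not self.rings:
            return None      # nobody is listening; drop the packet
        desc = self.next_desc
        self.next_desc += 1
        self.buffers[desc] = payload
        self.refcount[desc] = len(self.rings)
        for ring in self.rings.values():
            ring.append(desc)   # each NF gets the descriptor, not a copy
        return desc

    def release(self, desc):
        self.refcount[desc] -= 1
        if self.refcount[desc] == 0:   # last reader is done; free the buffer
            del self.buffers[desc]
            del self.refcount[desc]


# Usage: a DPI and an IDS both read the same traffic.
dev = SharedVirtualDevice()
dpi_ring = dev.open("dpi")
ids_ring = dev.open("ids")
desc = dev.receive(b"packet bytes")
both_see_it = (dpi_ring[0] == desc and ids_ring[0] == desc)
dev.release(dpi_ring.popleft())
still_alive = desc in dev.buffers
dev.release(ids_ring.popleft())
freed = desc not in dev.buffers
```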

MVMP also has a control plane based on Unix domain sockets, which (judging from the source code) OpenNetVM doesn't have.

Patent CN103514053 from 2013 is about efficient communication between multiple processes using a shared-memory ring buffer (or "circular queue"). The title is 一种基于共享内存的进程间通讯方法, or in English, "Shared-memory-based method for conducting communication among multiple processes". It's hard to judge by the bad translation available at Google Patents, but it seems to be relevant to the memory-sharing features of MVMP. The MVMP paper and patent CN103514053 have two authors in common:

MVMP	CN103514053
郑超 (Zheng Chao)	郑超 (Zheng Chao)
刘庆云 (Liu Qingyun)	刘庆云 (Liu Qingyun)
陆秋文 (Lu Qiuwen)	-
李佳 (Li Jia)	-
方滨兴 (Fang Binxing)	-
-	李世明 (Li Shiming)
-	刘洋 (Liu Yang)
-	秦鹏 (Qin Peng)
-	孙永 (Sun Yong)
-	周舟 (Zhou Zhou)
-	杨威 (Yang Wei)

On the other hand, the MVMP paper only talks about using rte_ring (e.g. "For a single consumer and single producer scenario, rte_ring is lockless"), so maybe they only use off-the-shelf libraries for the shared-memory ring buffers.
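
For readers who have not seen one, this is the general shape of a single-producer, single-consumer ring. It is not the rte_ring implementation, only a sketch of the index discipline that makes locks unnecessary when exactly one side writes each index. Python cannot express the memory-ordering guarantees that a real implementation in C depends on, so take it as an illustration of the idea only. `SpscRing` is my own name.

```python
class SpscRing:
    """Fixed-size ring for exactly one producer and one consumer."""

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.slots = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # next slot to write; only the producer modifies it
        self.tail = 0  # next slot to read; only the consumer modifies it

    def enqueue(self, item):
        if self.head - self.tail == len(self.slots):
            return False                      # ring is full
        self.slots[self.head & self.mask] = item
        self.head += 1                        # publish only after the slot is written
        return True

    def dequeue(self):
        if self.tail == self.head:
            return None                       # ring is empty
        item = self.slots[self.tail & self.mask]
        self.tail += 1
        return item


# Usage: fill the ring, overflow it once, drain it, then wrap around.
ring = SpscRing(4)
enqueued = [ring.enqueue(i) for i in range(5)]
drained = [ring.dequeue() for _ in range(5)]
ring.enqueue(10)
wrapped = ring.dequeue()
```

Returning `None` for an empty ring is a shortcut that would be ambiguous if `None` were a valid item; a descriptor ring holding integer indices does not have that problem.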

The reading group for "A Flexible and Efficient Container-based NFV Platform for Middlebox Networking" will begin 20 hours from now at 2023-09-17 13:00.

https://jitsi.rinsed-tinsel.site/bbs-reading-group-20230917

It's on a different meeting server than usual because meet.jit.si started requiring authentication.

I will open the call about 20 minutes early to give time for troubleshooting connection issues. As usual, a video recording will be made available afterward.

Link to video

Links that were posted in the chat:

01:30: Discussion thread
01:35: Paper PDF
02:35: RST injection of 1.1.1.1:443 in China since 2023-09-05
03:00: "Understanding the Network Traffic Constraints for Deep Packet Inspection by Passive Measurement"
03:20: "Information content security on the Internet: the control model and its evaluation"
22:05: OpenNetVM home page
22:09: "OpenNetVM: A Platform for High Performance Network Service Chains"
27:49: CN103514053: 一种基于共享内存的进程间通讯方法
27:49: CN103514053: Shared-memory-based method for conducting communication among multiple processes
31:59: "Trampled Orchid: Internet freedom in Hong Kong"
36:55: XDef 2019 agenda (click the tab "12月14日（2/2）")
37:16: XDef 2019 speakers (scroll to 丁勇 (Ding Yong))

We talked more about the comparison between MVMP (the subject of this paper) and OpenNetVM. Figure 2 of the OpenNetVM paper shows performance not diminishing much with increasing service chain length:

Figure 2: OpenNetVM achieves high throughput, even when running through long service chains, by avoiding expensive packet copies and system calls.

Whereas the experiments in the MVMP paper do show a decrease in performance in OpenNetVM:

Figure 7: Service chain performance, 64-byte packet.

Another graph on the OpenNetVM home page shows that the decline in performance depends on whether inter-NF communication is direct or indirect:

NF Chain Throughput (64B packets)

Our measurements at left show that a chain of length two using our NF Direct communication mechanism has a maximum throughput of 32 million packets per second, while extending the chain to seven NFs only incurs a 10% throughput drop. Using indirect NF communication via the management layer sees decreasing performance as the manager’s TX thread becomes a bottleneck.

We also talked about a conference presentation mentioned in the Trampled Orchid report at the 2019 XDef conference, by 丁勇 (Ding Yong), on the topic of Pengcheng Internet Laboratory and its "cyber range" testbed for evaluating content inspection engines. I have not been able to find a link to a video recording of the conference presentation, though there is some information about the talk on the XDef site.

http://www.xdef.com.cn/xdef/web/conferences?year=2019&page=agenda (archive) (click on "12月14日（2/2）")

议题：网络靶场结构与关键技术-鹏城实验室国家级网络靶场

丁勇桂林电子科技大学教授，鹏城实验室双聘研究员

Topic: Cyber Range Structure and Key Technologies-National Cyber Range of Pengcheng Laboratory

Ding Yong, Professor at Guilin University of Electronic Technology; dual-appointed researcher at Pengcheng Laboratory

http://www.xdef.com.cn/xdef/web/conferences?year=2019&page=speakers (archive)

Screenshot of Ding Yong's profile, with caption as appears below.

丁勇桂林电子科技大学计算机与信息安全学院教授

丁勇，男，博士，教授，博导，桂林电子科技大学计算机与信息安全学院院教授，鹏城实验室双聘研究员，广西密码学与信息安全重点实验室主任，广西高层次E层次人才二十余个国际国内知名会议主席或者程序委员，CCF C期刊PPNA副主编，SCI期刊EJIS客座编辑，在多个国际国内会议论坛做主题报告或者圆桌嘉宾。主要从事公钥密码理论、同态加密、密码安全协议、区块链及其应用等方面研究。中国密码学会组织委员会委员、高级会员，中国计算机学会区块链专委会发起委员、常务委员，计算机应用委员会委员，中国网络空间安全人才教育联盟理事。

演讲议题：网络靶场结构与关键技术-鹏城实验室国家级网络靶场 议题简介：主要介绍了网络靶场的意义、国内外发展现状，系统架构与关键技术。并着重介绍了鹏城实验室国家级网络靶场的建设方案、发展规划以及建设成果。

Ding Yong, Professor, School of Computer and Information Security, Guilin University of Electronic Technology

Ding Yong, male, Ph.D., professor and doctoral supervisor. He is a professor in the School of Computer and Information Security at Guilin University of Electronic Technology, a dual-appointed researcher at Pengcheng Laboratory, director of the Guangxi Key Laboratory of Cryptography and Information Security, and a Guangxi high-level talent (E level). He has been chair or program committee member of more than 20 well-known international and domestic conferences, is associate editor of the CCF C journal PPNA and guest editor of the SCI journal EJIS, and has been a keynote speaker or round-table guest at many international and domestic conferences and forums. He is mainly engaged in research on public key cryptography theory, homomorphic encryption, cryptographic security protocols, and blockchain and its applications. He is a member of the organizing committee and a senior member of the Chinese Association for Cryptologic Research, a founding member and standing committee member of the Blockchain Technical Committee of the China Computer Federation, a member of the Computer Applications Committee, and a council member of the China Cyberspace Security Talent Education Alliance.

Topic: Cyber Range Structure and Key Technology-National Cyber Range of Pengcheng Laboratory Topic Introduction: It mainly introduces the significance of the cyber range, the current development situation at home and abroad, the system architecture and key technologies. It also focuses on the construction program, development plan and results of the national cyber range of Pengcheng Laboratory.

While I still haven't found a video, the speakers page links to PDF slides. Here are the slides of Ding Yong's presentation (Chinese):

https://archive.org/details/XDef2019Slides/XDef2019-网络靶场结构与关键技术-鹏城实验室国家级网络靶场-丁勇/

This is the overview of topics:

鹏城实验室介绍

网络靶场概念、背景意义与需求

网络靶场定义和业务流程

科学问题和关键技术

已有基础及研究进展

网安人才实训探索

Introduction to Pengcheng Laboratory

Cyber range concept, background, significance, and requirements

Cyber range definition and business processes

Scientific questions and key technologies

Fundamentals and research progress

Exploration of practical training of cybersecurity talent
