iqiyi / dpvs

DPVS is a high performance Layer-4 load balancer based on DPDK.

dpvs-logo.png

Introduction

DPVS is a high performance Layer-4 load balancer based on DPDK. It is derived from Linux Virtual Server (LVS) and its modified version, alibaba/LVS.

Notes: The name DPVS comes from "DPDK-LVS".

dpvs.png

Several techniques are applied for high performance:

Major features of DPVS include:

DPVS consists of the modules illustrated in the diagram below.

modules

Quick Start

Test Environment

This quick start is performed in the environments described below.

Other environments should also be OK if DPDK works. Please check dpdk.org for more information.

Notes:

  1. Please check this link for NICs supported by DPDK: http://dpdk.org/doc/nics.
  2. Flow control (rte_flow) is required for FNAT and SNAT modes when DPVS runs on multiple cores, unless conn redirect is enabled. At minimum, rte_flow must support four items ("ipv4", "ipv6", "tcp", "udp") and two actions ("drop", "queue") for DPVS to work properly on multiple cores.
  3. DPVS is not confined to this test environment. In fact, DPVS is a user-space application that depends very little on the operating system, kernel version, compiler, or other platform differences. As far as is known, DPVS has been verified in at least the following environments.
    • CentOS 7.2, 7.6, 7.9
    • Anolis 8.6, 8.8, 8.9
    • GCC 4.8, 8.5
    • Kernel: 3.10.0, 4.18.0, 5.10.134
    • NIC: Intel IXGBE, NVIDIA MLX5

Clone DPVS

$ git clone https://github.com/iqiyi/dpvs.git
$ cd dpvs

Well, let's start from DPDK then.

DPDK setup

Currently, dpdk-stable-20.11.10 is recommended for DPVS, and DPDK versions earlier than dpdk-20.11 are no longer supported. If you are still using an earlier DPDK version, such as dpdk-stable-17.11.6 or dpdk-stable-18.11.2, please use an earlier DPVS release, such as v1.8.12.

Notes: You can skip this section if you are experienced with DPDK, and refer to the link for details.

$ wget https://fast.dpdk.org/rel/dpdk-20.11.10.tar.xz   # download from dpdk.org if link failed.
$ tar xf dpdk-20.11.10.tar.xz

DPDK patches

There are some patches for DPDK that support extra features needed by DPVS. Apply them if needed. For example, there is a patch to the DPDK kni driver for hardware multicast; apply it if you are going to launch ospfd on a kni device.

Notes: It's assumed that we are in the DPVS root directory, where you have placed the dpdk-stable-20.11.10 source code. This is not mandatory, just convenient.

$ cd <path-of-dpvs>
$ cp patch/dpdk-stable-20.11.10/*.patch dpdk-stable-20.11.10/
$ cd dpdk-stable-20.11.10/
$ patch -p1 < 0001-kni-use-netlink-event-for-multicast-driver-part.patch
$ patch -p1 < 0002-pdump-change-dpdk-pdump-tool-for-dpvs.patch
$ ...

Tips: It's advised to apply all the patches if you are not sure what they are meant for.
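If you follow the tip and apply every patch, a small loop saves typing. The sketch below is only an illustration: the function name apply_all_patches and the DRY_RUN variable are hypothetical helpers, not part of DPVS. It assumes the *.patch files have already been copied into the current directory, as in the cp step above.

```shell
# Apply every *.patch file in the current directory in lexical order.
# DRY_RUN=1 (a hypothetical switch) only prints the file names instead of calling patch(1).
apply_all_patches() {
    for p in ./*.patch; do
        [ -e "$p" ] || continue            # no patches present: the glob stays literal
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would apply: ${p#./}"
        else
            patch -p1 < "$p" || return 1   # stop at the first patch that fails
        fi
    done
}

# Example (run inside dpdk-stable-20.11.10/):
#   DRY_RUN=1 apply_all_patches    # preview
#   apply_all_patches              # apply for real
```

Stopping at the first failure keeps the source tree in a state that is easy to inspect before retrying.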

DPDK build and install

Use meson-ninja to build DPDK, and export environment variable PKG_CONFIG_PATH for DPDK application (DPVS). The sub-Makefile src/dpdk.mk in DPVS will check the presence of libdpdk.

$ cd dpdk-stable-20.11.10
$ mkdir dpdklib                 # user desired install folder
$ mkdir dpdkbuild               # user desired build folder
$ meson -Denable_kmods=true -Dprefix=dpdklib dpdkbuild
$ ninja -C dpdkbuild
$ cd dpdkbuild; ninja install
$ export PKG_CONFIG_PATH=$(pwd)/../dpdklib/lib64/pkgconfig/

Tips: You can use script dpdk-build.sh to facilitate dpdk build. Run dpdk-build.sh -h for the usage of the script.
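Before building DPVS, it can help to confirm that PKG_CONFIG_PATH really points at a directory containing libdpdk.pc. The following is a minimal sketch using only the shell; find_libdpdk_pc is a hypothetical helper name, and it does not replace the check done by src/dpdk.mk.

```shell
# Print the first directory listed in PKG_CONFIG_PATH that contains libdpdk.pc.
# Returns non-zero if no listed directory contains it.
find_libdpdk_pc() {
    found=$(printf '%s\n' "${PKG_CONFIG_PATH:-}" | tr ':' '\n' |
        while IFS= read -r dir; do
            if [ -n "$dir" ] && [ -f "$dir/libdpdk.pc" ]; then
                printf '%s\n' "$dir"
                break
            fi
        done)
    [ -n "$found" ] && printf '%s\n' "$found"
}

find_libdpdk_pc || echo "libdpdk.pc not found in PKG_CONFIG_PATH" >&2
```

If the helper prints nothing, re-check the install prefix and whether the library directory on your system is lib64 or something else.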

The next step is to set up DPDK hugepages. Our test environment is a NUMA system. For a single-node system, please refer to the link.

$ # for NUMA machine
$ echo 8192 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 8192 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

By default, hugetlbfs is mounted at /dev/hugepages, as shown below.

$ mount | grep hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)

If that is not the case on your system, mount hugetlbfs yourself.

$ mkdir /mnt/huge
$ mount -t hugetlbfs nodev /mnt/huge
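To check where hugetlbfs ended up, you can also read the kernel's mount table directly. This is a small sketch; hugetlbfs_mounts is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different mounts file.

```shell
# List the mount points whose filesystem type is hugetlbfs.
# The optional argument overrides the mounts table to read (default: /proc/mounts).
hugetlbfs_mounts() {
    # Fields in the mounts table: device, mount point, fs type, options, ...
    awk '$3 == "hugetlbfs" { print $2 }' "${1:-/proc/mounts}"
}

# Example:
#   hugetlbfs_mounts
```

An empty result means hugetlbfs is not mounted yet.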

Notes:

  1. Hugepages of other sizes, such as 1 GB hugepages, can also be used if your system supports them.
  2. In production environments, it's recommended to reserve hugepage memory and isolate the CPUs used by DPVS with Linux kernel cmdline options, for example isolcpus=1-9 default_hugepagesz=1G hugepagesz=1G hugepages=32.
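As a quick sanity check on the numbers above, the reserved memory is simply pages × page size × nodes. The helper below (hugepage_gib is a hypothetical name) does that arithmetic: the two echo commands reserve 8192 pages of 2048 kB on each of two NUMA nodes, and the kernel cmdline example reserves 32 pages of 1 GB.

```shell
# Total hugepage memory in GiB: pages-per-node x page-size-in-kB x nodes.
# Integer arithmetic; results below 1 GiB round down to 0.
hugepage_gib() {
    pages=$1
    size_kb=$2
    nodes=${3:-1}
    echo $(( pages * size_kb * nodes / 1024 / 1024 ))
}

hugepage_gib 8192 2048 2     # 2 MB pages on two NUMA nodes -> prints 32
hugepage_gib 32 1048576      # 32 pages of 1 GB             -> prints 32
```

Both setups reserve the same total of 32 GiB, so they are comparable when sizing a machine.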

Next, install kernel modules required by DPDK and DPVS.

$ modprobe uio_pci_generic

$ cd dpdk-stable-20.11.10
$ insmod dpdkbuild/kernel/linux/kni/rte_kni.ko carrier=on

$ # bind eth0 to uio_pci_generic (Be aware: Network on eth0 will get broken!)
$ ./usertools/dpdk-devbind.py --status
$ ifconfig eth0 down          # assuming eth0's pci-bus location is 0000:06:00.0
$ ./usertools/dpdk-devbind.py -b uio_pci_generic 0000:06:00.0

Notes:

  1. The test in our Quick Start uses only one NIC. Bind as many NICs as your DPVS application requires to the DPDK driver kernel module. For example, bind at least 2 NICs if you are testing DPVS with a two-arm topology.
  2. dpdk-devbind.py -u can be used to unbind the driver and switch the NIC back to a Linux driver such as ixgbe. Use lspci or ethtool -i eth0 to check the NIC's PCI bus-id. Please refer to DPDK Doc:Binding and Unbinding Network Ports to/from the Kernel Modules for more details.
  3. NVIDIA/Mellanox NICs use a bifurcated driver that doesn't rely on a UIO/VFIO driver, so do not bind them to any DPDK driver kernel module; NVIDIA MLNX_OFED/EN is required instead. Refer to Mellanox DPDK for its PMD and Compilation Prerequisites for OFED installation.
  4. A kernel module parameter carrier was added to rte_kni.ko in DPDK v18.11, and its default value is "off". We need to load rte_kni.ko with the extra parameter carrier=on to make KNI devices work properly.
  5. Multiple DPVS instances can run on a single server if there are enough NICs, or enough VFs within one NIC. Refer to tutorial:Multiple Instances for details.
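Besides lspci and ethtool -i, the PCI address of a NIC can usually be read from sysfs while the NIC is still bound to its kernel driver. The sketch below assumes a plain PCI NIC (for example one driven by ixgbe); pci_addr_of is a hypothetical helper name, and the optional second argument only exists to point the function at a different sysfs root.

```shell
# Print the PCI address of a network interface by resolving its sysfs "device" link.
# $1: interface name, $2: sysfs root (optional, defaults to /sys).
pci_addr_of() {
    iface=$1
    sysroot=${2:-/sys}
    link="$sysroot/class/net/$iface/device"
    # Virtual interfaces such as lo have no "device" link.
    [ -e "$link" ] || { echo "no device link for $iface" >&2; return 1; }
    basename "$(readlink -f "$link")"
}

# Example (before binding the NIC to uio_pci_generic):
#   pci_addr_of eth0
```

Once the NIC is bound to uio_pci_generic, the kernel network interface disappears, so record the address beforehand.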

Build DPVS

It's simple: just set PKG_CONFIG_PATH and build it.

$ export PKG_CONFIG_PATH=<path-of-libdpdk.pc>  # normally located at dpdklib/lib64/pkgconfig/
$ cd <path-of-dpvs>

$ make              # or "make -j" to speed up
$ make install

Notes:

  1. Build dependencies may be needed, such as pkg-config (version 0.29.2+), automake, libnl3, libnl-genl-3.0, openssl, popt and numactl. You can install the missing dependencies with your system's package manager, e.g., yum install popt-devel automake (CentOS) or apt install libpopt-dev automake (Ubuntu).
  2. pkg-config versions earlier than v0.29.2 may cause the DPVS build to fail. If so, please upgrade this tool. In particular, you may need to upgrade pkg-config on CentOS 7 to meet the version requirement.
  3. If you want to compile dpvs-agent and healthcheck, enable CONFIG_DPVS_AGENT in config.mk and install a Golang build environment (refer to the go.mod file for the required Golang version).
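The pkg-config version requirement in note 2 can be checked up front. This is a minimal sketch: version_ge is a hypothetical helper, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Uses "sort -V": the smaller of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed pkg-config against the 0.29.2 minimum, if it is installed.
if command -v pkg-config >/dev/null 2>&1; then
    have=$(pkg-config --version)
    if version_ge "$have" 0.29.2; then
        echo "pkg-config $have is new enough"
    else
        echo "pkg-config $have is older than 0.29.2, consider upgrading"
    fi
fi
```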

Output binary files are installed to dpvs/bin.

$ ls bin/
dpip  dpvs  dpvs-agent  healthcheck  ipvsadm  keepalived

Launch DPVS

Now, dpvs.conf must be located at /etc/dpvs.conf. Just copy it from conf/dpvs.conf.single-nic.sample.

$ cp conf/dpvs.conf.single-nic.sample /etc/dpvs.conf

Then start DPVS:

$ cd <path-of-dpvs>/bin
$ ./dpvs &

$ # alternatively and strongly advised, start DPVS with NIC and CPU explicitly specified:
$ ./dpvs -- -a 0000:06:00.0 -l 1-9

Notes:

  1. Run ./dpvs --help for DPVS supported command line options, and ./dpvs -- --help for common DPDK EAL command line options.
  2. The default dpvs.conf requires 9 CPUs (1 master worker and 8 slave workers). Modify it if your system does not have that many CPUs available.
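The core list passed with -l has to cover the workers configured in dpvs.conf: 1-9 gives exactly the 9 CPUs mentioned in note 2. The sketch below counts the cores in a simple list of numbers and ranges such as 1-9 or 0,2-4; count_cores is a hypothetical helper and handles only that basic syntax.

```shell
# Count the cores in a list like "1-9" or "0,2-4,7" (comma-separated numbers and ranges).
count_cores() {
    total=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*) first=${part%-*}; last=${part#*-}
                 total=$(( total + last - first + 1 )) ;;   # inclusive range
            *)   total=$(( total + 1 )) ;;                  # single core
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

echo "core list 1-9 uses $(count_cores 1-9) cores"   # prints: core list 1-9 uses 9 cores
```

Comparing the result with the output of nproc is a quick way to see whether the default configuration fits the machine.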

Check whether it has started:

$ ./dpip link show
1: dpdk0: socket 0 mtu 1500 rx-queue 8 tx-queue 8
    UP 10000 Mbps full-duplex fixed-nego promisc-off
    addr A0:36:9F:9D:61:F4 OF_RX_IP_CSUM OF_TX_IP_CSUM OF_TX_TCP_CSUM OF_TX_UDP_CSUM

If you see this message, well done: DPVS is working with NIC dpdk0!

Don't worry if you see this error:

EAL: Error - exiting with code: 1
Cause: ports in DPDK RTE (2) != ports in dpvs.conf(1)

It means the number of NICs recognized by DPVS does not match the number in /etc/dpvs.conf. Either modify the number of NICs in dpvs.conf or specify the NICs explicitly with the EAL option -a.

What config items does dpvs.conf support, and how are they configured? DPVS maintains a config item file, conf/dpvs.conf.items, which lists all supported config entries, their default values, and feasible value ranges. In addition, the sample config files in ./conf/dpvs.*.sample give practical configurations of DPVS for the corresponding circumstances.

Test Full-NAT (FNAT) Load Balancer

The test topology looks like the following diagram.

fnat-single-nic

Set the VIP and Local IP (LIP, needed by FNAT mode) on DPVS. Let's put the commands into setup.sh. You can check the result with ./ipvsadm -ln and ./dpip addr show.

$ cat setup.sh
VIP=192.168.100.100
LIP=192.168.100.200
RS=192.168.100.2

./dpip addr add ${VIP}/24 dev dpdk0
./ipvsadm -A -t ${VIP}:80 -s rr
./ipvsadm -a -t ${VIP}:80 -r ${RS}:80 -b

./ipvsadm --add-laddr -z ${LIP} -t ${VIP}:80 -F dpdk0
$
$ ./setup.sh

Access the VIP from the client. It looks good!

client $ curl 192.168.100.100
Your ip:port : 192.168.100.3:56890
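The setup.sh above hard-codes a single real server. As a sketch of how it might be generalized, the function below only prints the same dpip and ipvsadm commands for any number of real servers, so they can be reviewed before being run. fnat_commands is a hypothetical helper, the second real server address 192.168.100.12 is made up for illustration, and port 80 with round-robin scheduling is kept from the original script.

```shell
# Print (do not run) the commands for one FNAT service on port 80.
# Usage: fnat_commands VIP LIP DEV RS1 [RS2 ...]
fnat_commands() {
    vip=$1
    lip=$2
    dev=$3
    shift 3
    echo "./dpip addr add ${vip}/24 dev ${dev}"
    echo "./ipvsadm -A -t ${vip}:80 -s rr"
    for rs in "$@"; do
        echo "./ipvsadm -a -t ${vip}:80 -r ${rs}:80 -b"   # one line per real server
    done
    echo "./ipvsadm --add-laddr -z ${lip} -t ${vip}:80 -F ${dev}"
}

fnat_commands 192.168.100.100 192.168.100.200 dpdk0 192.168.100.2 192.168.100.12
```

Once the printed commands look right, they can be piped to sh from the bin directory.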

Tutorial Docs

More examples can be found in the Tutorial Document.

We also listed some frequently asked questions in the FAQ Document. It may help when you run into problems with DPVS.

Browse the doc directory for other documentation, including:

Performance Test

Our tests show that the forwarding speed (PPS, packets per second) of DPVS is several times that of LVS and as good as Google's Maglev.

performance

Click here for the latest performance data.

License

Please refer to the License file for details.

Contributing

Please refer to the CONTRIBUTING file for details.

Community

Currently, DPVS has been widely adopted by dozens of community cooperators, who have successfully used DPVS and contributed a lot to it. Some of them are listed alphabetically below.

CMSoft
IQiYi
NetEase
Shopee
Xiaomi

Contact Us

DPVS has been developed by the iQiYi QLB team since April 2016. It's widely used in iQiYi IDC for L4 load balancer and SNAT clusters, and we have already replaced nearly all our LVS clusters with DPVS. We open-sourced DPVS in October 2017, and are excited to see more people getting involved in this project. You are welcome to try it, report issues, and submit pull requests. Please feel free to contact us through GitHub or email.