Introduction to Network Virtualization
Network virtualization is a key concept for both open source networking, as well as new cloud technologies. Virtualization technologies use network virtualization to allow communication between virtual machines or containers within a compute host, or across multiple compute hosts.
Network virtualization includes virtual networks that only exist within a host (such as Linux bridge, IO Visor, etc.), as well as technologies that allow communication between Linux bridges of multiple hosts (encapsulation and overlay networks).
Virtual networks connecting virtual machines within a host have existed for many years. Most of them use a host-based virtual bridge to connect the virtual interfaces of the virtual machines (a Linux bridge is a virtual Layer 2 switch). Virtual machines were the main use case for Linux bridges for a long time. In recent years, with the introduction of containers, virtual switches started being used to connect containers as well as virtual machines.
Container systems such as Docker use a Linux bridge as their default method of connectivity.
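To make the bridge-plus-virtual-interface pattern concrete, here is a minimal sketch using the third-party pyroute2 library (an assumption; it requires root privileges, and the interface names are hypothetical). It creates a Linux bridge and attaches one end of a veth pair to it, which is essentially what hypervisors and container engines do when they plug a workload's virtual NIC into a host bridge.

```python
from pyroute2 import IPRoute  # assumes pyroute2 is installed; run as root

ipr = IPRoute()

# Create a Linux bridge (a virtual Layer 2 switch inside the host).
ipr.link("add", ifname="br-demo", kind="bridge")
br = ipr.link_lookup(ifname="br-demo")[0]
ipr.link("set", index=br, state="up")

# Create a veth pair: one end stays on the host, the other end would normally
# be moved into a VM's or container's network namespace.
ipr.link("add", ifname="veth-host", peer="veth-guest", kind="veth")
host_end = ipr.link_lookup(ifname="veth-host")[0]

# Attach the host end to the bridge, like plugging a cable into a switch port.
ipr.link("set", index=host_end, master=br, state="up")
```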
In this chapter, we will explore network virtualization at a high level and understand why it is required.
Network Virtualization vs. Network Function Virtualization
Network Virtualization and Network Function Virtualization (NFV) are sometimes confused. It is important to understand that they are different concepts.
Network Virtualization is the concept we are exploring in this chapter and is related to creating virtual networks.
Network Function Virtualization is about using virtualization infrastructure to run network services such as firewalls, IPS, load balancers, etc.
We will explore Network Function Virtualization (NFV) in the next chapter.
Network Virtualization
Vendor-Specific Virtualization: VMware NSX
One of the best-known commercial network virtualization solutions is VMware NSX. VMware NSX is a network and security virtualization platform designed mainly for VMware vSphere environments. VMware aims to create a platform for the Software-Defined Datacenter (SDDC), which includes managing and virtualizing compute, storage and networking. To fully virtualize and manage the networking element of the SDDC, VMware created the NSX product, which is a platform for Network Virtualization as well as Network Function Virtualization.
Name | VMware NSX |
---|---|
By | VMware |
Where it runs | Over a virtualized environment, in VMware vSphere |
What it does | Creates and manages virtual networks, firewalls, load balancers and routers. Secures East-West traffic by protecting VM-to-VM traffic at the host level. Provides security compliance and auditing. Provides a platform to implement microsegmentation using its ready-made firewall, load balancer, router and networking features. NSX has its own virtual firewall and virtual load balancers, which can be used for any workload. |
What you can do out-of-the-box | You can deploy VMware NSX in your existing VMware vSphere environment (the version should be supported). After installation, you get a new section in VMware vCenter to manage your network and security. You can create virtual networks, provision firewalls and load balancers, create virtual routers that peer with external networks, filter traffic between VMs, and use many other features out-of-the-box. |
VMware NSX enhances VMware's distributed virtual switches and adds many networking and security features. VMware NSX provides the following virtual functions:
Switching
Routing
Firewalling
Load balancing
VPN
Access control
Quality of Service management.
VMware NSX firewalling and host security functions can help organizations deploy distributed virtual firewalls across their infrastructure, removing the need to rely on large hardware appliances. Since VMware NSX can build overlay networks, you do not need to extend your virtual machine VLANs to your physical network; VMware NSX performs the encapsulation using VXLAN and builds an overlay. This helps datacenters keep their physical network Layer 3 routed, without the need to extend tenant and virtual machine VLANs to the physical network.
Basic Protocol-Independent Encapsulation and Overlay
Overlay Networks
Overlay networks are virtualized networks used to connect virtual machines or containers located on different hosts over a Layer 3 network connection (a Layer 3 fabric).
In a datacenter with multiple compute hosts, the cloud infrastructure will place a tenant's virtual machines/containers on different hosts. This is done to meet high availability requirements. In most datacenters, there is no Layer 2 VLAN interconnect per tenant. Therefore, when Layer 2 connectivity is needed between the VMs of a single tenant hosted on different compute nodes, the underlay network has no way of identifying where to send the packets.
Underlay Network Unaware of IP Addresses and Virtual Networks Existing on Compute Hosts
As you can see in the above illustration, the underlay network is not aware of the IP addresses and virtual networks that exist on the compute hosts. By default, there is no routing protocol running between the compute hosts and the network that would allow the compute hosts to advertise their virtual networks and IP address ranges to the underlay network. Therefore, the underlay network is not able to route the traffic generated by virtual workloads (virtual machines or containers).
There are multiple solutions provided by different entities to fix this problem, such as:
Using source NAT at the host
All traffic generated by virtual workloads is source-NATed by the host. This method is used by Docker.
Compute hosts participating in dynamic routing with the underlay network
This method runs a routing protocol stack on the compute hosts (such as FRR), allowing them to advertise their IP prefixes to the underlay network.
Creating virtual overlay networks using encapsulation protocols such as VXLAN, GRE, GENEVE
This method has been used in many cloud infrastructure platforms, such as OpenStack and VMware, as well as other open source networking projects, such as Tungsten Fabric.
Overlays are simple packet encapsulations: the source host encapsulates the virtual machine's packets into another packet (normally a UDP packet with a VXLAN header) and sends them to a specific destination host.
Host Encapsulates the Original Packet from the VM
The destination host receives the packet and decapsulates it, extracting the inner packet from the outer packet. The VXLAN header in the outer packet carries a VXLAN number called the VNID (Virtual Network Identifier). Based on the VNID tag, the host forwards the packet to the correct Linux bridge (or equivalent bridge) within the host. Once the packet reaches the related Linux bridge (host virtual network), it is received by the virtual workload (a virtual machine or container).
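The header format itself is compact: 8 bytes carrying the 24-bit VNID, prepended to the original Layer 2 frame and carried inside a UDP datagram (destination port 4789). The following Python sketch, using only the standard struct module, illustrates the packing and unpacking; it is for illustration only, since in practice the kernel, the virtual switch, or the NIC performs this work.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP port for VXLAN (RFC 7348)

def encapsulate(inner_frame, vni):
    """Prepend an 8-byte VXLAN header to the original Layer 2 frame."""
    flags = 0x08                                    # I-flag set: the VNI field is valid
    header = struct.pack("!B3xI", flags, vni << 8)  # VNI occupies the top 24 bits of the last word
    return header + inner_frame                     # this payload travels inside the outer UDP packet

def decapsulate(vxlan_payload):
    """Extract the VNID and the inner frame from a VXLAN UDP payload."""
    flags, tail = struct.unpack("!B3xI", vxlan_payload[:8])
    return tail >> 8, vxlan_payload[8:]

# Example: a tenant VM frame tagged with VNID 5001.
outer_payload = encapsulate(b"\x00" * 64, vni=5001)
vnid, frame = decapsulate(outer_payload)
print(vnid)  # 5001 -> the host forwards the frame to the bridge mapped to this VNID
```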
Overlay Network
OpenStack Networking Demo Video
Hardware Acceleration
Encapsulation and decapsulation of traffic add work for the host's CPU, which must process each packet to and from virtual workloads. This adds overhead to the system, reducing the CPU time available for running applications.
The good news is that, these days, most NIC cards support VXLAN offloading, which means the encapsulation and decapsulation work can be offloaded to the NIC chipset instead of the CPU.
As we discovered in Chapter 2, when we talked about DPDK, NIC cards and SmartNICs, the process of traffic encapsulation and decapsulation can be hardware-accelerated. Most NIC cards have built-in capabilities to offload the VXLAN encapsulation/decapsulation work from the CPU, without requiring special configuration. If a NIC card doesn't support native VXLAN offloading, you can use DPDK to build an application to accelerate the VXLAN encapsulation and decapsulation process.
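As a quick way to see whether a NIC advertises UDP-tunnel (VXLAN) offload capabilities, you can inspect its features with ethtool. The sketch below simply wraps that command from Python; the interface name eth0 is an assumption, ethtool must be installed, and the exact feature names vary by driver.

```python
import subprocess

def udp_tunnel_offload_features(interface):
    """Return the ethtool offload feature lines related to UDP tunnels (VXLAN)."""
    result = subprocess.run(
        ["ethtool", "-k", interface],   # lists the NIC's offload features
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if "udp_tnl" in line]

# On a NIC with VXLAN offload you would typically see lines such as
# 'tx-udp_tnl-segmentation: on'.
print(udp_tunnel_offload_features("eth0"))  # "eth0" is a hypothetical interface name
```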
Containers and Networking
Containerization is a method for running multiple isolated Linux systems (containers) on a control host using a single Linux kernel. Containerization started a few years ago, when new Linux kernel features such as cgroups and namespace isolation were introduced.
Containers are similar to virtual machines, but they do not carry a full operating system. A virtual machine includes a full operating system and all the binaries and applications needed to run a specific application (or applications). A container, instead, uses the host's operating system and kernel.
In terms of network connectivity, both virtual machines and containers are similar. They both use a flavor of a virtual switch in a host (for example a Linux bridge or OVS) to connect the workloads to external networks.
The differences between virtual machines and containers are illustrated in the following table:
Aspect | Virtual Machines | Containers |
---|---|---|
Running Engine | Hypervisor | Container Engine |
Workload's OS/Kernel | Each VM has its own OS and kernel. | All containers use the host machine's OS and kernel. |
Density | Host machine needs to run multiple full operating systems. | Host machines run multiple containers using a common kernel and groups of common binaries and libraries. Less memory and CPU utilization than VMs. A host can run more containers compared to VMs. Higher density. |
Networking | Relies on host virtual switches or direct NIC connectivity using SR-IOV* (Single Root Input Output Virtualization). | Relies on host virtual switches or direct NIC connectivity using SR-IOV. |
Storage | Host presents a virtual block storage to the VM. | Container Engine manages the container storage. |
Guests | Can be any OS (such as Linux, BSD, Windows, etc.). | Must share the host's kernel, but can be a different distribution; for example, you can run an Ubuntu container on a CentOS host. |
*SR-IOV (Single Root I/O Virtualization) is a NIC feature that allows a single physical NIC to present itself as multiple virtual NICs to which a VM or a container can attach directly. SR-IOV provides higher performance for packet switching, as the virtual workload connects directly to the physical NIC.
Legacy vs SR-IOV Networking
Docker Networking for Containers
Docker creates a virtual network within the host and attaches the containers to it. This is a host-private network and is not routable from outside of the host. Docker creates a virtual bridge called docker0 (by default) and allocates a private IP subnet (RFC 1918) for that virtual bridge. For each container, Docker creates a virtual Ethernet (veth) pair, attaching one end to the docker0 bridge; the other end appears as eth0 inside the container. The container's IP address is assigned to that interface from the private IP range which Docker allocated to the docker0 virtual bridge, with Docker's built-in IPAM handing out the addresses to containers.
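You can observe this through the Docker API. Here is a small sketch using the Docker SDK for Python (assumptions: the docker package is installed and the Docker daemon is running locally; the alpine image is used only as a throwaway workload):

```python
import docker  # Docker SDK for Python ("pip install docker"); needs a running Docker daemon

client = docker.from_env()

# The default "bridge" network is backed by the docker0 Linux bridge.
bridge = client.networks.get("bridge")
print(bridge.attrs["IPAM"]["Config"])          # e.g. [{'Subnet': '172.17.0.0/16', ...}]

# Start a throwaway container and read the private IP that Docker's IPAM
# assigned to its eth0 (the container end of the veth pair).
container = client.containers.run("alpine", "sleep 60", detach=True)
container.reload()                              # refresh attributes after start
print(container.attrs["NetworkSettings"]["IPAddress"])   # e.g. 172.17.0.2

container.stop()
container.remove()
```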
By default, Docker performs Source NAT on traffic from containers to the outside world: outbound traffic leaving the docker0 bridge is rewritten to use the host's IP address, giving containers access to the outside. Docker containers on the same host can communicate with each other, as they are all connected to the common docker0 virtual bridge. However, containers on different hosts cannot reach each other directly, because the private IP subnet of the docker0 bridge is not advertised or routed to the network beyond the host.
To allow communication from the outside network, or from containers sitting on other hosts, Docker uses a Destination NAT mechanism. Docker can allocate specific port numbers (TCP or UDP) on the host's IP address, and then forward or proxy the inbound traffic to the respective containers. It is important to allocate the port (TCP/UDP) mappings between host and containers carefully; otherwise, the port assignments quickly become unmanageable.
Docker Networking Function and NATing
Connecting to Containers from the Outside
Since one of the main use cases for containers is to run them as servers, we need a way to reach containers from the outside. Source NAT does not answer this requirement. Docker uses the Destination NAT method to route traffic from the outside to a specific container via its docker0 virtual bridge. For example, you can have three Apache web server containers, all with different IP addresses, serving different web applications. Using Destination NAT, you can allow outside users to reach each of the containers.
Destination NAT for Containers
Docker performs the Destination NAT on the host in front of its virtual network (docker0). A request from outside towards the host IP address (192.168.1.20:8081) is redirected to Container 1; the destination IP and port are rewritten to the container's IP and port (10.10.0.11:80).
An example of Destination NAT mappings for three web server containers:
Host IP Address | Host Listening Port | Destination NAT IP | Destination NAT Port |
---|---|---|---|
192.168.1.20 | 8081 | 10.10.0.11 | 80 |
192.168.1.20 | 8082 | 10.10.0.12 | 80 |
192.168.1.20 | 8083 | 10.10.0.13 | 80 |
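This mapping is what Docker's port publishing (docker run -p) configures. A hedged sketch of the same three mappings using the Docker SDK for Python (the httpd:2.4 image and the container names are illustrative assumptions):

```python
import docker  # Docker SDK for Python

client = docker.from_env()

# Publish container port 80 on three different host ports. Docker installs the
# corresponding Destination NAT rules, so traffic to <host-ip>:8081/8082/8083
# reaches the matching container on port 80.
for host_port in (8081, 8082, 8083):
    client.containers.run(
        "httpd:2.4",                   # Apache web server image (assumed)
        detach=True,
        name=f"web-{host_port}",       # hypothetical container names
        ports={"80/tcp": host_port},   # equivalent to: docker run -p <host_port>:80
    )
```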
There are other solutions as well, as Linux networking is flexible. For example, you can change the default Docker networking behavior: disable NAT and instead run host routing software such as FRR, advertising the docker0 IP subnets to the outside via OSPF, RIP, or BGP.
Kubernetes
Kubernetes is a production-grade container orchestration platform for automating the deployment and management of containerized applications. Kubernetes was originally designed by Google and is now hosted and maintained by the Cloud Native Computing Foundation (part of The Linux Foundation).
Kubernetes has a master-worker architecture. The Kubernetes master server runs the management and northbound APIs of Kubernetes and maintains the configuration database and the state of workers, containers, etc. The Kubernetes master (Control Plane) communicates with the worker nodes, also known as Kubernetes nodes. Each Kubernetes node runs a set of Kubernetes agents, as well as other core components such as the proxy/load balancer (kube-proxy).
Kubernetes works with different container engines; the most common configuration uses Docker. Supported engines include Docker, rkt, runC and any implementation of the OCI runtime specification.
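Everything in a cluster is driven through the kube-apiserver described further below. As a small illustration, the official Kubernetes Python client can query the same API that kubectl uses (assumptions: the kubernetes package is installed and a valid kubeconfig with cluster access exists):

```python
from kubernetes import client, config  # official Kubernetes Python client

config.load_kube_config()       # reads ~/.kube/config; cluster access is assumed
v1 = client.CoreV1Api()         # CoreV1Api talks to the kube-apiserver

# List the nodes registered with the control plane.
for node in v1.list_node().items:
    print(node.metadata.name, node.status.node_info.kubelet_version)

# List all pods across namespaces, with the IP each pod was assigned.
for pod in v1.list_pod_for_all_namespaces().items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.pod_ip)
```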
Kubernetes Architecture
Pods are groups of containers with shared storage/network and a specification for how to run the containers. A pod contains one or more application containers which are relatively tightly coupled; in a pre-container world, they would have executed on the same physical or virtual machine. A pod may contain a single container, for example a single Apache server container. As another example, a basic web-plus-database application could run as a pod with two containers:
Pod Name | Container | IP Address | Port |
---|---|---|---|
My App | 01-Apache | 10.0.0.5 | 80 |
My App | 02 | 10.0.0.5 | 3306 |
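A hedged sketch of such a two-container pod, built with the official Kubernetes Python client (the image names, the name given to container "02", and the default namespace are illustrative assumptions; both containers share the pod's single IP address):

```python
from kubernetes import client, config

config.load_kube_config()

# One pod ("My App") with two tightly coupled containers sharing the same
# network namespace and IP address: Apache on port 80 and a database on 3306.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="my-app"),
    spec=client.V1PodSpec(containers=[
        client.V1Container(
            name="01-apache",
            image="httpd:2.4",                                   # assumed image
            ports=[client.V1ContainerPort(container_port=80)],
        ),
        client.V1Container(
            name="02-mysql",                                     # hypothetical name for container "02"
            image="mysql:8",                                     # assumed image
            env=[client.V1EnvVar(name="MYSQL_ROOT_PASSWORD", value="example")],
            ports=[client.V1ContainerPort(container_port=3306)],
        ),
    ]),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```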
A Kubernetes Service is an abstraction that groups a set of pods that work together (for example, a multi-tier application).
Kubernetes Architecture
The Kubernetes architecture has two main components:
Master (Control Plane server)
Nodes (the worker nodes running the containers).
Master Server Components:
kube-apiserver
Exposes the frontend API through which clients and orchestration or management platforms control the Kubernetes environment.
etcd
Kubernetes data store.
kube-scheduler
Responsible for allocating worker nodes for newly created pods with no nodes assigned.
kube-controller-manager
Includes four controllers:
- Node Controller: Responsible for noticing and responding when a node goes down
- Replication Controller: Responsible for maintaining the correct number of pod replicas in the system
- Endpoints Controller: Populates the Endpoints objects (i.e., joins Services and Pods)
- Service Account & Token Controllers: Create default accounts and API access tokens for new namespaces
cloud-controller-manager
Responsible for running controllers that interact with the underlying cloud provider, such as a public cloud (AWS, Azure) or an on-premises cloud (such as OpenStack).
Node Components:
kubelet
An agent that runs on each node and ensures that the containers of a pod are running.
kube-proxy
Runs on each node and maintains the network rules on the host, performing the traffic proxying, load balancing and NAT functions.
Container Runtime
The container engine software responsible for running the containers (for example, Docker).
kube-proxy Load Balancing Traffic across Three nginx Pods
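The scenario in the illustration above (kube-proxy spreading traffic across three nginx pods) is configured by creating a Service. A minimal sketch with the official Kubernetes Python client (the app=nginx label, the Service name, and the default namespace are illustrative assumptions):

```python
from kubernetes import client, config

config.load_kube_config()

# A ClusterIP Service selecting every pod labelled app=nginx. kube-proxy on
# each node programs the NAT/load-balancing rules that spread traffic sent to
# the Service across all matching pod endpoints on port 80.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="nginx"),
    spec=client.V1ServiceSpec(
        selector={"app": "nginx"},
        ports=[client.V1ServicePort(port=80, target_port=80)],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```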
Kubernetes Networking
Kubernetes networking fundamentals differ from Docker's defaults. According to the Kubernetes documentation,
"Kubernetes assumes that pods can communicate with other pods, regardless of which host they land on".
The key point of Kubernetes networking is that all containers and nodes should be able to communicate with each other without NAT. Overall, this model is less complex, as it requires full IP reachability between all IP-based components within a Kubernetes cluster.
This means that, in practice, you cannot just take two servers running Docker and expect Kubernetes to work. You must ensure that the fundamental requirement of full IP reachability without NAT is met. To achieve this, Kubernetes relies on overlay networks, software-defined networking platforms that create virtual networks, or pure fabric programming.
Kubernetes is compatible with the following networking solutions to create networking clusters:
Cisco ACI
Big Switch Big Cloud Fabric
Tungsten Fabric (OpenContrail)
VMware NSX-T
OpenVSwitch (OVS)
Project Calico.
Apart from these, Kube-router and Flannel are also well known. Flannel is an open source virtual networking layer for containers, and it has been successful with Kubernetes.
MetalLB is a load-balancer implementation for bare metal Kubernetes clusters, using standard routing protocols.
Learning Objectives (Review)
You should now be able to:
Discuss network function virtualization (NFV).
Analyze the benefits of network virtualization in cloud environments.
Explain the role of network virtualization in container environments.
Summary
In this chapter, we learned about network virtualization, expanding on what we covered in Chapter 6, where we explored projects such as Calico and Tungsten Fabric. Network virtualization covers both virtual bridges and networks within a host (such as the Linux bridge) and overlay networks built with VXLAN encapsulation across a Layer 3 fabric.
The benefit of overlay network virtualization is that it runs over the existing physical underlay network and does not require any changes to the fabric; a Clos design remains the best practice when building a datacenter network.
We also learned that Docker containers rely on host network virtualization and NAT, as hosts do not expose the container networks or the docker0 bridge to the outside. In a Kubernetes environment, however, full IP reachability with no NAT is required. Kubernetes relies on external solutions to provide a fully reachable network where pods can reach each other across nodes, as well as the outside world.