iavf Linux* Driver for Intel(R) Ethernet Adaptive Virtual Function


August 13, 2024

Overview

The iavf virtual function (VF) driver supports virtual functions generated by the physical function (PF) driver, with one or more VFs enabled through sysfs.
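As an illustration of enabling VFs through sysfs, the following sketch writes a VF count to the PF's "sriov_numvfs" attribute after checking it against "sriov_totalvfs". The function name "enable_vfs" and the "SYSFS_NET" override are assumptions made for this example and are not part of the driver; refer to the PF driver README for the supported procedure.

```shell
# Sketch: enable N VFs on a PF through sysfs (run as root on the host).
# SYSFS_NET is a hypothetical override that exists only to make this testable.
enable_vfs() {
    pf=$1
    count=$2
    dev="${SYSFS_NET:-/sys/class/net}/$pf/device"
    total=$(cat "$dev/sriov_totalvfs") || return 1
    if [ "$count" -gt "$total" ]; then
        echo "requested $count VFs, but $pf supports at most $total" >&2
        return 1
    fi
    # The kernel does not accept a change from one non-zero count to another,
    # so reset the count to 0 before writing the new value.
    echo 0 > "$dev/sriov_numvfs"
    echo "$count" > "$dev/sriov_numvfs"
}
```

For example, "enable_vfs ens1f0 4" would request four VFs on a PF named ens1f0 (an example interface name).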

The associated PF drivers for this VF driver are:

The MTU size set on a VF should match the MTU size set on the PF. A mismatch in MTU sizes may cause unexpected results.
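The following sketch shows one way to compare the two MTU values when both interfaces are visible on the same system. The function names and the "SYSFS_NET" override are assumptions made for this example. When the VF is assigned to a guest, read each value on its own system instead (for example, with "ip link show").

```shell
# Sketch: compare the MTU of two interfaces visible on the same system.
# SYSFS_NET is a hypothetical override that exists only to make this testable.
mtu_of() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/mtu"
}

check_mtu_match() {
    pf_mtu=$(mtu_of "$1") || return 2
    vf_mtu=$(mtu_of "$2") || return 2
    if [ "$pf_mtu" -eq "$vf_mtu" ]; then
        echo "MTU match: $pf_mtu"
    else
        echo "MTU mismatch: $1=$pf_mtu $2=$vf_mtu"
        return 1
    fi
}
```

If the values differ, the MTU can be changed with "ip link set dev <ethX> mtu <value>".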

SR-IOV requires the correct platform and OS support.

The guest OS loading this driver must support MSI-X interrupts.

For questions related to hardware requirements, refer to the documentation supplied with your Intel Ethernet adapter. All hardware requirements listed apply to use with Linux*.

Driver information can be obtained using ethtool, lspci, and ip.

Adaptive Virtual Function

Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to adapt to changing feature sets of the physical function driver (PF) with which it is associated. This allows system administrators to update a PF without having to update all the VFs associated with it. All AVFs have a single common device ID and branding string.

AVFs have a minimum set of features known as "base mode," but may provide additional features depending on what features are available in the PF with which the AVF is associated. The following are base mode features:

Related Documentation

See the "Intel(R) Ethernet Adapters and Devices User Guide" for additional information. It is available on the Intel website at https://cdrdv2.intel.com/v1/dl/getContent/705831/

Identifying Your Adapter

This driver is compatible with virtual functions bound to devices based on the following:

For information on how to identify your adapter, and for the latest Intel network drivers, refer to the Intel Support website at https://www.intel.com/support.

Building and Installation

To Manually Build the Driver

  1. Move the virtual function driver tar file to the directory of your choice. For example, use "/home/username/iavf" or "/usr/local/src/iavf".

  2. Untar/unzip the archive, where "<x.x.x>" is the version number for the driver tar file:

    tar zxf iavf-<x.x.x>.tar.gz

  3. Change to the driver src directory, where "<x.x.x>" is the version number for the driver tar:

    cd iavf-<x.x.x>/src/

  4. Compile the driver module:

    make install

    The binary will be installed as:

    /lib/modules/<kernel version>/updates/drivers/net/ethernet/intel/iavf/iavf.ko

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using the modprobe command. To check the version of the driver and then load it:

    modinfo iavf
    modprobe iavf

    Alternately, make sure that any older iavf drivers are removed from the kernel before loading the new module:

    rmmod iavf; modprobe iavf

  6. Assign an IP address to the interface by entering the following, where "ethX" is the interface name that was shown in dmesg after modprobe:

    ip address add <IP_address>/<netmask bits> dev <ethX>

  7. Verify that the interface works. Enter the following, where "IP_address" is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>
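To confirm which module file will be loaded after installation, a sketch such as the following prints the default install location for the running kernel. It assumes the directory under /lib/modules is named after the running kernel release, as reported by "uname -r". The function name is an assumption made for this example.

```shell
# Sketch: print where "make install" places iavf.ko by default for the
# running kernel. Some distributions may use a different location.
iavf_default_path() {
    echo "/lib/modules/$(uname -r)/updates/drivers/net/ethernet/intel/iavf/iavf.ko"
}
```

The result can be compared with the output of "modinfo -n iavf", which prints the file name of the module that modprobe would load.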

To Build a Binary RPM Package of This Driver

Note:

RPM functionality has only been tested in Red Hat distributions.

  1. Run the following command, where "<x.x.x>" is the version number for the driver tar file:

    rpmbuild -tb iavf-<x.x.x>.tar.gz

    Note:

    For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system before building.

  2. After building the RPM, the last few lines of the tool output contain the location of the RPM file that was built. Install the RPM with one of the following commands, where "<RPM>" is the location of the RPM file:

    rpm -Uvh <RPM>

    or:

    dnf/yum localinstall <RPM>

  3. If your distribution or kernel does not contain inbox support for auxiliary bus, you must also install the auxiliary RPM, where "<RPM>" is the location of the auxiliary RPM file:

    rpm -Uvh <RPM>

    or:

    dnf/yum localinstall <RPM>

Note:

On some distributions, the auxiliary RPM may fail to install due to missing kernel-devel headers. To work around this issue, specify "--excludepath" during installation. For example:

 rpm -Uvh auxiliary-1.0.0-1.x86_64.rpm --excludepath=/lib/modules/3.10.0-957.el7.x86_64/source/include/linux/auxiliary_bus.h

Note:

Command Line Parameters

The iavf driver does not support any command line parameters.

Additional Features and Configurations

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. To see network driver link messages on your console, set the console log level to 8 by entering the following:

dmesg -n 8

Note:

This setting is not saved across reboots.
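To check the level currently in effect, the following sketch reads the first field of /proc/sys/kernel/printk, which holds the console log level. The function name and the optional file argument are assumptions made for this example.

```shell
# Sketch: print the current console log level, which is the first field
# of kernel.printk. The optional argument overrides the file for testing.
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}
```

After "dmesg -n 8", this function is expected to print 8.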

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. The latest ethtool version is required for this functionality. Download it at https://kernel.org/pub/software/network/ethtool/.

Setting VLAN Tag Stripping

If you have applications that require Virtual Functions (VFs) to receive packets with VLAN tags, you can disable VLAN tag stripping for the VF. The Physical Function (PF) processes requests issued from the VF to enable or disable VLAN tag stripping.

Note:

If the PF has assigned a VLAN to a VF, then requests from that VF to set VLAN tag stripping will be ignored.

To enable/disable VLAN tag stripping for a VF, issue the following command from inside the VM in which you are running the VF:

ethtool -K <ethX> rxvlan on/off

or, alternatively:

ethtool --offload <ethX> rxvlan on/off
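Because the PF may ignore the request, it can be useful to read the setting back. The following sketch extracts the "rx-vlan-offload" state from "ethtool -k" output. The function name is an assumption made for this example.

```shell
# Sketch: extract the rx-vlan-offload state from "ethtool -k" output on stdin.
# Example: ethtool -k <ethX> | rxvlan_state
rxvlan_state() {
    sed -n 's/^[[:space:]]*rx-vlan-offload: //p'
}
```

The printed value is "on" or "off". ethtool appends "[fixed]" to features that cannot be changed on the interface.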

IEEE 802.1ad (QinQ) Support

The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as "tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks allow L2 tunneling and the ability to separate traffic within a particular VLAN ID, among other uses.

The following are examples of how to configure 802.1ad (QinQ):

ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371

Where "24" and "371" are example VLAN IDs.
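The same pair of commands can be generated for other VLAN IDs. The following sketch prints the commands for a given device, outer ID, and inner ID, after checking that both IDs fall within 1-4094, the range of VLAN IDs usable for normal traffic under IEEE 802.1Q. The function name is an assumption made for this example, and the commands are printed rather than executed.

```shell
# Sketch: print the "ip link" commands for an 802.1ad outer tag and an
# 802.1Q inner tag. Nothing is executed; review the output before running it.
qinq_cmds() {
    dev=$1 outer=$2 inner=$3
    for id in "$outer" "$inner"; do
        if [ "$id" -lt 1 ] || [ "$id" -gt 4094 ]; then
            echo "invalid VLAN ID: $id" >&2
            return 1
        fi
    done
    echo "ip link add link $dev $dev.$outer type vlan proto 802.1ad id $outer"
    echo "ip link add link $dev.$outer $dev.$outer.$inner type vlan proto 802.1Q id $inner"
}
```

For example, "qinq_cmds eth0 24 371" prints the two commands from the example in this section.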

Note:

Double VLANs

Devices based on the Intel(R) Ethernet 800 Series can process up to two VLANs in a packet when all the following are installed:

If you don't use the versions above, the only supported VLAN configuration is single 802.1Q VLAN traffic.

When two VLAN tags are present in a packet, the outer VLAN tag can be either 802.1Q or 802.1ad. The inner VLAN tag must always be 802.1Q.

Note:

One limitation is that, for each VF, the PF can only allow VLAN hardware offloads (insertion and stripping) of one type, either 802.1Q or 802.1ad.

To enable outer or single 802.1Q VLAN insertion and stripping and disable 802.1ad VLAN insertion and stripping:

ethtool -K <ethX> rxvlan on txvlan on rx-vlan-stag-hw-parse off tx-vlan-stag-hw-insert off

To enable outer or single 802.1ad VLAN insertion and stripping and disable 802.1Q VLAN insertion and stripping:

ethtool -K <ethX> rxvlan off txvlan off rx-vlan-stag-hw-parse on tx-vlan-stag-hw-insert on

To enable outer or single VLAN filtering if the VF supports modifying VLAN filtering:

ethtool -K <ethX> rx-vlan-filter on rx-vlan-stag-filter on

To disable outer or single VLAN filtering if the VF supports modifying VLAN filtering:

ethtool -K <ethX> rx-vlan-filter off rx-vlan-stag-filter off

Combining QinQ with SR-IOV VFs

We recommend you always configure a port VLAN for the VF from the PF. If a port VLAN is not configured, the VF driver may only offload VLANs via software. The PF allows all VLAN traffic to reach the VF and the VF manages all VLAN traffic.

When the device is configured for double VLANs and the PF has configured a port VLAN:

However, when the device is configured for double VLANs and the PF has NOT configured a port VLAN:

If the PF does not support double VLANs, the VF can hardware offload single 802.1Q VLANs without a port VLAN.

When the PF is enabled for double VLANs, for iavf drivers before version 4.1.x:

To see VLAN filtering and offload capabilities, use the following command:

ethtool -k <ethX> | grep vlan

Application Device Queues (ADQ)

Application Device Queues (ADQ) allow you to dedicate one or more queues to a specific application. This can reduce latency for the specified application, and allow Tx traffic to be rate limited per application.

The ADQ information contained here is specific to the iavf driver. For more details, refer to the E810 ADQ Configuration Guide at: https://cdrdv2.intel.com/v1/dl/getContent/609008.

Requirements:

When ADQ is enabled:

See Creating Traffic Class Filters in this README for more information on configuring filters, including examples. See the E810 ADQ Configuration Guide for detailed instructions.

Creating Traffic Classes

Note:

These instructions are not specific to ADQ configuration. Refer to the tc and tc-flower man pages for more information on creating traffic classes (TCs).

To create traffic classes on the interface:

  1. Use the tc command to create traffic classes. You can create a maximum of 16 TCs from the VM on Intel(R) Ethernet 800 Series devices and 4 TCs from the VM on Intel(R) Ethernet 700 Series devices:

    tc qdisc add dev <ethX> root mqprio num_tc <tc_count> map <priorities> queues <count1@offset1 ...> hw 1 mode channel shaper bw_rlimit min_rate <min_rate1 ...> max_rate <max_rate1 ...>

    Where:

    num_tc <tc_count>: The number of TCs to use.

    map <priorities>: The map of priorities to TCs. You can map up to 16 priorities to TCs.

    queues <count1@offset1 ...>: For each TC, "<number of queues>@<offset>". The max total number of queues for all TCs is the number of cores.

    hw 1 mode channel: "channel" with "hw" set to 1 is a new hardware offload mode in mqprio that makes full use of the mqprio options, the TCs, the queue configurations, and the QoS parameters.

    shaper bw_rlimit: For each TC, sets the minimum and maximum bandwidth rates. The totals must be equal to or less than the port speed. This parameter is optional and is required only to set up the Tx rates.

    min_rate <min_rate1 ...>: Sets the minimum bandwidth rate limit for each TC.

    max_rate <max_rate1 ...>: Sets the maximum bandwidth rate limit for each TC. You can set a min and max rate together.

    Note:

    • If you set "max_rate" to less than 50Mbps, then "max_rate" is rounded up to 50Mbps and a warning is logged in dmesg.

    • See the mqprio man page and the examples below for more information.

  2. Verify the bandwidth limit using network monitoring tools such as "ifstat" or "sar -n DEV [interval] [number of samples]".

    Note:

    Setting up channels via ethtool ("ethtool -L") is not supported when the TCs are configured using mqprio.

  3. Enable hardware TC offload on the interface:

    ethtool -K <ethX> hw-tc-offload on

  4. Add clsact qdisc to enable adding ingress/egress filters for Rx/Tx:

    tc qdisc add dev <ethX> clsact

  5. Verify successful TC creation after qdisc is created:

    tc qdisc show dev <ethX> ingress
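Before running the mqprio command in step 1, it may help to check the "queues" argument against the limit stated there (the total number of queues for all TCs cannot exceed the number of cores). The following sketch sums the queue counts in a specification and compares the total with the number of online cores reported by "nproc". The function names are assumptions made for this example.

```shell
# Sketch: sum the queue counts in an mqprio "queues" specification such as
# "16@0 16@16" (the part before each "@" is the count for one TC).
total_queues() {
    total=0
    for spec in $1; do
        total=$((total + ${spec%@*}))
    done
    echo "$total"
}

# Succeeds when the total does not exceed the number of online cores.
queues_fit_cores() {
    [ "$(total_queues "$1")" -le "$(nproc)" ]
}
```

For example, "total_queues '16@0 16@16'" prints 32, so that specification needs at least 32 cores.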

Traffic Class Examples


See the tc and tc-flower man pages for more information on traffic
control and TC flower filters.

To set up two TCs (tc0 and tc1), with 16 queues each, priorities 0-3
for tc0 and 4-7 for tc1, and max Tx rate set to 1Gbit for tc0 and
3Gbit for tc1:

   tc qdisc add dev ens4f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 queues \
   16@0 16@16 hw 1 mode channel shaper bw_rlimit max_rate 1Gbit 3Gbit

Where:

map 0 0 0 0 1 1 1 1:
   Sets priorities 0-3 to use tc0 and 4-7 to use tc1.

queues 16@0 16@16:
   Assigns 16 queues to tc0 at offset 0 and 16 queues to tc1 at offset
   16.
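To read a "map" argument more easily, the following sketch lists the TC assigned to each priority. Positions in the map correspond to priorities, starting at 0. The function name is an assumption made for this example.

```shell
# Sketch: list which TC each priority maps to, given an mqprio "map" string.
# The first value is the TC for priority 0, the second for priority 1, etc.
show_prio_map() {
    prio=0
    for tc in $1; do
        echo "priority $prio -> tc$tc"
        prio=$((prio + 1))
    done
}
```

For the map used in the example, "show_prio_map '0 0 0 0 1 1 1 1'" lists priorities 0-3 under tc0 and priorities 4-7 under tc1.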

Creating Traffic Class Filters
------------------------------

Note:

  These instructions are not specific to ADQ configuration.

After creating traffic classes, use the tc command to create filters
for traffic. Refer to the tc and tc-flower man pages for more
information.

To view all TC filters:

   tc filter show dev <ethX> ingress
   tc filter show dev <ethX> egress

TC Filter Examples

To configure TCP TC filters, where:

protocol: Encapsulation protocol (valid options are IP and 802.1Q).

prio: Priority.

flower: Flow-based traffic control filter.

dst_ip: IP address of the device.

ip_proto: IP protocol to use (TCP or UDP).

dst_port: Destination port.

src_port: Source port.

skip_sw: Flag to add the rule only in hardware.

hw_tc: Route incoming traffic flow to this hardware TC. The TC count starts at 0. For example, "hw_tc 1" indicates that the filter is on the second TC.

vlan_id: VLAN ID.

Note:

You can add multiple filters to the device using the same recipe, either on the same interface or on different interfaces. Doing so requires no additional recipe resources. Each filter uses the same fields for matching, but can have different match values.

 tc filter add dev <ethX> protocol ip ingress prio 1 flower ip_proto \
 tcp dst_port <port_number> skip_sw hw_tc 1

 tc filter add dev <ethX> protocol ip egress prio 1 flower ip_proto tcp \
 src_port <port_number> action skbedit priority 1

For example:

 tc filter add dev ens4f0 protocol ip ingress prio 1 flower ip_proto \
 tcp dst_port 5555 skip_sw hw_tc 1

 tc filter add dev ens4f0 protocol ip egress prio 1 flower ip_proto \
 tcp src_port 5555 action skbedit priority 1
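The ingress and egress filters above form a pair for one TCP port: the ingress filter matches the destination port and the egress filter matches the same port as the source port. The following sketch prints such a pair for a given device, port, hardware TC, and priority. The function name is an assumption made for this example, and the commands are printed rather than executed.

```shell
# Sketch: print a matching ingress/egress filter pair for one TCP port.
# The ingress rule steers the flow to a hardware TC. The egress rule sets
# the packet priority, which the mqprio map then translates to a TC.
tc_filter_pair() {
    dev=$1 port=$2 hw_tc=$3 priority=$4
    echo "tc filter add dev $dev protocol ip ingress prio 1 flower ip_proto tcp dst_port $port skip_sw hw_tc $hw_tc"
    echo "tc filter add dev $dev protocol ip egress prio 1 flower ip_proto tcp src_port $port action skbedit priority $priority"
}
```

For example, "tc_filter_pair ens4f0 5555 1 1" prints the two commands from the example above. Choose the priority so that the mqprio map in use assigns it to the intended TC.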

RDMA in the VF

Devices based on the Intel(R) Ethernet 800 Series support RDMA in a Linux VF, on supported Windows or Linux hosts.

The iavf driver supports the following RDMA protocols in the VF:

Note:

RDMA in the VF is not supported on Intel(R) Ethernet X722 Series devices.

Refer to the README inside the irdma driver tarball for details on configuring RDMA in the VF.

Note:

To support VF RDMA, load the irdma driver on the host before creating VFs. Otherwise VF RDMA support may not be negotiated between the VF and PF driver.

The iavf driver allocates MSI-X resources for the VF RDMA instance (irdma). The LAN iavf driver gets first priority and any leftover MSI-X interrupts are used for VF RDMA.

Auxiliary Bus

Inter-Driver Communication (IDC) is the mechanism by which LAN drivers (such as iavf) communicate with peer drivers (such as irdma). Starting in kernel 5.11, Intel LAN and RDMA drivers use an auxiliary bus mechanism for IDC.

RDMA functionality requires use of the auxiliary bus.

If your kernel supports the auxiliary bus, the LAN and RDMA drivers will use the inbox auxiliary bus for IDC. For kernels earlier than 5.11, the base driver will automatically install an out-of-tree auxiliary bus module.
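As a first approximation, the running kernel release can be compared with 5.11. The following sketch does this for a release string such as the output of "uname -r". The function name is an assumption made for this example. Distribution kernels may carry the auxiliary bus under an older version number, so treat a negative result as a hint rather than a definitive answer.

```shell
# Sketch: succeed when a kernel release string is 5.11 or later.
# With no argument, the running kernel release is used.
kernel_at_least_5_11() {
    ver=${1:-$(uname -r)}
    major=${ver%%.*}            # text before the first dot
    rest=${ver#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 11 ]; }
}
```

For example, "kernel_at_least_5_11 3.10.0-957.el7.x86_64" fails, while "kernel_at_least_5_11 5.11.0" succeeds.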

Performance Optimization

Driver defaults are meant to fit a wide variety of workloads, but if further optimization is required, we recommend experimenting with the settings in this section.

Rx Descriptor Ring Size

To reduce the number of Rx packet discards, increase the number of Rx descriptors for each Rx ring using ethtool.

Note:

When you are handling a large number of connections in a VF, we recommend setting the number of Rx descriptors to 1024 or above. For example:

 ethtool -G <ethX> rx 2048
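Before changing the ring size, it can be useful to see the current value and the maximum the device reports. The following sketch extracts both from "ethtool -g" output. The function name is an assumption made for this example, and the parsing assumes the usual "Pre-set maximums" and "Current hardware settings" sections in that output.

```shell
# Sketch: print the maximum and current Rx ring sizes from "ethtool -g"
# output read on stdin. Example: ethtool -g <ethX> | rx_ring_sizes
rx_ring_sizes() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "current" }
        /^RX:/                        { print section, $2 }
    '
}
```

A value passed to "ethtool -G <ethX> rx" should not exceed the reported maximum.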

Known Issues/Troubleshooting

Software Issues

If your Intel Ethernet Network Connection is not working after installing the driver, verify that you have installed the correct driver.

Linux bonding failures with VFs

If you bind Virtual Functions (VFs) to an Intel(R) Ethernet 700 Series device, the VF targets may fail when they become the active target. If the MAC address of the VF is set by the PF (Physical Function) of the device, when you add a target, or change the active-backup target, Linux bonding tries to sync the backup target's MAC address to the same MAC address as the active target. Linux bonding will fail at this point. This issue will not occur if the VF's MAC address is not set by the PF.

When using bonding mode 5 (i.e., balance-tlb or adaptive transmit load balancing), if you add multiple VFs to the bond, they are assigned duplicate MAC addresses. When the VFs are joined with the bond interface, the Linux bonding driver sets the MAC address for the VFs to the same value. The MAC address is based on the first active VF added to that bond.

This results in balance-tlb mode not functioning as expected. PF interfaces behave as expected. The presence of duplicate MAC addresses may cause further issues, depending on your switch configuration.

Traffic is not being passed between VM and client

You may not be able to pass traffic between a client system and a Virtual Machine (VM) running on a separate host if the Virtual Function (VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled on the VF.

Note:

This situation can occur in any combination of client, host, and guest operating system. See the readme for the PF driver for information on spoof checking and how to set the VF to trusted mode.

Using four traffic classes fails

Do not try to reserve more than three traffic classes in the iavf driver. Doing so will fail to set any traffic classes and will cause the driver to write errors to stdout. Use a maximum of three traffic classes to avoid this issue.

Unexpected errors in dmesg when adding TCP filters on the VF

When ADQ is configured and the VF is not in trusted mode, you may see unexpected error messages in dmesg on the host when you try to add TCP filters on the VF. This is due to the asynchronous design of the iavf driver. The VF does not know whether it is trusted and appears to set the filter, while the PF blocks the request and reports an error. See the dmesg log in the host OS for details about the error.

Multiple log error messages on iavf driver removal

If you have several VFs and you remove the iavf driver, several instances of the following log errors are written to the log:

Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
Unable to send the message to VF 2 aq_err 12
ARQ Overflow Error detected

MAC address of Virtual Function changes unexpectedly

If a Virtual Function's MAC address is not assigned in the host, then the VF driver will use a random MAC address. This random MAC address may change each time the VF driver is reloaded. You can assign a static MAC address in the host machine. This static MAC address will survive a VF driver reload.
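A static MAC address is typically assigned from the host with the "ip link set ... vf ... mac" command. The following sketch wraps that command with a basic format check. The function names are assumptions made for this example. See the PF driver README for the supported procedure.

```shell
# Sketch: assign a static MAC address to a VF from the host (run as root).
# $1 is the PF interface name, $2 the VF index on that PF, $3 the MAC address.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

set_vf_mac() {
    pf=$1 vf_index=$2 mac=$3
    is_valid_mac "$mac" || { echo "invalid MAC: $mac" >&2; return 1; }
    ip link set dev "$pf" vf "$vf_index" mac "$mac"
}
```

For example, "set_vf_mac ens1f0 0 02:11:22:33:44:55" would assign that address to VF 0 of a PF named ens1f0 (example values).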

Driver buffer overflow fix

The fix to resolve CVE-2016-8105, referenced in Intel SA-00069 https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html, is included in this and future versions of the driver.

Compiling the driver

When trying to compile the driver by running make install, the following error may occur:

Linux kernel source not configured - missing version.h

To solve this issue, create the "version.h" file by going to the Linux source tree and entering:

make include/linux/version.h

Multiple interfaces on same Ethernet broadcast network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, turn on ARP filtering by entering the following:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

This only works if your kernel's version is higher than 2.4.5.

Note:

This setting is not saved across reboots. The configuration change can be made permanent by adding the following line to the file "/etc/sysctl.conf":

 net.ipv4.conf.all.arp_filter = 1
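To confirm that the persistent setting is in place, the following sketch checks a sysctl configuration file for the line above. The function name and the optional file argument are assumptions made for this example.

```shell
# Sketch: succeed when arp_filter is enabled in a sysctl configuration file.
# The optional argument overrides the default file, /etc/sysctl.conf.
arp_filter_persisted() {
    grep -Eq '^[[:space:]]*net\.ipv4\.conf\.all\.arp_filter[[:space:]]*=[[:space:]]*1[[:space:]]*$' \
        "${1:-/etc/sysctl.conf}"
}
```

After editing the file, "sysctl -p" applies its settings without a reboot.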

Alternatively, you can install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).

Rx page allocation errors

Errors that read, "Page allocation failure. order:0," may occur under stress with kernels 2.6.25 and newer. This is caused by the way the Linux kernel reports this stressed condition.

Host may reboot after removing PF when VF is active in guest

When using kernel versions earlier than 3.2, do not unload the PF driver while VFs are active. Doing so will cause your VFs to stop working until you reload the PF driver. It can also cause a spontaneous reboot of your system.

Prior to unloading the PF driver, you must first ensure that all VFs are no longer active. Do this by shutting down all VMs and unloading the VF driver.

Older VF drivers on Intel Ethernet 800 Series adapters

Some Windows* VF drivers from Release 22.9 or older may encounter errors when loaded on a PF based on the Intel Ethernet 800 Series on Linux KVM. You may see errors and the VF may not load. This issue does not occur starting with the following Windows VF drivers:

To resolve this issue, download and install the latest iavf driver.

SR-IOV virtual functions have identical MAC addresses

When you create multiple SR-IOV virtual functions, the VFs may have identical MAC addresses. Only one VF will pass traffic, and all traffic on other VFs with identical MAC addresses will fail. This is related to the "MACAddressPolicy=persistent" setting in "/usr/lib/systemd/network/99-default.link".

To resolve this issue, edit the "/usr/lib/systemd/network/99-default.link" file and change the "MACAddressPolicy" line to "MACAddressPolicy=none". For more information, see the systemd.link man page.
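The edit can be scripted. The following sketch rewrites the "MACAddressPolicy" line of a given .link file and keeps a backup copy. The function name is an assumption made for this example. As an alternative to editing the file under /usr/lib, systemd also reads .link files from /etc/systemd/network, where a file with the same name takes precedence.

```shell
# Sketch: set MACAddressPolicy=none in a systemd .link file.
# A backup of the original is kept next to it with a .bak suffix.
set_mac_policy_none() {
    file=$1
    cp "$file" "$file.bak" || return 1
    sed 's/^MACAddressPolicy=.*/MACAddressPolicy=none/' "$file.bak" > "$file"
}
```

For example, "set_mac_policy_none /usr/lib/systemd/network/99-default.link" (run as root) applies the change described above.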

Support

For general information, go to the Intel support website at https://www.intel.com/support/

or the Intel Ethernet Linux project hosted by GitHub at https://github.com/intel/ethernet-linux-iavf

If an issue is identified with the released source code on a supported kernel with a supported adapter, contact Intel Customer Support at https://www.intel.com/content/www/us/en/support/products/36773/ethernet-products.html

License

This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, version 2, as published by the Free Software Foundation.

This program is distributed in the hope it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.

The full GNU General Public License is included in this distribution in the file called "COPYING".

Copyright (c) 2018 - 2024, Intel Corporation.

Trademarks

Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and/or other countries.

Other names and brands may be claimed as the property of others.