ice Linux* Base Driver for the Intel(R) Ethernet 800 Series
August 14, 2024
Contents
^^^^^^^^
ice Linux* Base Driver for the Intel(R) Ethernet 800 Series
Overview
Related Documentation
Identifying Your Adapter
Important Notes
Building and Installation
Command Line Parameters
Additional Features and Configurations
Performance Optimization
Known Issues/Troubleshooting
Support
License
Trademarks
Overview
This driver supports Linux* kernel versions 3.10.0 and newer. However,
some features may require a newer kernel version. The associated
Virtual Function (VF) driver for this driver is iavf. The associated
RDMA driver for this driver is irdma.
Driver information can be obtained using ethtool, devlink, lspci, and
ip. Instructions on updating ethtool can be found in the section
Additional Configurations later in this document.
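For example, to view basic driver and firmware information (the PCI
address below is only illustrative):
ethtool -i <ethX>
devlink dev info pci/0000:af:00.0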
This driver is only supported as a loadable module at this time. Intel
is not supplying patches against the kernel source to allow for static
linking of the drivers.
For questions related to hardware requirements, refer to the
documentation supplied with your Intel adapter. All hardware
requirements listed apply to use with Linux.
This driver supports XDP (Express Data Path) on kernel 4.14 and later
and AF_XDP zero-copy on kernel 4.18 and later. Note that XDP is
blocked for frame sizes larger than 3KB.
The driver is compatible with devices based on the following:
Intel(R) Ethernet Controller E810-C
Intel(R) Ethernet Controller E810-XXV
Intel(R) Ethernet Connection E822-C
Intel(R) Ethernet Connection E822-L
Intel(R) Ethernet Connection E823-C
Intel(R) Ethernet Connection E823-L
For information on how to identify your adapter, and for the latest
Intel network drivers, refer to the Intel Support website at
https://www.intel.com/support.
Important Notes
Configuring SR-IOV for improved network security
In a virtualized environment, on Intel(R) Ethernet Network Adapters
that support SR-IOV or Intel(R) Scalable I/O Virtualization (Intel(R)
Scalable IOV), the virtual function (VF) may be subject to malicious
behavior. Software-generated layer two frames, like IEEE 802.3x (link
flow control), IEEE 802.1Qbb (priority-based flow control), and others
of this type, are not expected and can throttle traffic between the
host and the virtual switch, reducing performance. To resolve this
issue, and to ensure isolation from unintended traffic streams,
configure all SR-IOV or Intel Scalable IOV enabled ports for VLAN
tagging from the administrative interface on the PF. This
configuration allows unexpected, and potentially malicious, frames to
be dropped.
See the following sections later in this README for configuration
instructions:
Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports
Intel(R) Scalable I/O Virtualization Support
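As a minimal illustration of the PF-side configuration (the full
procedure is in the section referenced above; the interface name, VF
index, and VLAN ID are placeholders), a port VLAN can be assigned to a
VF with iproute2:
ip link set dev <PF> vf <vf_id> vlan <vlan_id>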
Do not unload port driver if VF with active VM is bound to it
Do not unload a port's driver if a Virtual Function (VF) with an
active Virtual Machine (VM) is bound to it. Doing so will cause the
port to appear to hang. Once the VM shuts down, or otherwise releases
the VF, the command will complete.
Firmware Recovery Mode
A device will enter Firmware Recovery mode if it detects a problem
that requires the firmware to be reprogrammed. When a device is in
Firmware Recovery mode it will not pass traffic or allow any
configuration; you can only attempt to recover the device's firmware.
Refer to the "Intel(R) Ethernet Adapters and Devices User Guide" for
details on Firmware Recovery Mode and how to recover from it.
Important Notes for SR-IOV, RDMA, and Link Aggregation
The VF driver will not block teaming/bonding/link aggregation, but
this is not a supported feature. Do not expect failover or load
balancing on the VF interface.
LAG and RDMA are compatible only in certain conditions. See the RDMA
(Remote Direct Memory Access) section later in this README for more
information.
Bridging and MACVLAN are also affected by this. If you wish to use
bridging or MACVLAN with RDMA/SR-IOV, you must set up bridging or
MACVLAN before enabling RDMA or SR-IOV. If you are using bridging or
MACVLAN in conjunction with SR-IOV and/or RDMA, and you want to remove
the interface from the bridge or MACVLAN, you must follow these steps:
Remove RDMA if it is active
Destroy SR-IOV VFs if they exist
Remove the interface from the bridge or MACVLAN
Reactivate RDMA and recreate SR-IOV VFs as needed
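The following is a minimal sketch of these steps using standard sysfs
and iproute2 interfaces; interface names and the VF count are
illustrative, and your environment may use different management tools:
modprobe -r irdma                                   # remove RDMA if it is active
echo 0 > /sys/class/net/<ethX>/device/sriov_numvfs  # destroy SR-IOV VFs
ip link set <ethX> nomaster                         # remove the interface from the bridge
modprobe irdma                                      # reactivate RDMA
echo 4 > /sys/class/net/<ethX>/device/sriov_numvfs  # recreate VFs as needed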
Building and Installation
The ice driver requires the Dynamic Device Personalization (DDP)
package file to enable advanced features (such as dynamic tunneling,
Intel(R) Ethernet Flow Director, RSS, and ADQ, or others). The driver
installation process installs the default DDP package file and creates
a soft link "ice.pkg" to the physical package "ice-x.x.x.x.pkg" in the
firmware root directory (typically "/lib/firmware/" or
"/lib/firmware/updates/"). The driver install process also puts both
the driver module and the DDP file in the "initramfs/initrd" image.
Note:
When the driver loads, it looks for "intel/ice/ddp/ice.pkg" in the
firmware root. If this file exists, the driver will download it into
the device. If not, the driver will go into Safe Mode where it will
use the configuration contained in the device's NVM. This is NOT a
supported configuration and many advanced features will not be
functional. See Dynamic Device Personalization later for more
information.
To manually build the driver
Move the base driver tar file to the directory of your choice. For
example, use "/home/username/ice" or "/usr/local/src/ice".
Untar/unzip the archive, where "<x.x.x>" is the version number for
the driver tar file:
tar zxf ice-<x.x.x>.tar.gz
Change to the driver src directory, where "<x.x.x>" is the version
number for the driver tar:
cd ice-<x.x.x>/src/
Compile the driver module:
make install
The binary will be installed as
"/lib/modules/<KERNEL VERSION>/updates/drivers/net/ethernet/intel/ice/ice.ko".
The install location listed above is the default location. This may
differ for various Linux distributions.
Note:
To build the driver using the schema for unified ethtool
statistics, use the following command:
make CFLAGS_EXTRA='-DUNIFIED_STATS' install
Note:
To compile the driver with ADQ (Application Device Queues) flags
set, use the following command, where "<nproc>" is the number of
logical cores:
make -j<nproc> CFLAGS_EXTRA='-DADQ_PERF_COUNTERS' install
(This will also apply the above "make install" command.)
Note:
You may see warnings from depmod related to unknown RDMA symbols
during the make of the out-of-tree base driver. These warnings
are normal and appear because the in-tree RDMA driver will not
work with the out-of-tree base driver. To address the issue, you
need to install the latest out-of-tree versions of the base and
RDMA drivers.
Note:
Some Linux distributions require you to manually regenerate
initramfs/initrd after installing the driver to allow the driver
to properly load with the firmware at boot time. Please refer to
the distribution documentation for instructions.
Load the module using the modprobe command.
To check the version of the driver and then load it:
modinfo ice
modprobe ice
Alternately, make sure that any older ice drivers are removed from
the kernel before loading the new module:
rmmod ice; modprobe ice
Note:
To enable verbose debug messages in the kernel log, use the
dynamic debug feature (dyndbg). See Dynamic Debug later in this
README for more information.
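For example, verbose ice messages can usually be enabled through the
standard kernel dynamic debug interface (a hedged sketch; the Dynamic
Debug section has the driver-specific details):
modprobe ice dyndbg="+p"
or, at runtime:
echo "module ice +p" > /sys/kernel/debug/dynamic_debug/control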
Assign an IP address to the interface by entering the following,
where "" is the interface name that was shown in dmesg after
modprobe:
ip address add / dev
Verify that the interface works. Enter the following, where
"IP_address" is the IP address for another machine on the same
subnet as the interface that is being tested:
ping
To build a binary RPM package of this driver
Note:
RPM functionality has only been tested in Red Hat distributions.
Run the following command, where "<x.x.x>" is the version number
for the driver tar file:
rpmbuild -tb ice-<x.x.x>.tar.gz
Note:
For the build to work properly, the currently running kernel MUST
match the version and configuration of the installed kernel
sources. If you have just recompiled the kernel, reboot the
system before building.
After building the RPM, the last few lines of the tool output
contain the location of the RPM file that was built. Install the
RPM with one of the following commands, where "<RPM>" is the
location of the RPM file:
rpm -Uvh <RPM>
or:
dnf/yum localinstall <RPM>
If your distribution or kernel does not contain inbox support for
auxiliary bus, you must also install the auxiliary RPM:
rpm -Uvh <auxiliary RPM>
or:
dnf/yum localinstall <auxiliary RPM>
Note:
On some distributions, the auxiliary RPM may fail to install due
to missing kernel-devel headers. To work around this issue,
specify "--excludepath" during installation.
To compile the driver on some kernel/arch combinations, you may
need to install a package with the development version of libelf
(e.g. libelf-dev, libelf-devel, elfutils-libelf-devel).
When compiling an out-of-tree driver, details will vary by
distribution. However, you will usually need a kernel-devel RPM or
some RPM that provides the kernel headers at a minimum. The RPM
kernel-devel will usually fill in the link at
"/lib/modules/`uname -r`/build".
Command Line Parameters
The only command line parameter the ice driver supports is the debug
parameter that can control the default logging verbosity of the
driver. (Note: dyndbg also provides dynamic debug information.)
In general, use ethtool and other OS-specific commands to configure
user-changeable parameters after the driver is loaded.
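For example, to list the parameters the installed ice module exposes
and to load the driver with a debug level set (the value is a
placeholder; see the modinfo output for the accepted range):
modinfo ice | grep parm
modprobe ice debug=<level>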
Additional Features and Configurations
ethtool
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest
ethtool version is required for this functionality. Download it at
https://kernel.org/pub/software/network/ethtool/.
Viewing Link Messages
Link messages will not be displayed to the console if the distribution
is restricting system messages. In order to see network driver link
messages on your console, set the dmesg level to eight by entering the
following:
dmesg -n 8
Note:
This setting is not saved across reboots.
Dynamic Device Personalization
Dynamic Device Personalization (DDP) allows you to change the packet
processing pipeline of a device by applying a profile package to the
device at runtime. Profiles can be used to, for example, add support
for new protocols, change existing protocols, or change default
settings. DDP profiles can also be rolled back without rebooting the
system.
The ice driver automatically installs the default DDP package file
during driver installation.
Note:
It's important to do "make install" during initial ice driver
installation so that the driver loads the DDP package automatically.
The DDP package loads during device initialization. The driver looks
for "intel/ice/ddp/ice.pkg" in your firmware root (typically
"/lib/firmware/" or "/lib/firmware/updates/") and checks that it
contains a valid DDP package file.
If the driver is unable to load the DDP package, the device will enter
Safe Mode. Safe Mode disables advanced and performance features and
supports only basic traffic and minimal functionality, such as
updating the NVM or downloading a new driver or DDP package. Safe Mode
only applies to the affected physical function and does not impact any
other PFs. See the "Intel(R) Ethernet Adapters and Devices User Guide"
for more details on DDP and Safe Mode.
Note:
If you encounter issues with the DDP package file, you may need to
download an updated driver or DDP package file. See the log
messages for more information.
The "ice.pkg" file is a symbolic link to the default DDP package
file installed by the Linux-firmware software package or the ice
out-of-tree driver installation.
You cannot update the DDP package if any PF drivers are already
loaded. To overwrite a package, unload all PFs and then reload the
driver with the new package.
Only the first loaded PF per device can download a package for
that device.
You can install specific DDP package files for different physical
devices in the same system. To install a specific DDP package file:
Download the DDP package file you want for your device.
Rename the file "ice-xxxxxxxxxxxxxxxx.pkg", where
"xxxxxxxxxxxxxxxx" is the unique 64-bit PCI Express device serial
number (in hex) of the device you want the package downloaded on.
The file name must include the complete serial number (including
leading zeros) and be all lowercase. For example, if the 64-bit
serial number is b887a3ffffca0568, then the file name would be
"ice-b887a3ffffca0568.pkg".
To find the serial number from the PCI bus address, you can use the
following command:
lspci -vv -s af:00.0 | grep -i Serial
Capabilities: [150 v1] Device Serial Number b8-87-a3-ff-ff-ca-05-68
You can use the following command to format the serial number
without the dashes:
lspci -vv -s af:00.0 | grep -i Serial | awk '{print $7}' | sed s/-//g
b887a3ffffca0568
Copy the renamed DDP package file to
"/lib/firmware/updates/intel/ice/ddp/". If the directory does not
yet exist, create it before copying the file.
Unload all of the PFs on the device.
Reload the driver with the new package.
Note:
The presence of a device-specific DDP package file overrides the
loading of the default DDP package file ("ice.pkg").
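A condensed sketch of the steps above, assuming the serial number from
the earlier example and that the ice driver can be unloaded on this
system:
mkdir -p /lib/firmware/updates/intel/ice/ddp/
cp ice-b887a3ffffca0568.pkg /lib/firmware/updates/intel/ice/ddp/
rmmod ice
modprobe ice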
RDMA (Remote Direct Memory Access)
Remote Direct Memory Access, or RDMA, allows a network device to
transfer data directly to and from application memory on another
system, increasing throughput and lowering latency in certain
networking environments.
The ice driver supports the following RDMA protocols:
iWARP (Internet Wide Area RDMA Protocol)
RoCEv2 (RDMA over Converged Ethernet)
The major difference is that iWARP performs RDMA over TCP, while
RoCEv2 uses UDP.
RDMA requires auxiliary bus support. Refer to Auxiliary Bus in this
README for more information.
Devices based on the Intel(R) Ethernet 800 Series do not support RDMA
when operating in multiport mode with more than 4 ports.
For detailed installation and configuration information for RDMA, see
the README file in the irdma driver tarball.
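Once the irdma driver is loaded, you can typically confirm that an
RDMA device was created for the port with the iproute2 rdma tool
(shown here as a quick check; device naming varies):
rdma link show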
RDMA in the VF
Devices based on the Intel(R) Ethernet 800 Series support RDMA in a
Linux VF, on supported Windows or Linux hosts.
The iavf driver supports the following RDMA protocols in the VF:
iWARP (Internet Wide Area RDMA Protocol)
RoCEv2 (RDMA over Converged Ethernet)
Refer to the README inside the irdma driver tarball for details on
configuring RDMA in the VF.
Note:
To support VF RDMA, load the irdma driver on the host before
creating VFs. Otherwise VF RDMA support may not be negotiated
between the VF and PF driver.
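A minimal sketch of the required ordering, using the standard sysfs
interface for VF creation (the PF name and VF count are illustrative):
modprobe irdma
echo 2 > /sys/class/net/<PF>/device/sriov_numvfs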
Auxiliary Bus
Inter-Driver Communication (IDC) is the mechanism by which LAN drivers
(such as ice) communicate with peer drivers (such as irdma). Starting
in kernel 5.11, Intel LAN and RDMA drivers use an auxiliary bus
mechanism for IDC.
RDMA functionality requires use of the auxiliary bus.
If your kernel supports the auxiliary bus, the LAN and RDMA drivers
will use the inbox auxiliary bus for IDC. For kernels lower than 5.11,
the base driver will automatically install an out-of-tree auxiliary
bus module.
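To check whether your kernel was built with inbox auxiliary bus
support, you can typically inspect the kernel configuration (the
config file path is the common default and may differ on your
distribution):
grep CONFIG_AUXILIARY_BUS /boot/config-$(uname -r)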
NVM Express* (NVMe) over TCP and Fabrics
RDMA provides a high throughput, low latency means to directly access
NVM Express (NVMe) drives on a remote server.
Refer to the following configuration guides for details on supported
operating systems and how to set up and configure your server and
client systems:
NVM Express over TCP for Intel(R) Ethernet Products Configuration
Guide
NVM Express over Fabrics for Intel(R) Ethernet Products with RDMA
Configuration Guide
Link aggregation (LAG) and RDMA are compatible only if all the
following are true:
You are using an Intel Ethernet 810 Series device with the latest
drivers and NVM installed.
RDMA technology is set to RoCEv2.
LAG configuration is either active-backup or active-active.
Bonding is between two ports within the same device.
The QoS configuration of the two ports matches prior to the bonding
of the devices.
If the above conditions are not met:
The PF driver will not enable RDMA.
RDMA peers will not be able to register with the PF.
Note:
The first interface added to an aggregate (bond) is assigned as the
"primary" interface for RDMA and LAG functionality. If LAN
interfaces are assigned to the bond and you remove the primary
interface from the bond, RDMA will not function properly over the
bonded interface. To address the issue, remove all interfaces from
the bond and add them again. Interfaces that are not assigned to the
bond will operate normally.
If the ice driver is configured for active-backup or active-active
LAG:
The ice driver will block any DCB/hardware QoS configuration changes
on the bonded ports.
Only the primary port is available for the RDMA driver.
The ice driver will forward RoCEv2 traffic from the secondary port
to the primary port by creating an appropriate switch rule.
If the ice driver is configured for active-active LAG:
The ice driver will allow the RDMA driver to configure QSets for
both active ports.
A port failure on the active port will trigger a failover mechanism
and move the queue pairs to the currently active port. Once the port
has recovered, the RDMA driver will move RDMA QSets back to the
originally allocated port.
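As a minimal sketch of a bond that satisfies the conditions above (an
active-backup bond of two ports on the same adapter, created with
iproute2; interface names are illustrative and your distribution's
network manager may require a different workflow):
ip link add bond0 type bond mode active-backup miimon 100
ip link set ens1f0 down
ip link set ens1f0 master bond0
ip link set ens1f1 down
ip link set ens1f1 master bond0
ip link set bond0 up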
Application Device Queues (ADQ)
Application Device Queues (ADQ) allow you to dedicate one or more
queues to a specific application. This can reduce latency for the
specified application, and allow Tx traffic to be rate limited per
application.
ADQ requires the following:
Kernel version: Varies by feature. Refer to the E810 ADQ
Configuration Guide for more information on required kernel versions
for different ADQ features.
Operating system: Red Hat Enterprise Linux 7.5+ or SUSE Linux
Enterprise Server 12+
The latest ice driver and NVM image (Note: You must compile the ice
driver with the ADQ flag as shown in the Building and Installation
section.)
The "sch_mqprio", "act_mirred", and "cls_flower" modules must be
loaded. For example:
cd iproute2
./configure
make DESTDIR=/opt/iproute2 install
ln -s /opt/iproute2/sbin/tc /usr/local/sbin/tc
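If the traffic-control modules are not already present, they can
usually be loaded with modprobe:
modprobe sch_mqprio
modprobe act_mirred
modprobe cls_flower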
When ADQ is enabled:
You cannot change RSS parameters, the number of queues, or the MAC
address in the PF or VF. Delete the ADQ configuration before
changing these settings.
The driver supports subnet masks for IP addresses in the PF and VF.
When you add a subnet mask filter, the driver forwards packets to
the ADQ VSI instead of the main VSI.
When the PF adds or deletes a port VLAN filter for the VF, it will
extend to all the VSIs within that VF.
The driver supports ADQ and GTP filters in the PF. Note: You must
have a DDP package that supports GTP; the default OS package does
not. Download the appropriate package from your hardware vendor and
load it on your device.
ADQ allows tc ingress filters that include any destination MAC
address.
You can configure up to 256 queue pairs (256 MSI-X interrupts) per
PF.
See Creating Traffic Class Filters in this README for more information
on configuring filters, including examples. See the E810 ADQ
Configuration Guide for detailed instructions.
ADQ KNOWN ISSUES:
The latest RHEL and SLES distros have kernels with back-ported
support for ADQ. For all other Linux distributions, you must use LTS
Linux kernel v4.19.58 or higher to use ADQ. The latest out-of-tree
driver is required for ADQ on all operating systems.
You must clear ADQ configuration in the reverse order of the initial
configuration steps. Issues may result if you do not execute the
steps to clear ADQ configuration in the correct order.
ADQ configuration is not supported on a bonded or teamed ice
interface. Issuing the ethtool or tc commands to a bonded ice
interface will result in error messages from the ice driver to
indicate the operation is not supported.
If the application stalls, the application-specific queues may stall
for up to two seconds. Configuring only one application per Traffic
Class (TC) channel may resolve the issue.
DCB and ADQ cannot coexist. A switch with DCB enabled might remove
the ADQ configuration from the device. To resolve the issue, do not
enable DCB on the switch ports being used for ADQ. You must disable
LLDP on the interface and stop the firmware LLDP agent using the
following command:
ethtool --set-priv-flags fw-lldp-agent off
MACVLAN offloads and ADQ are mutually exclusive. System instability
may occur if you enable "l2-fwd-offload" and then set up ADQ, or if
you set up ADQ and then enable "l2-fwd-offload".
Note (unrelated to Intel drivers): The version 5.8.0 Linux kernel
introduced a bug that broke the interrupt affinity setting
mechanism, which breaks the ability to pin interrupts to ADQ
hardware queues. Use an earlier or later version of the Linux
kernel.
A core-level reset of an ADQ-configured PF port (rare events usually
triggered by other failures in the device or ice driver) results in
loss of ADQ configuration. To recover, reapply the ADQ configuration
to the PF interface.
Commands such as "tc qdisc add" and "ethtool -L" will cause the
driver to close the associated RDMA interface and reopen it. This
will disrupt RDMA traffic for 3-5 seconds until the RDMA interface
is available again for traffic.
Commands such as "tc qdisc add" and "ethtool -L" will clear other
tuning settings such as interrupt affinity. These tuning settings
will need to be reapplied. When the number of queues are increased
using "ethtool -L", the new queues will have the same interrupt
moderation settings as queue 0 (i.e., Tx queue 0 for new Tx queues
and Rx queue 0 for new Rx queues). You can change this using the
ethtool per-queue coalesce commands.
TC filters may not get offloaded in hardware if you apply them
immediately after issuing the "tc qdisc add" command. We recommend
you wait 5 seconds after issuing "tc qdisc add" before adding TC
filters. Dmesg will report the error if TC filters fail to add
properly.
Setting Up ADQ
To set up the adapter for ADQ, where "<ethX>" is the interface in use:
1. Reload the ice driver to remove any previous TC configuration:
rmmod ice
modprobe ice
2. Enable hardware TC offload on the interface:
ethtool -K <ethX> hw-tc-offload on
3. Disable LLDP on the interface, if it isn't already:
ethtool --set-priv-flags <ethX> fw-lldp-agent off
4. Verify settings:
ethtool -k <ethX> | grep "hw-tc"
ethtool --show-priv-flags <ethX>
ADQ Configuration Script
Intel also provides a script to configure ADQ. This script allows you
to configure ADQ-specific parameters such as traffic classes, priority,
filters, and ethtool parameters.
Refer to the "README.md" file in "scripts/adqsetup" inside the driver
tarball for more information.
The ice driver supports ADQ acceleration using independent pollers.
Independent pollers are kernel threads invoked by interrupts and are
used for busy polling on behalf of the application.
You can configure the number of queues per poller and poller timeout
per ADQ traffic class (TC) or queue group using the "devlink dev
param" interface.
To set the number of queue pairs per poller, use the following:
devlink dev param set <pci/D:b:d.f> name tc<x>_qps_per_poller value <num> cmode runtime
Where:
<pci/D:b:d.f>:
The PCI address of the device (pci/Domain:bus:device.function).
tc<x>:
The traffic class number.
<num>:
The number of queues of the corresponding traffic class that each
poller would poll.
To set the timeout for the independent poller, use the following:
devlink dev param set <pci/D:b:d.f> name tc<x>_poller_timeout value <num> cmode runtime
Where:
<pci/D:b:d.f>:
The PCI address of the device (pci/Domain:bus:device.function).
tc<x>:
The traffic class number.
<num>:
A nonzero integer value in jiffies.
For example:
* To configure 3 queues of TC1 to be polled by each independent
poller:
devlink dev param set pci/0000:3b:00.0 name tc1_qps_per_poller value 3 cmode runtime
* To set the timeout value in jiffies for TC1 when no traffic is
flowing:
devlink dev param set pci/0000:3b:00.0 name tc1_poller_timeout value 1000 cmode runtime
Configuring ADQ Flows per Traffic Class
The ice out-of-tree driver allows you to configure inline Intel(R)
Ethernet Flow Director (Intel(R) Ethernet FD) filters per traffic
class (TC) using the devlink interface. Inline Intel Ethernet FD
allows uniform distribution of flows among queues in a TC.
Note:
This functionality requires Linux kernel version 5.6 or newer and
is supported only with the out-of-tree ice driver.
You must enable Transmit Packet Steering (XPS) using receive
queues for this feature to work correctly.
Per-TC filters set with devlink are not compatible with Intel
Ethernet FD filters set via ethtool.
Use the following to configure inline Intel Ethernet FD filters per
TC:
devlink dev param set <pci/D:b:d.f> name tc<x>_inline_fd value <true/false> cmode runtime
Where:
<pci/D:b:d.f>:
The PCI address of the device (pci/Domain:bus:device.function).
tc<x>:
The traffic class number.
<true/false>:
Set to true to enable inline per-TC Intel Ethernet FD, or false to
disable it.
For example, to enable inline Intel Ethernet FD for TC1:
devlink dev param set pci/0000:af:00.0 name tc1_inline_fd value true cmode runtime
To show the current inline Intel Ethernet FD setting:
devlink dev param show <pci/D:b:d.f> name tc<x>_inline_fd
For example, to show the inline Intel Ethernet FD setting for TC2 for
the specified device:
devlink dev param show pci/0000:af:00.0 name tc2_inline_fd
Creating Traffic Classes
------------------------
Note:
These instructions are not specific to ADQ configuration. Refer to
the tc and tc-flower man pages for more information on creating
traffic classes (TCs).
To create traffic classes on the interface:
1. Use the tc command to create traffic classes. You can create a
maximum of 16 TCs per interface:
tc qdisc add dev <ethX> root mqprio num_tc <tcs> map <priorities>
queues <count1@offset1 ...> hw 1 mode channel shaper bw_rlimit
min_rate <min_rate1 ...> max_rate <max_rate1 ...>
Where:
num_tc <tcs>:
The number of TCs to use.
map <priorities>:
The map of priorities to TCs. You can map up to 16 priorities to
TCs.
queues <count1@offset1 ...>:
For each TC, "<count>@<offset>". The maximum total number of
queues for all TCs is the number of cores.
hw 1 mode channel:
"channel" with "hw" set to 1 is a new hardware offload mode in
mqprio that makes full use of the mqprio options, the TCs, the
queue configurations, and the QoS parameters.
shaper bw_rlimit:
For each TC, sets the minimum and maximum bandwidth rates. The
totals must be equal to or less than the port speed. This
parameter is optional and is required only to set up the Tx
rates.
min_rate <min_rate1 ...>:
Sets the minimum bandwidth rate limit for each TC.
max_rate <max_rate1 ...>:
Sets the maximum bandwidth rate limit for each TC. You can set a
min and max rate together.
Note:
* If you set "max_rate" to less than 50Mbps, then "max_rate" is
rounded up to 50Mbps and a warning is logged in dmesg.
* See the mqprio man page and the examples below for more
information.
2. Verify the bandwidth limit using network monitoring tools such as
"ifstat" or "sar -n DEV [interval] [number of samples]".
Note:
Setting up channels via ethtool ("ethtool -L") is not supported
when the TCs are configured using mqprio.
3. Enable hardware TC offload on the interface:
ethtool -K <ethX> hw-tc-offload on
4. Add clsact qdisc to enable adding ingress/egress filters for Rx/Tx:
tc qdisc add dev <ethX> clsact
5. Verify successful TC creation after qdisc is created:
tc qdisc show dev <ethX> ingress
TRAFFIC CLASS EXAMPLES:
See the tc and tc-flower man pages for more information on traffic
control and TC flower filters.
* To set up two TCs (tc0 and tc1), with 16 queues each, priorities 0-3
for tc0 and 4-7 for tc1, and max Tx rate set to 1Gbit for tc0 and
3Gbit for tc1:
tc qdisc add dev ens4f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 queues
16@0 16@16 hw 1 mode channel shaper bw_rlimit max_rate 1Gbit 3Gbit
Where:
map 0 0 0 0 1 1 1 1:
Sets priorities 0-3 to use tc0 and 4-7 to use tc1
queues 16@0 16@16:
Assigns 16 queues to tc0 at offset 0 and 16 queues to tc1 at
offset 16
* To create 8 TCs with 256 queues spread across all the TCs, when ADQ
is enabled:
tc qdisc add dev <ethX> root mqprio num_tc 8 map 0 1 2 3 4 5 6 7
queues 2@0 4@2 8@6 16@14 32@30 64@62 128@126 2@254 hw 1 mode channel
* To set a minimum rate for a TC:
tc qdisc add dev ens4f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 queues
4@0 8@4 hw 1 mode channel shaper bw_rlimit min_rate 25Gbit 50Gbit
* To set a maximum data rate for a TC:
tc qdisc add dev ens4f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 queues
4@0 8@4 hw 1 mode channel shaper bw_rlimit max_rate 25Gbit 50Gbit
* To set both minimum and maximum data rates together:
tc qdisc add dev ens4f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 queues
4@0 8@4 hw 1 mode channel shaper bw_rlimit min_rate 10Gbit 20Gbit
max_rate 25Gbit 50Gbit
Creating Traffic Class Filters
------------------------------
Note:
These instructions are not specific to ADQ configuration.
After creating traffic classes, use the tc command to create filters
for traffic. Refer to the tc and tc-flower man pages for more
information.
To view all TC filters:
tc filter show dev <ethX> ingress
tc filter show dev <ethX> egress
For detailed configuration information, supported fields, and example
code for switchdev mode on Intel Ethernet 800 Series devices, refer to
the configuration guide at:
https://edc.intel.com/content/www/us/en/design/products/ethernet/appnote-e810-eswitch-switchdev-mode-config-guide/
TC FILTER EXAMPLES:
To configure TCP TC filters, where:
protocol:
Encapsulation protocol (valid options are IP and 802.1Q).
prio:
Priority.
flower:
Flow-based traffic control filter.
dst_ip:
IP address of the device.
ip_proto:
IP protocol to use (TCP or UDP).
dst_port:
Destination port.
src_port:
Source port.
skip_sw:
Flag to add the rule only in hardware.
hw_tc:
Route incoming traffic flow to this hardware TC. The TC count
starts at 0. For example, "hw_tc 1" indicates that the filter is on
the second TC.
vlan_id:
VLAN ID.
* TCP: Destination IP + L4 Destination Port
To route incoming TCP traffic with a matching destination IP address
and destination port to the given TC:
tc filter add dev <ethX> protocol ip ingress prio 1 flower dst_ip
<ip_address> ip_proto tcp dst_port <port_number> skip_sw hw_tc 1
* TCP: Source IP + L4 Source Port
To route outgoing TCP traffic with a matching source IP address and
source port to the given TC associated with the given priority:
tc filter add dev <ethX> protocol ip egress prio 1 flower src_ip
<ip_address> ip_proto tcp src_port <port_number> action skbedit priority 1
* TCP: Destination IP + L4 Destination Port + VLAN Protocol
To route incoming TCP traffic with a matching destination IP address
and destination port to the given TC using the VLAN protocol
(802.1Q):
tc filter add dev <ethX> protocol 802.1Q ingress prio 1 flower
dst_ip <ip_address> eth_type ipv4 ip_proto tcp dst_port <port_number>
vlan_id <vlan_id> skip_sw hw_tc 1
* To add a GTP filter:
tc filter add dev <ethX> protocol ip parent ffff: prio 1 flower
src_ip 16.0.0.0/16 ip_proto udp dst_port 5678 enc_dst_port 2152
enc_key_id <tunnel_id> skip_sw hw_tc 1
Where:
dst_port:
inner destination port of application (5678)
enc_dst_port:
outer destination port (for GTP user data tunneling occurs on UDP
port 2152)
enc_key_id:
tunnel ID (vxlan ID)
Note:
You can add multiple filters to the device that use the same recipe
(and require no additional recipe resources), either on the same
interface or on different interfaces. Each filter uses the same
fields for matching, but can have different match values.
tc filter add dev <ethX> protocol ip ingress prio 1 flower ip_proto
tcp dst_port <port_number> skip_sw hw_tc 1
tc filter add dev <ethX> protocol ip egress prio 1 flower ip_proto tcp
src_port <port_number> action skbedit priority 1
For example:
tc filter add dev ens4f0 protocol ip ingress prio 1 flower ip_proto
tcp dst_port 5555 skip_sw hw_tc 1
tc filter add dev ens4f0 protocol ip egress prio 1 flower ip_proto
tcp src_port 5555 action skbedit priority 1
Using TC Filters to Forward to a Queue
--------------------------------------
The ice driver supports directing traffic based on L2/L3/L4 fields in
the packet to specific Rx queues, using the TC filter's class ID.
Note:
This functionality can be used with or without ADQ.
To add filters for the desired queue, use the following tc command:
tc filter add dev <ethX> ingress prio 1 protocol all flower src_mac
<mac_address> skip_sw classid ffff:<queue_id>
Where:
<mac_address>:
the MAC address(es) you want to direct to the Rx queue
<queue_id>:
the Rx queue ID number in hexadecimal
For example, to direct a single MAC address to queue 10:
ethtool -K ens801 hw-tc-offload on
tc qdisc add dev ens801 clsact
tc filter add dev ens801 ingress prio 1 protocol all flower src_mac
68:dd:ac:dc:19:00 skip_sw classid ffff:b
To direct 4 source MAC addresses to Rx queues 10-13:
ethtool -K ens801 hw-tc-offload on
tc qdisc add dev ens801 clsact
tc filter add dev ens801 ingress prio 1 protocol all flower src_mac
68:dd:ac:dc:19:00 skip_sw classid ffff:b
tc filter add dev ens801 ingress prio 1 protocol all flower src_mac
68:dd:ac:dc:19:01 skip_sw classid ffff:c
tc filter add dev ens801 ingress prio 1 protocol all flower src_mac
68:dd:ac:dc:19:02 skip_sw classid ffff:d
tc filter add dev ens801 ingress prio 1 protocol all flower src_mac
68:dd:ac:dc:19:03 skip_sw classid ffff:e
Intel(R) Ethernet Flow Director
-------------------------------
The Intel(R) Ethernet Flow Director (Intel(R) Ethernet FD) performs
the following tasks:
* Directs receive packets according to their flows to different queues
* Enables tight control on routing a flow in the platform
* Matches flows and CPU cores for flow affinity
Note:
An included script ("set_irq_affinity") automates setting the IRQ to
CPU affinity.
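For example, to spread the interface's interrupts across all CPU cores
with the included script (a hedged illustration; see the script's
usage text for the full set of options):
./set_irq_affinity all <ethX>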
This driver supports the following flow types:
* IPv4
* TCPv4
* UDPv4
* SCTPv4
* IPv6
* TCPv6
* UDPv6
* SCTPv6
Each flow type supports valid combinations of IP addresses (source or
destination) and UDP/TCP/SCTP ports (source and destination). You can
supply only a source IP address, a source IP address and a destination
port, or any combination of one or more of these four parameters.
Note:
This driver allows you to filter traffic based on a user-defined
flexible two-byte pattern and offset by using the ethtool user-def
and mask fields. Only L3 and L4 flow types are supported for user-
defined flexible filters. For a given flow type, you must clear all
Intel Ethernet Flow Director filters before changing the input set
(for that flow type).
Intel Ethernet Flow Director filters impact only LAN traffic. RDMA
filtering occurs before Intel Ethernet Flow Director, so Intel
Ethernet Flow Director filters will not impact RDMA.
See the Intel(R) Ethernet Adapters and Devices User Guide for a table
that summarizes supported Intel Ethernet Flow Director features across
Intel(R) Ethernet controllers.
Intel Ethernet Flow Director Filters
------------------------------------
Intel Ethernet Flow Director filters are used to direct traffic that
matches specified characteristics. They are enabled through ethtool's
ntuple interface. To enable or disable the Intel Ethernet Flow
Director and these filters:
ethtool -K <ethX> ntuple <on|off>
Note:
When you disable ntuple filters, all the user programmed filters are
flushed from the driver cache and hardware. All needed filters must
be re-added when ntuple is re-enabled.
To display all of the active filters:
ethtool -u <ethX>
To add a new filter:
ethtool -U <ethX> flow-type <type> src-ip <ip> [m <ip_mask>] dst-ip <ip>
[m <ip_mask>] src-port <port> [m <port_mask>] dst-port <port>
[m <port_mask>] action <queue>
Where:
<ethX>:
The Ethernet device to program
<type>:
Can be ip4, tcp4, udp4, sctp4, ip6, tcp6, udp6, sctp6
<ip>:
The IP address to match on
<ip_mask>:
The IPv4 address to mask on
Note:
These filters use inverted masks. An inverted mask of 0 means an
exact match, while 0xF means DON'T CARE. Please refer to the
examples for more details about inverted masks.
<port>:
The port number to match on
<port_mask>:
The 16-bit integer for masking
Note:
These filters use inverted masks.
<queue>:
The queue to direct traffic toward (-1 discards the matched
traffic)
To delete a filter:
ethtool -U <ethX> delete <N>
Where "<N>" is the filter ID displayed when printing all the active
filters, and may also have been specified using "loc <N>" when adding
the filter.
EXAMPLES:
To add a filter that directs packets to queue 2:
ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
To set a filter using only the source and destination IP address:
ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 action 2 [loc 1]
To set a filter based on a user-defined pattern and offset, where the
value of the "user-def" field contains the offset (4 bytes) and the
pattern (0xffff):
ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 user-def 0x4FFFF action 2 [loc 1]
To match TCP traffic sent from 192.168.0.1, port 5300, directed to
192.168.0.5, port 80, and then send it to queue 7:
ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
src-port 5300 dst-port 80 action 7
To add a TCPv4 filter with a partial mask for a source IP subnet,
where the matched src-ip is 192.*.*.* (inverted mask):
ethtool -U <ethX> flow-type tcp4 src-ip 192.168.0.0 m 0.255.255.255 dst-ip \
192.168.5.12 src-port 12600 dst-port 31 action 12
Note:
For each flow-type, the programmed filters must all have the same
matching input set. For example, issuing the following two commands
is acceptable:
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
Issuing the next two commands, however, is not acceptable, since the
first specifies "src-ip" and the second specifies "dst-ip":
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
The second command will fail with an error. You may program multiple
filters with the same fields, using different values, but, on one
device, you may not program two tcp4 filters with different matching
fields. The ice driver does not support matching on a subportion of a
field, so partial mask fields are not supported.
Flex Byte Intel Ethernet Flow Director Filters
----------------------------------------------
The driver also supports matching user-defined data within the packet
payload. This flexible data is specified using the "user-def" field of
the ethtool command in the following way:
+----------------------------+--------------------------+
| 31 28 24 20 16 | 15 12 8 4 0 |
+----------------------------+--------------------------+
| offset into packet payload | 2 bytes of flexible data |
+----------------------------+--------------------------+
For example:
... user-def 0x4FFFF ...
tells the filter to look 4 bytes into the payload and match that value
against 0xFFFF. The offset is based on the beginning of the payload,
and not the beginning of the packet. Thus:
flow-type tcp4 ... user-def 0x8BEAF ...
would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into
the TCP/IPv4 payload.
Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of
payload. Thus to match the first byte of the payload, you must
actually add 4 bytes to the offset. Also note that ip4 filters match
both ICMP frames as well as raw (unknown) ip4 frames, where the
payload will be the L3 payload of the IP4 frame.
The maximum offset is 64. The hardware will only read up to 64 bytes
of data from the payload. The offset must be even because the flexible
data is 2 bytes long and must be aligned to byte 0 of the packet
payload.
The user-defined flexible offset is also considered part of the input
set and cannot be programmed separately for multiple filters of the
same type. However, the flexible data is not part of the input set and
multiple filters may use the same offset but match against different
data.
RSS Hash Flow
-------------
Allows you to set the hash bytes per flow type and any combination of
one or more options for Receive Side Scaling (RSS) hash byte
configuration.
ethtool -N <ethX> rx-flow-hash <type> <option>
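For example, to hash TCP/IPv4 flows on source IP, destination IP,
source port, and destination port using the standard ethtool
hash-field letters:
ethtool -N <ethX> rx-flow-hash tcp4 sdfn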