luigirizzo / netmap

Automatically exported from code.google.com/p/netmap
BSD 2-Clause "Simplified" License

Netmap i40e driver disables promiscuous mode #417

Closed: jmtilli closed this issue 6 years ago

jmtilli commented 6 years ago

This is the stock i40e driver on my machine:

$ modinfo i40e
filename:       /lib/modules/4.11.0-14-generic/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
version:        1.6.27-k
license:        GPL
description:    Intel(R) Ethernet Connection XL710 Network Driver
author:         Intel Corporation, 
srcversion:     458A5BB6C17FAE7A25322D6
alias:          pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias:          pci:v00008086d0000158Asv*sd*bc*sc*i*
alias:          pci:v00008086d00001588sv*sd*bc*sc*i*
alias:          pci:v00008086d00001587sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D3sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D2sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D1sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D0sv*sd*bc*sc*i*
alias:          pci:v00008086d000037CFsv*sd*bc*sc*i*
alias:          pci:v00008086d000037CEsv*sd*bc*sc*i*
alias:          pci:v00008086d00001589sv*sd*bc*sc*i*
alias:          pci:v00008086d00001586sv*sd*bc*sc*i*
alias:          pci:v00008086d00001585sv*sd*bc*sc*i*
alias:          pci:v00008086d00001584sv*sd*bc*sc*i*
alias:          pci:v00008086d00001583sv*sd*bc*sc*i*
alias:          pci:v00008086d00001581sv*sd*bc*sc*i*
alias:          pci:v00008086d00001580sv*sd*bc*sc*i*
alias:          pci:v00008086d00001574sv*sd*bc*sc*i*
alias:          pci:v00008086d00001572sv*sd*bc*sc*i*
depends:        ptp
intree:         Y
vermagic:       4.11.0-14-generic SMP mod_unload 
parm:           debug:Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX) (uint)

Two ports of the NIC are connected to each other with a direct attach cable.

Now I bring the interfaces up and set promiscuous mode:

for DEVICE in enp66s0f0 enp66s0f1; do
  sudo ethtool -K $DEVICE tso off gso off gro off lro off
  sudo ethtool -K $DEVICE rx off tx off
  sudo ip link set dev $DEVICE up
  sudo ip link set dev $DEVICE promisc on
done

Then I run tcpdump on the second port:

$ sudo tcpdump -evvvvvvvvni enp66s0f1
tcpdump: listening on enp66s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes

And replay some traffic with an incorrect MAC address on the first port:

$ sudo tcpreplay -i enp66s0f0 netmapsendmac.pcap 

I observe the traffic on the second port.

Then I load the netmap-specific i40e driver:

sudo rmmod i40e
sudo rmmod netmap
sudo insmod ./netmapi40e/LINUX/netmap.ko
sudo insmod ./netmapi40e/LINUX/i40e/i40e.ko

$ modinfo ./netmapi40e/LINUX/i40e/i40e.ko
filename:       /home/ubuntu/./netmapi40e/LINUX/i40e/i40e.ko
version:        2.3.6
license:        GPL
description:    Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
author:         Intel Corporation, 
srcversion:     E7FD78FEA9D6FFABF865591
alias:          pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias:          pci:v00008086d0000158Asv*sd*bc*sc*i*
alias:          pci:v00008086d000037D3sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D2sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D1sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D0sv*sd*bc*sc*i*
alias:          pci:v00008086d000037CFsv*sd*bc*sc*i*
alias:          pci:v00008086d000037CEsv*sd*bc*sc*i*
alias:          pci:v00008086d00001588sv*sd*bc*sc*i*
alias:          pci:v00008086d00001587sv*sd*bc*sc*i*
alias:          pci:v00008086d00001589sv*sd*bc*sc*i*
alias:          pci:v00008086d00001586sv*sd*bc*sc*i*
alias:          pci:v00008086d00001585sv*sd*bc*sc*i*
alias:          pci:v00008086d00001584sv*sd*bc*sc*i*
alias:          pci:v00008086d00001583sv*sd*bc*sc*i*
alias:          pci:v00008086d00001581sv*sd*bc*sc*i*
alias:          pci:v00008086d00001580sv*sd*bc*sc*i*
alias:          pci:v00008086d00001574sv*sd*bc*sc*i*
alias:          pci:v00008086d00001572sv*sd*bc*sc*i*
depends:        netmap,ptp
vermagic:       4.11.0-14-generic SMP mod_unload 
parm:           debug:Debug level (0=none,...,16=all) (int)
parm:           ix_crcstrip:int
parm:           ix_rx_miss:int
parm:           ix_rx_miss_bufs:int

I bring both interfaces up and into promiscuous mode:

for DEVICE in enp66s0f0 enp66s0f1; do
  sudo ethtool -K $DEVICE tso off gso off gro off lro off
  sudo ethtool -K $DEVICE rx off tx off
  sudo ip link set dev $DEVICE up
  sudo ip link set dev $DEVICE promisc on
done

I run tcpdump on the second port:

$ sudo tcpdump -evvvvvvvvni enp66s0f1
tcpdump: listening on enp66s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes

And replay some traffic with an incorrect MAC address on the first port:

$ sudo tcpreplay -i enp66s0f0 netmapsendmac.pcap 

This still works: tcpdump sees the packets. I close tcpdump.

Then I start my netmap application:

$ sudo ./netmapproxy1 netmap:enp66s0f1 vale1:1

And close it by pressing Ctrl-C. Then I reopen tcpdump:

$ sudo tcpdump -evvvvvvvvni enp66s0f1
tcpdump: listening on enp66s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes

...and run tcpreplay again:

$ sudo tcpreplay -i enp66s0f0 netmapsendmac.pcap 

Now tcpdump doesn't see the packets at all!

So, clearly, netmap i40e is interacting badly with the promiscuous mode of the driver. After the netmap mode has been used once with the network interface, tcpdump no longer works correctly in promiscuous mode.

Also, more worryingly, the packet processing application (netmapproxy1, a custom application) didn't see the packets at all. So, when the interface is in netmap mode, the promiscuous mode flag doesn't work, and after leaving the netmap mode, promiscuous mode no longer works.

This is highly detrimental to packet processing applications that are expected to work at layer 2, for example inline firewalls, intrusion detection/prevention systems, virtual switches and such.

jmtilli commented 6 years ago

This seems to work if I execute the following commands AFTER starting the netmap packet processing application:

$ sudo ip link set dev enp66s0f1 promisc off
$ sudo ip link set dev enp66s0f1 promisc on

So I have a workaround. I must first start the program, then turn promiscuous mode off and back on again. However, it would be great if netmap remembered the promiscuous mode flag, so that I don't have to reset it all the time.

giuseppelettieri commented 6 years ago

We bring the interface down and then up when we switch any ring to netmap mode, and this probably resets the promiscuous flag. To remember the flag we need to add code for each driver, and make it compile and work across all driver and kernel versions. This may range from easy to hellish.

The long-term plan is to remove the down/up cycle entirely.

jmtilli commented 6 years ago

Ok. For layer 2 packet processing applications, it probably makes the most sense to reset the flag in the application itself after nm_open(), since applications should ideally be self-contained and require no prior external configuration. So I won't mind if this issue is categorized as "enhancement" instead of "bug", because there is a good workaround.

Strange thing is, I have previously observed the promiscuous mode to work on Intel's gigabit cards. Perhaps I didn't have the netmap-specific driver enabled and perhaps the emulated mode works a bit differently from the native mode?

However, now LINUX/README has this:

* if you are using netmap to implement an L2 switch (e.g. using the
  bridge application), you must put the NIC in promiscuous mode,
  otherwise the NIC (usually) drops all the frames whose destination
  MAC is different from the MAC of the NIC.

      # ip link set eth0 promisc on

These instructions don't work, at least in my i40e setup, unless the command is executed after the bridge application has started. So perhaps the README should mention that with some drivers the promiscuous mode setting may need to be applied AFTER the netmap application is started. And in case the driver thinks promiscuous mode is already on, it may be safer to instruct the user to do the following:

      # ip link set eth0 promisc off
      # ip link set eth0 promisc on

vmaffione commented 6 years ago

You are welcome to amend the README with a pull request, as you say.

jmtilli commented 6 years ago

I have found the following workaround in my code:

   char pktdl[14] = {0x02,0,0,0,0,0x04, 0x02,0,0,0,0,0x01, 0, 0};
   char pktul[14] = {0x02,0,0,0,0,0x01, 0x02,0,0,0,0,0x04, 0, 0};

-  nm_my_inject(dlnmds[0], pktdl, sizeof(pktdl));
-  ioctl(dlnmds[0]->fd, NIOCTXSYNC, NULL);
-  nm_my_inject(ulnmds[0], pktul, sizeof(pktul));
-  ioctl(ulnmds[0]->fd, NIOCTXSYNC, NULL);
+  if (strncmp(argv[optind+0], "vale", 4) == 0)
+  {
+    nm_my_inject(dlnmds[0], pktdl, sizeof(pktdl));
+    ioctl(dlnmds[0]->fd, NIOCTXSYNC, NULL);
+  }
+  if (strncmp(argv[optind+1], "vale", 4) == 0)
+  {
+    nm_my_inject(ulnmds[0], pktul, sizeof(pktul));
+    ioctl(ulnmds[0]->fd, NIOCTXSYNC, NULL);
+  }

As you can see, I was sending an invalid 14-byte Ethernet frame with protocol number zero to initialize the MAC hash tables of VALE, so that packets would not all go to queue 0 as broadcast traffic. When sent to a hardware NIC port, this invalid 14-byte frame caused the entire NIC to be reset, including the promiscuous mode setting. Now I send the frame only if VALE is actually used. No NIC reset, no loss of the promiscuous mode setting.
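
The two pieces of that fix, the port-type check and the header-only frame, can be sketched in isolation as follows. This is an illustration only: is_vale_port() and build_eth_header() are hypothetical helper names, not part of netmap or of my application, and the check simply mirrors the strncmp() test from the diff above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14  /* 6 bytes dst MAC + 6 bytes src MAC + 2 bytes ethertype */

/* Non-zero if the port name starts with "vale" (same test as in the diff). */
int is_vale_port(const char *name)
{
    return strncmp(name, "vale", 4) == 0;
}

/* Write a bare Ethernet header (no payload) into buf: destination MAC,
 * source MAC, then the ethertype in network byte order.
 * Returns the number of bytes written, which is always 14. */
size_t build_eth_header(unsigned char buf[ETH_HDR_LEN],
                        const unsigned char dst[6],
                        const unsigned char src[6],
                        uint16_t ethertype)
{
    memcpy(buf, dst, 6);
    memcpy(buf + 6, src, 6);
    buf[12] = (unsigned char)(ethertype >> 8);
    buf[13] = (unsigned char)(ethertype & 0xff);
    return ETH_HDR_LEN;
}
```

With dst 02:00:00:00:00:04, src 02:00:00:00:00:01 and ethertype 0, build_eth_header() produces the same 14 bytes as pktdl above. The caller would inject the frame only when is_vale_port() is true for that port name.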

You can close this issue if you consider these 14-byte frames a misuse of netmap. This is a classic case of "don't do that then".

giuseppelettieri commented 6 years ago

Well, I guess so. Thanks for the update.