RPI3B+ on board ethernet issue (NFS/SCP ...)

pfrenard commented 4 years ago

Is this the right place for my bug report?

My issue is the same as #2482 or #2449. I am using Fedora 31 with kernel 5.4.7 (5.4.7-200.fc31.aarch64) on aarch64.

Describe the bug

When doing a large transfer with cp over NFS or with scp, the upload hangs after more than 1 GB has been sent and fails after a few minutes.

To reproduce

dd if=/dev/zero of=/tmp/data.dd bs=4M count=1000 status=progress
scp /tmp/data.dd login@server:/directory

Expected behaviour

The file is fully transferred.

Actual behaviour

The transfer hangs after 1 GB and fails after a few minutes.

System

Which model of Raspberry Pi? Pi3B+
Which OS and version (cat /etc/rpi-issue)? Fedora 31
Which firmware version (vcgencmd version)? no vcgencmd on the system
Which kernel version (uname -a)? 5.4.7-200.fc31.aarch64, Aarch64

Logs

None

Additional context

None

lategoodbye commented 4 years ago

Unfortunately you ignored my suggestion to report this issue to the linux-netdev mailing list. Could you please explain why?

pelwell commented 4 years ago

Repeat the test on a current Raspbian image and report back.

pfrenard commented 4 years ago

I am not able to reproduce the issue with Raspbian. If I understand correctly, Raspbian includes a kernel patch that disables TSO by default, and having TSO enabled was the cause of this issue.

lategoodbye commented 4 years ago

Yes, it's commit 5f0e4c1cc51a2aee86b2a554b65cb0a7909a6e02, which I linked in my email on arm@lists.fedoraproject.org.

JamesH65 commented 4 years ago

Not sure why this is being reported here, clearly it's a Fedora/other problem and should be reported on the appropriate bug tracker?

lategoodbye commented 4 years ago

Fedora doesn't have the resources to fix this issue. It should be reported to linux-netdev, as I already wrote before.

pfrenard commented 4 years ago

I reported the issue to the mailing list. The idea was to let you know that the patch is not available in all distros/kernels.

pelwell commented 4 years ago

> I reported the issue to the mailing list. The idea was to let you know that the patch is not available in all distros/kernels.

Then you should have said so in your report instead of waiting for us to drag it out of you.

pelwell commented 4 years ago

Eric Dumazet has submitted a patch for this upstream - we'll back-port it once it gets merged.

6by9 commented 4 years ago

It'll be interesting to test, but I'm not sure it is the cause.

My recollection is as per https://github.com/raspberrypi/linux/issues/2482#issuecomment-396952454 / https://github.com/raspberrypi/linux/issues/2449#issuecomment-396929496

Should an offloaded packet get dropped (potentially due to skb_linearize failing), then it is never retransmitted, so the other end keeps sending a TCP Selective ACK back reporting the missing section. This continued until it got to the end of the file and still hadn't received the missing section, at which point it stalled. I think Eric's patch is only dealing with a resource leak, but we'll see.

pfrenard commented 4 years ago

I confirm that this does not solve the issue. :( I tried the latest Fedora kernel, 5.5.0.rc5.git0.1.fc32.aarch64.

lategoodbye commented 4 years ago

Maybe this fixes at least #2928

pfrenard commented 4 years ago

Eric proposed a new patch:

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index d3239b49c3bb2803c874a2e8af332bcf03848e18..65dea9a94b90e27889c8f44294560ffeabda2eb9 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3787,6 +3787,8 @@ static int lan78xx_probe(struct usb_interface *intf,
 	if (ret < 0)
 		goto out4;
 
+	netif_set_gso_max_size(netdev, MAX_SINGLE_PACKET_SIZE - MAX_HEADER);
+
 	ret = register_netdev(netdev);
 	if (ret != 0) {
 		netif_err(dev, probe, netdev, "couldn't register the device\n");

With it I am not able to reproduce the issue anymore :)
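To make the effect of the netif_set_gso_max_size() call concrete, here is a small user-space sketch of the arithmetic. It is not kernel code. The function names are hypothetical, and both sizes are passed in as parameters because the real values of MAX_SINGLE_PACKET_SIZE and MAX_HEADER depend on the driver source and the kernel configuration; the numbers mentioned afterwards are placeholders only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Cap in the style of the patch: the largest GSO/TSO packet the stack
 * may hand to the driver is the single-packet limit minus header room.
 * Both arguments are supplied by the caller; no kernel values are
 * hard-coded here.
 */
static unsigned int sketch_gso_cap(unsigned int max_single_packet,
                                   unsigned int max_header)
{
    assert(max_single_packet > max_header);
    return max_single_packet - max_header;
}

/*
 * Rough count of capped packets needed to move `bytes` of data.
 * Simplification: per-packet protocol headers are ignored.
 */
static size_t sketch_packets_needed(size_t bytes, unsigned int cap)
{
    assert(cap > 0);
    return (bytes + cap - 1) / cap;
}
```

With placeholder values of 9000 for the single-packet limit and 128 for the header room, the cap is 8872 bytes, so a 64 KiB send that the stack could previously have built as one oversized TSO packet is handed to the driver as 8 smaller ones.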

pelwell commented 4 years ago

Eric's new patch is in linux-next:

f8d7408a4d7f ("net: usb: lan78xx: limit size of local TSO packets")

It's now in rpi-4.19.y and rpi-5.4.y, along with its predecessor:

47240ba0cd09 ("net: usb: lan78xx: fix possible skb leak")

JamesH65 commented 4 years ago

I've also submitted another patch on this as suggested by EricD.

As reported by Eric Dumazet, there are still some outstanding
cases where the driver does not handle TSO correctly when skb's
are over a certain size. Most cases have been fixed, this patch
should ensure that forwarded SKB's that are greater than
MAX_SINGLE_PACKET_SIZE - TX_OVERHEAD are software segmented
and handled correctly.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
---
 drivers/net/usb/lan78xx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index bc572b921fe1..a01c78d8b9a3 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -31,6 +31,7 @@
 #include <linux/mdio.h>
 #include <linux/phy.h>
 #include <net/ip6_checksum.h>
+#include <net/vxlan.h>
 #include <linux/interrupt.h>
 #include <linux/irqdomain.h>
 #include <linux/irq.h>
@@ -3733,6 +3734,19 @@ static void lan78xx_tx_timeout(struct net_device *net)
 	tasklet_schedule(&dev->bh);
 }
 
+static netdev_features_t lan78xx_features_check(struct sk_buff *skb,
+						struct net_device *netdev,
+						netdev_features_t features)
+{
+	if (skb->len + TX_OVERHEAD > MAX_SINGLE_PACKET_SIZE)
+		features &= ~NETIF_F_GSO_MASK;
+
+	features = vlan_features_check(skb, features);
+	features = vxlan_features_check(skb, features);
+
+	return features;
+}
+
 static const struct net_device_ops lan78xx_netdev_ops = {
 	.ndo_open = lan78xx_open,
 	.ndo_stop = lan78xx_stop,
@@ -3746,6 +3760,7 @@ static const struct net_device_ops lan78xx_netdev_ops = {
 	.ndo_set_features = lan78xx_set_features,
 	.ndo_vlan_rx_add_vid = lan78xx_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = lan78xx_vlan_rx_kill_vid,
+	.ndo_features_check = lan78xx_features_check,
 };
 
 static void lan78xx_stat_monitor(struct timer_list *t)
--
2.17.1
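The size test at the heart of the patch above can be tried outside the kernel. The following is a user-space sketch only: the sketch_ names and the feature bit values are invented for illustration, the two size constants are assumed values that should be checked against lan78xx.c in your tree, and the VLAN/VXLAN checks from the real callback are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_features_t;

/* Invented feature bits; the kernel's NETIF_F_* values differ. */
#define SKETCH_F_SG       (1ull << 0)
#define SKETCH_F_TSO      (1ull << 1)
#define SKETCH_F_TSO6     (1ull << 2)
#define SKETCH_F_GSO_MASK (SKETCH_F_TSO | SKETCH_F_TSO6)

/* Assumed sizes, for illustration only. */
#define SKETCH_TX_OVERHEAD            8u
#define SKETCH_MAX_SINGLE_PACKET_SIZE 9000u

/*
 * Mirrors the first test in the patch: if the packet plus the driver's
 * TX overhead would not fit in a single packet, clear every GSO bit so
 * that the stack segments the packet in software instead.
 */
static sketch_features_t sketch_features_check(unsigned int skb_len,
                                               sketch_features_t features)
{
    assert(skb_len > 0);

    if (skb_len + SKETCH_TX_OVERHEAD > SKETCH_MAX_SINGLE_PACKET_SIZE)
        features &= ~SKETCH_F_GSO_MASK;

    return features;
}
```

Under these assumed sizes, a packet of 8992 bytes keeps its offload bits, while one of 8993 bytes loses the GSO bits and keeps only the unrelated ones.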

pelwell commented 4 years ago

Drop us a comment when it gets merged.

pelwell commented 4 years ago

Commit ce896476c65d ("net: usb: lan78xx: Add .ndo_features_check") has been cherry-picked to rpi-4.19.y and rpi-5.4.y.

raspberrypi / linux

RPI3B+ on board ethernet issue (NFS/SCP ...) #3395