Open pfrenard opened 4 years ago
Unfortunately you ignored my suggestion to report this issue to the linux-netdev mailing list. Could you please explain why?
Repeat the test on a current Raspbian image and report back.
I am not able to reproduce the issue with Raspbian. If I understand correctly, Raspbian includes a kernel patch that disables TSO by default, and having TSO enabled was the cause of this issue.
Yes, it's the commit 5f0e4c1cc51a2aee86b2a554b65cb0a7909a6e02 I linked in my email on arm@lists.fedoraproject.org.
Not sure why this is being reported here, clearly it's a Fedora/other problem and should be reported on the appropriate bug tracker?
Fedora doesn't have the resources to fix this issue. It should be reported to linux-netdev, as I already wrote before.
I reported the issue to the mailing list. The idea was to let you know that the patch is not available for all distro/kernel.
Then you should have said so in your report instead of waiting for us to drag it out of you.
Eric Dumazet has submitted a patch for this upstream - we'll back-port it once it gets merged.
It'll be interesting to test, but I'm not sure it is the cause.
My recollection is as per https://github.com/raspberrypi/linux/issues/2482#issuecomment-396952454 / https://github.com/raspberrypi/linux/issues/2449#issuecomment-396929496
Should an offloaded packet get dropped (potentially due to skb_linearize failing), then it is never retransmitted, so the other end keeps sending a TCP Selective ACK back reporting the missing section. This continued until it got to the end of the file and still hadn't had the missing section, at which point it stalled. I think Eric's patch is only dealing with a resource leak, but we'll see.
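The stall described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not driver or TCP stack code: every name in it (`cumulative_ack`, `sacked_beyond_hole`, `model_transfer`, `MODEL_MAX_SEGMENTS`) is hypothetical, segments are just array indices, and the sender is assumed never to retransmit the dropped segment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on the number of segments in the toy transfer. */
#define MODEL_MAX_SEGMENTS 64

struct transfer_result {
    size_t cumulative_ack; /* index of the first segment never received */
    size_t sacked;         /* segments received beyond that hole */
};

/* Cumulative ACK point: index of the first segment not received. */
static size_t cumulative_ack(const bool *received, size_t n)
{
    size_t i = 0;

    while (i < n && received[i])
        i++;
    return i;
}

/* Segments received past the first hole; a receiver would report these via SACK. */
static size_t sacked_beyond_hole(const bool *received, size_t n)
{
    size_t count = 0;

    for (size_t i = cumulative_ack(received, n); i < n; i++)
        if (received[i])
            count++;
    return count;
}

/*
 * Toy transfer of n segments where segment `dropped` is lost on transmit
 * and, by assumption, never retransmitted. Pass dropped >= n for no loss.
 */
static struct transfer_result model_transfer(size_t n, size_t dropped)
{
    bool received[MODEL_MAX_SEGMENTS] = { false };
    struct transfer_result r;

    if (n > MODEL_MAX_SEGMENTS)
        n = MODEL_MAX_SEGMENTS;
    for (size_t i = 0; i < n; i++)
        received[i] = (i != dropped);

    r.cumulative_ack = cumulative_ack(received, n);
    r.sacked = sacked_beyond_hole(received, n);
    return r;
}
```

In this model, `model_transfer(10, 3)` leaves the cumulative ACK stuck at segment 3 while six later segments are selectively acknowledged, which mirrors the "keeps reporting the missing section until the end of the file" behaviour.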
I confirm that this does not solve the issue. :( I tried the latest Fedora kernel, 5.5.0.rc5.git0.1.fc32.aarch64.
Maybe this fixes at least #2928
Eric proposed a new patch:
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index d3239b49c3bb2803c874a2e8af332bcf03848e18..65dea9a94b90e27889c8f44294560ffeabda2eb9 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3787,6 +3787,8 @@ static int lan78xx_probe(struct usb_interface *intf,
if (ret < 0)
goto out4;
ret = register_netdev(netdev);
if (ret != 0) {
netif_err(dev, probe, netdev, "couldn't register the device\n");
With it, I am no longer able to reproduce the issue :)
Eric's new patch is in linux-next:
f8d7408a4d7f ("net: usb: lan78xx: limit size of local TSO packets")
It's now in rpi-4.19.y and rpi-5.4.y, along with its predecessor:
47240ba0cd09 ("net: usb: lan78xx: fix possible skb leak")
I've also submitted another patch on this as suggested by EricD.
As reported by Eric Dumazet, there are still some outstanding
cases where the driver does not handle TSO correctly when skb's
are over a certain size. Most cases have been fixed, this patch
should ensure that forwarded SKB's that are greater than
MAX_SINGLE_PACKET_SIZE - TX_OVERHEAD are software segmented
and handled correctly.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
---
drivers/net/usb/lan78xx.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index bc572b921fe1..a01c78d8b9a3 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -31,6 +31,7 @@
#include <linux/mdio.h>
#include <linux/phy.h>
#include <net/ip6_checksum.h>
+#include <net/vxlan.h>
#include <linux/interrupt.h>
#include <linux/irqdomain.h>
#include <linux/irq.h>
@@ -3733,6 +3734,19 @@ static void lan78xx_tx_timeout(struct net_device *net)
 	tasklet_schedule(&dev->bh);
 }
 
+static netdev_features_t lan78xx_features_check(struct sk_buff *skb,
+						struct net_device *netdev,
+						netdev_features_t features)
+{
+	if (skb->len + TX_OVERHEAD > MAX_SINGLE_PACKET_SIZE)
+		features &= ~NETIF_F_GSO_MASK;
+
+	features = vlan_features_check(skb, features);
+	features = vxlan_features_check(skb, features);
+
+	return features;
+}
+
 static const struct net_device_ops lan78xx_netdev_ops = {
 	.ndo_open = lan78xx_open,
 	.ndo_stop = lan78xx_stop,
@@ -3746,6 +3760,7 @@ static const struct net_device_ops lan78xx_netdev_ops = {
 	.ndo_set_features = lan78xx_set_features,
 	.ndo_vlan_rx_add_vid = lan78xx_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = lan78xx_vlan_rx_kill_vid,
+	.ndo_features_check = lan78xx_features_check,
 };
 
 static void lan78xx_stat_monitor(struct timer_list *t)
--
2.17.1
Drop us a comment when it gets merged.
Commit ce896476c65d ("net: usb: lan78xx: Add .ndo_features_check") has been cherry-picked to rpi-4.19.y and rpi-5.4.y.
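The size check in that patch can be modeled in user space to see where the threshold falls. This is a minimal sketch, not the driver code: the values given here for `MAX_SINGLE_PACKET_SIZE` and `TX_OVERHEAD` are assumptions (check lan78xx.c for the real definitions), and `MODEL_GSO_MASK`, `MODEL_OTHER_FEATURE` and `model_features_check` are hypothetical stand-ins for `NETIF_F_GSO_MASK` and the `.ndo_features_check` callback. The VLAN and VXLAN checks from the patch are left out.

```c
#include <stdint.h>

/* Assumed values for illustration; the driver has its own definitions. */
#define MAX_SINGLE_PACKET_SIZE 9000u
#define TX_OVERHEAD 8u

/* Hypothetical feature bits standing in for the kernel's feature flags. */
#define MODEL_GSO_MASK 0x00ffu      /* stand-in for NETIF_F_GSO_MASK */
#define MODEL_OTHER_FEATURE 0x0100u /* any non-GSO feature bit */

/*
 * Mirror of the length test in the patch: if the packet plus the TX
 * overhead exceeds the single-packet limit, clear the GSO feature bits
 * so the packet would be segmented in software instead.
 */
static uint32_t model_features_check(uint32_t skb_len, uint32_t features)
{
    if (skb_len + TX_OVERHEAD > MAX_SINGLE_PACKET_SIZE)
        features &= ~MODEL_GSO_MASK;
    return features;
}
```

With these assumed constants, a packet of 8992 bytes keeps all its feature bits, while one of 8993 bytes loses the GSO bits and keeps only the unrelated ones.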
Is this the right place for my bug report? My issue is the same as #2482 or #2449. I am using Fedora 31 with kernel 5.4.7 (5.4.7-200.fc31.aarch64) on aarch64.
Describe the bug: when doing a large transfer with cp over NFS or with scp, the upload hangs after more than 1 GB and fails after a few minutes.
To reproduce:
dd if=/dev/zero of=/tmp/data.dd bs=4M count=1000 status=progress
scp /tmp/data.dd login@server:/directory
Expected behaviour: fully transferred file
Actual behaviour: hangs after 1 GB and fails after a few minutes
System
cat /etc/rpi-issue: Fedora 31
vcgencmd version: no vcgencmd on the system
uname -a: 5.4.7-200.fc31.aarch64, Aarch64
Logs: None
Additional context: None