pcengines / apu2-documentation

Documentation and scripts for building and adjusting PC Engines APU2 firmware
https://pcengines.github.io/apu2-documentation/

APU2D4: NIC: high RX error count #178

Open corbolais opened 4 years ago

corbolais commented 4 years ago

Hi,

enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 193596201  bytes 57195194543 (53.2 GiB)
        RX errors 260371  dropped 0  overruns 0  frame 260371
        TX packets 297157067  bytes 327247392153 (304.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 287683044  bytes 312139372971 (290.7 GiB)
        RX errors 3294066  dropped 0  overruns 0  frame 3293890
        TX packets 190733929  bytes 33493083338 (31.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet   netmask 255.255.255.0  broadcast 
        inet6   prefixlen 64  scopeid 0x0<global>
        inet6   prefixlen 64  scopeid 0x20<link>
        ether   txqueuelen 1000  (Ethernet)
        RX packets 117361281  bytes 73685355706 (68.6 GiB)
        RX errors 0  dropped 567  overruns 0  frame 0
        TX packets 105770826  bytes 65548634666 (61.0 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bridge name     bridge id               STP enabled     interfaces
br0             8000.f641f32e0889       no              enp1s0
                                                        enp2s0
[Sun Jun 14 15:48:31 2020] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[Sun Jun 14 15:48:31 2020] br0: port 1(enp2s0) entered disabled state
[Sun Jun 14 15:49:08 2020] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[Sun Jun 14 15:49:08 2020] br0: port 1(enp2s0) entered blocking state
[Sun Jun 14 15:49:08 2020] br0: port 1(enp2s0) entered forwarding state

On Sun Jun 14:

for i in {1..3}; do
  ethtool -s "enp${i}s0" autoneg off port tp speed 100 duplex full;
done;
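
One thing that may be worth ruling out: with autonegotiation switched off on the APU side only, a link partner that still autonegotiates commonly falls back to half duplex, and such a duplex mismatch tends to show up as frame errors on the full-duplex side. Whether that plays a role in this issue is only an assumption. A minimal sketch for putting the three ports back on autonegotiation; the `RUN` dry-run switch and the `restore_autoneg` helper are hypothetical names introduced for illustration:

```shell
#!/bin/sh
# Sketch only: re-enable autonegotiation on the ports forced to 100 Mbps above.
# RUN is a hypothetical dry-run switch: by default it is "echo", so the
# commands are only printed. Set RUN="" to really execute them (needs root).
RUN="${RUN-echo}"

restore_autoneg() {
    for ifc in "$@"; do
        # Let the NIC negotiate speed and duplex with its link partner again.
        $RUN ethtool -s "$ifc" autoneg on
    done
}

restore_autoneg enp1s0 enp2s0 enp3s0
```

After running it for real, `ethtool enp1s0` should report the negotiated speed and duplex, which can then be compared with what the device on the other end of the cable reports.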

Ideas anyone?

Thank you.

anonymous-one commented 4 years ago

Happy to have found this.

Running an APU2D4 with OpenWrt on kernel 5.4.36 with the second-newest BIOS (the May 2020 release, from what I recall).

I am only able to get about 300-350 Mbit/s inbound on any interface (eth0 - eth2).

I have just spent about 4 hours debugging this and have ruled out the usual suspects, among others:

Verified it's not my switch (direct link). Verified it's not firewall rules (iptables -F; this machine also hosts a VM, and I get about 1-1.5 GBytes/sec between the VM and the host). Tried disabling all the offloading ethtool params as well as flow control. Tried multiple clients, same issue.

From my OpenWRT to a client:

[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   935 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[  4]   2.00-3.00   sec   112 MBytes   941 Mbits/sec
[  4]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  4]   4.00-5.00   sec   112 MBytes   942 Mbits/sec
[  4]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[  4]   6.00-7.00   sec   112 MBytes   941 Mbits/sec
[  4]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
[  4]   8.00-9.00   sec   112 MBytes   942 Mbits/sec
[  4]   9.00-10.00  sec   112 MBytes   942 Mbits/sec

From client to OpenWRT:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  39.5 MBytes   331 Mbits/sec    0   1.83 MBytes
[  4]   1.00-2.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   2.00-3.00   sec  35.0 MBytes   294 Mbits/sec    0   3.00 MBytes
[  4]   3.00-4.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   4.00-5.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   5.00-6.00   sec  35.0 MBytes   294 Mbits/sec    0   3.00 MBytes
[  4]   6.00-7.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   7.00-8.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   8.00-9.00   sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
[  4]   9.00-10.00  sec  36.2 MBytes   304 Mbits/sec    0   3.00 MBytes
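
To put a single number on each direction, the per-second lines can be averaged. This is a small sketch that assumes the iperf3 text layout pasted above (a value followed by the literal token `Mbits/sec`); `avg_mbits` is a hypothetical helper name, not part of iperf3:

```shell
#!/bin/sh
# Sketch: average the "<value> Mbits/sec" column of iperf3 interval lines
# read from stdin. Lines without that token (headers, separators) are ignored.
avg_mbits() {
    awk '{
             for (i = 2; i <= NF; i++)
                 if ($i == "Mbits/sec") { sum += $(i - 1); n++ }
         }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Usage, with the pasted output saved to a file:
#   avg_mbits < client-to-openwrt.txt
```

Note that if iperf3's summary lines are included in the input, they are averaged in as well, so it is simplest to feed in the interval lines only.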

Anyone?

anonymous-one commented 4 years ago

Welp ain't that annoying...

Turns out I had a BIOS about 5 versions back... Just upgraded to the latest from June 30 2020 and presto, ~950 Mbit/s in both directions.

corbolais commented 4 years ago

@anonymous-one Where did you get a 2020-06-30 release from?

Latest as of https://pcengines.github.io/ is 2020-06-28 apu2 v4.12.0.2.
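
Since the firmware version matters so much here, it helps to read it from the running system before and after flashing. A minimal sketch, assuming a Linux system that exposes the SMBIOS data under `/sys/class/dmi/id` (whether a given OpenWrt build does is an assumption); `bios_version` and the `DMI_DIR` override are hypothetical names used for illustration:

```shell
#!/bin/sh
# Sketch: print the firmware version string the kernel read from SMBIOS.
# DMI_DIR is a hypothetical override, handy for trying the function on
# another machine or against a test directory.
bios_version() {
    dir="${DMI_DIR:-/sys/class/dmi/id}"
    cat "$dir/bios_version"
}

# Usage:
#   bios_version
# Where dmidecode is installed, this should give the same string (needs root):
#   dmidecode -s bios-version
```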

corbolais commented 4 years ago

Holy crap! Just flashed 2020-06-28 apu2 v4.12.0.2. It's up again. Not to 100% throughput, but ~75%. That's a substantial improvement. Thanks @anonymous-one for motivating me to just update the fw once more.

Yet there is still room for improvement: I'd rather see 95% throughput or above, as it once was.

anonymous-one commented 4 years ago

I forgot to make a copy, but I believe the BIOS version where I was having the RX issues was something along the lines of v4.11.0.5.

Either way, I am now getting roughly 950 Mbit/s TX and RX regardless of client location (direct or via switch) on a 1 Gbit link.

BTW, I had to raise my ring buffers a little (ethtool -G ethX rx XXXX); otherwise I got some overrun packets every so often (fairly frequently).

After I set the rx ring buffers to 2048, zero overrun packets.
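
For anyone wanting to try the same ring buffer change, this is a sketch of how the current RX ring size could be checked first. It assumes the usual `ethtool -g` text layout (a "Current hardware settings:" section containing an "RX:" line); `current_rx_ring` is a hypothetical helper name and `eth0` is just an example interface:

```shell
#!/bin/sh
# Sketch: extract the current RX ring size from `ethtool -g` output on stdin.
# The first "RX:" line belongs to the pre-set maximums, so only the one
# after "Current hardware settings" is printed.
current_rx_ring() {
    awk '/^Current hardware settings/ { cur = 1 }
         cur && $1 == "RX:" { print $2; exit }'
}

# Usage (setting the ring size needs root):
#   ethtool -g eth0 | current_rx_ring
#   ethtool -G eth0 rx 2048
```

The value passed to `ethtool -G` cannot exceed the pre-set maximum that `ethtool -g` reports for the NIC, and the setting does not survive a reboot unless it is reapplied at startup.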

corbolais commented 4 years ago

@anonymous-one Thank you for your feedback.

I had a massive RX error count. Trying your suggestion now.

But since the fw upgrade yesterday, the RX error count has not increased so far. Usually it crept up fairly quickly. I'm consistently getting 75% performance, which is a huge improvement, though there is still room to grow.

Did your board show overrun packets even after the latest fw upgrade?

Edit: "overruns" are distinct from RX errors

anonymous-one commented 4 years ago

I had small bursts (100-200 at a time?) of overrun packets when saturating the link (e.g. at 950 Mbit/s).

And yep, RX errors and overruns are different things. Still, my understanding is that overruns are not desirable either, although not as bad as the straight-up RX errors.
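
To keep the counters apart when comparing before and after, the RX line of `ifconfig` can be split into its fields. A minimal sketch assuming the net-tools `ifconfig` layout pasted in this thread; `rx_summary` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `ifconfig <iface>` output on stdin and print the RX counters
# separately, plus RX errors as a percentage of received packets.
rx_summary() {
    awk '$1 == "RX" && $2 == "packets" { pkts = $3 }
         $1 == "RX" && $2 == "errors" {
             printf "errors=%s dropped=%s overruns=%s frame=%s\n", $3, $5, $7, $9
             if (pkts > 0) printf "error_rate=%.3f%%\n", 100 * $3 / pkts
         }'
}

# Usage:
#   ifconfig enp1s0 | rx_summary
```

Fed with the enp1s0 counters from the first post, this reports an error rate of about 0.13%, all of it frame errors and none of it overruns.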


Nikoos commented 2 years ago

It seems that I have the same issue as you guys; let me share my setup.

My main computer has a 10G NIC set to autonegotiation/full duplex (10 Gb/s speed), connected to a MikroTik switch with 10G ports; my apu2d4 is connected to the same switch.

When running an iperf test between my computer and my apu2d4:

From my desktop to my apu2d4:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  41.0 MBytes   344 Mbits/sec  587   15.6 KBytes       
[  5]   1.00-2.00   sec  40.4 MBytes   339 Mbits/sec  565   15.6 KBytes       
[  5]   2.00-3.00   sec  42.9 MBytes   360 Mbits/sec  714   14.1 KBytes       
[  5]   3.00-4.00   sec  40.3 MBytes   338 Mbits/sec  668   14.1 KBytes       
[  5]   4.00-5.00   sec  39.6 MBytes   333 Mbits/sec  540   14.1 KBytes       
[  5]   5.00-6.00   sec  42.3 MBytes   354 Mbits/sec  728   14.1 KBytes       
[  5]   6.00-7.00   sec  40.0 MBytes   336 Mbits/sec  647   12.7 KBytes       
[  5]   7.00-8.00   sec  43.1 MBytes   362 Mbits/sec  622   21.2 KBytes       
[  5]   8.00-9.00   sec  45.1 MBytes   378 Mbits/sec  780   14.1 KBytes       
[  5]   9.00-10.00  sec  45.2 MBytes   379 Mbits/sec  892   19.8 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   420 MBytes   352 Mbits/sec  6743             sender
[  5]   0.00-10.01  sec   420 MBytes   352 Mbits/sec                  receiver 

On my apu2d4, the receive (RX) error count increases:

green0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:0d:b9:52:d6:39  txqueuelen 1000  (Ethernet)
        RX packets 3069435  bytes 3895959607 (3.6 GiB)
        RX errors 14286  dropped 0  overruns 0  frame 7143
        TX packets 2720069  bytes 3726867247 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd0400000-d041ffff

If I force my desktop NIC to 1 Gb/s and retry:

From my desktop to my apu2d4:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   962 Mbits/sec    0    471 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   938 Mbits/sec    0    516 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    539 KBytes       
[  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0    539 KBytes       
[  5]   4.00-5.00   sec   113 MBytes   948 Mbits/sec    0    539 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   937 Mbits/sec    0    566 KBytes       
[  5]   6.00-7.00   sec   113 MBytes   944 Mbits/sec    0    566 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec    0    566 KBytes       
[  5]   8.00-9.00   sec   113 MBytes   944 Mbits/sec    0    566 KBytes       
[  5]   9.00-10.00  sec   113 MBytes   945 Mbits/sec    0    618 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  1.10 GBytes   941 Mbits/sec                  receiver

Regarding network errors:

[root@i264 ~]# ifconfig -a green0
green0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:0d:b9:52:d6:39  txqueuelen 1000  (Ethernet)
        RX packets 3884065  bytes 5127179752 (4.7 GiB)
        RX errors 14286  dropped 0  overruns 0  frame 7143
        TX packets 2783145  bytes 3736949748 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd0400000-d041ffff 

I am still not sure I understand this, and it may not be related to this issue. The switch port the apu2d4 is connected to is set to autonegotiation; when I tried to force its speed to 1G instead, the apu2d4 became unreachable as soon as I applied the setting.

Nikos