raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.16k stars 5.01k forks source link

[5.6.y/arm64] - rpi4 tx checksum issue. #3523

Open asavah opened 4 years ago

asavah commented 4 years ago

Just upgraded with a freshly built kernel from rpi-5.6.y branch, arm64. No userspace updates were made, just the kernel. Noticed 2 bugs, will report the second one in a separate issue.

Hardware: Machine model: Raspberry Pi 4 Model B Rev 1.1 , 4GB OS: own pet project, arm64

Bug: tcp tx-checksum seems half broken. More info below.

How this was noticed: apcupsd (ethernet mode) stopped working.

Sorry for the lenghty logs, maybe they are irrelevant as I found the cause and the temporal workaround.

Failure of apcupsd:

0.000 apcupsd: apcupsd.c:156 Options parsed.
0.000 apcupsd: apcconfig.c:755 After config scriptdir: "/etc/apcupsd"
0.000 apcupsd: apcconfig.c:756 After config pwrfailpath: "/etc/apcupsd"
0.000 apcupsd: apcconfig.c:757 After config nologinpath: "/etc"
0.000 apcupsd: apcupsd.c:179 Config file /etc/apcupsd/apcupsd.conf processed.
0.000 apcupsd: newups.c:102 write_lock at drivers.c:109
0.000 apcupsd: drivers.c:111 Looking for driver: net
0.000 apcupsd: drivers.c:115 Driver dumb is configured.
0.000 apcupsd: drivers.c:115 Driver apcsmart is configured.
0.000 apcupsd: drivers.c:115 Driver net is configured.
0.000 apcupsd: drivers.c:118 Driver net found and attached.
0.000 apcupsd: newups.c:108 write_unlock at drivers.c:135
0.000 apcupsd: drivers.c:137 Driver ptr=0x3b5837b0
0.000 apcupsd: drivers.c:138 Attached to driver: net
0.000 apcupsd: apcupsd.c:251 NIS thread started.
0.000 apcupsd: apclog.c:62 apcupsd 3.14.14 (31 May 2016) debian startup succeeded
0.001 apcupsd: newups.c:102 write_lock at net.c:420
0.001 apcupsd: net.c:204 Opening connection to 192.168.36.5:3551
0.001 apcupsd: net.c:220 ===============
0.001 apcupsd: apclog.c:62 NIS server startup succeeded
15.016 apcupsd: net.c:226 ===============
15.016 apcupsd: net.c:237 Buffer:

15.017 apcupsd: net.c:239 Exit poll_ups, stat=1
15.017 apcupsd: newups.c:108 write_unlock at net.c:453
15.017 apcupsd: newups.c:102 write_lock at net.c:549
15.017 apcupsd: net.c:204 Opening connection to 192.168.36.5:3551
15.017 apcupsd: net.c:220 ===============
30.032 apcupsd: net.c:226 ===============
30.032 apcupsd: net.c:230 Exit poll_ups 0 bad stat net_recv: Connection timed out
30.033 apcupsd: net.c:237 Buffer:

It seems to fail to communicate. I checked with tcpdump on the ubuntu server (192.168.36.5) which is actually connected to the UPS via USB and hosts apcupsd "master" instance.

192.168.36.14 is the pi, 192.168.36.5 is the server

16:46:40.298807 IP (tos 0x0, ttl 64, id 14610, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.3551 > 192.168.36.5.36394: Flags [F.], cksum 0xed24 (correct), seq 506, ack 1, win 126, length 0

16:46:40.679787 IP (tos 0x0, ttl 64, id 7965, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.42518 > 192.168.36.5.3551: Flags [P.], cksum 0x3087 (incorrect -> 0x308d), seq 3736629343:3736629345, ack 4241077629, win 126, length 2

Looks fishy, some of the checksums are correct, some are not.

ethtool -K eth0 tx off on the pi

and now:

17:02:12.795287 IP (tos 0x0, ttl 64, id 378, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.42564 > 192.168.36.5.3551: Flags [.], cksum 0x454b (correct), seq 9, ack 1120, win 126, length 0
17:02:15.274774 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.36.14.3551 > 192.168.36.5.36464: Flags [S.], cksum 0xb8e2 (correct), seq 3780082526, ack 133689554, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:02:15.275043 IP (tos 0x0, ttl 64, id 43633, offset 0, flags [DF], proto TCP (6), length 40)

All the checksums are correct and apcupsd works again.

asavah commented 4 years ago

PS: can provide any info if needed, verbose log, pcap dumps etc.

popcornmix commented 4 years ago

I did notice a checksum related upstream commit. https://github.com/raspberrypi/linux/commit/9a9ba2a4aaaa4e75a5f118b8ab642a55c34f95cb Does reverting that help?

asavah commented 4 years ago

@popcornmix No, it does not , same story.

it's hit and miss, some of the cheksums are ok, some are not.

tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:24:27.424938 IP (tos 0x0, ttl 64, id 63307, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [S], cksum 0xadb0 (correct), seq 3289994058, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:24:27.425174 IP (tos 0x0, ttl 64, id 63308, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [.], cksum 0xe122 (correct), seq 3289994059, ack 3221047239, win 126, length 0
17:24:27.425220 IP (tos 0x0, ttl 64, id 63309, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:27.626562 IP (tos 0x0, ttl 64, id 63310, offset 0, flags [DF], proto TCP (6), length 46)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0x96b6 (correct), seq 2:8, ack 1, win 126, length 6
17:24:27.626776 IP (tos 0x0, ttl 64, id 63311, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:27.834480 IP (tos 0x0, ttl 64, id 63312, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:28.242485 IP (tos 0x0, ttl 64, id 63313, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:29.098541 IP (tos 0x0, ttl 64, id 63314, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:30.762521 IP (tos 0x0, ttl 64, id 63315, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:31.251843 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.36.14.3551 > 192.168.36.5.36552: Flags [S.], cksum 0x3816 (correct), seq 305691679, ack 4039399120, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:24:31.252025 IP (tos 0x0, ttl 64, id 60105, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.3551 > 192.168.36.5.36552: Flags [.], cksum 0x735b (correct), seq 1, ack 3, win 126, length 0
17:24:31.252146 IP (tos 0x0, ttl 64, id 60106, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.3551 > 192.168.36.5.36552: Flags [.], cksum 0x7355 (correct), seq 1, ack 9, win 126, length 0
17:24:34.026505 IP (tos 0x0, ttl 64, id 63316, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:40.810420 IP (tos 0x0, ttl 64, id 63317, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:42.440334 IP (tos 0x0, ttl 64, id 63318, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [F.], cksum 0xe119 (correct), seq 8, ack 1, win 126, length 0
17:24:42.440489 IP (tos 0x0, ttl 64, id 61356, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [S], cksum 0x5806 (correct), seq 332552506, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
17:24:42.440522 IP (tos 0x0, ttl 64, id 63319, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10c (incorrect -> 0xe112), seq 0:2, ack 1, win 126, length 2
17:24:42.440524 IP (tos 0x0, ttl 64, id 63320, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [.], cksum 0xe118 (correct), seq 9, ack 2, win 126, length 0
17:24:42.440623 IP (tos 0x0, ttl 64, id 61357, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [.], cksum 0x176e (correct), seq 332552507, ack 3521227245, win 126, length 0
17:24:42.440648 IP (tos 0x0, ttl 64, id 61358, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:42.642339 IP (tos 0x0, ttl 64, id 61359, offset 0, flags [DF], proto TCP (6), length 46)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0xcd01 (correct), seq 2:8, ack 1, win 126, length 6
17:24:42.642369 IP (tos 0x0, ttl 64, id 63321, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10b (incorrect -> 0xe111), seq 0:2, ack 2, win 126, length 2
17:24:42.642533 IP (tos 0x0, ttl 64, id 61360, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:42.850265 IP (tos 0x0, ttl 64, id 61361, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:43.050303 IP (tos 0x0, ttl 64, id 63322, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10b (incorrect -> 0xe111), seq 0:2, ack 2, win 126, length 2
17:24:43.258272 IP (tos 0x0, ttl 64, id 61362, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:43.882373 IP (tos 0x0, ttl 64, id 63323, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10b (incorrect -> 0xe111), seq 0:2, ack 2, win 126, length 2
17:24:44.074321 IP (tos 0x0, ttl 64, id 61363, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:45.546346 IP (tos 0x0, ttl 64, id 63324, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10b (incorrect -> 0xe111), seq 0:2, ack 2, win 126, length 2
17:24:45.738189 IP (tos 0x0, ttl 64, id 61364, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:46.298193 IP (tos 0x0, ttl 64, id 60107, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.3551 > 192.168.36.5.36552: Flags [.], cksum 0x7354 (correct), seq 1, ack 10, win 126, length 0
17:24:48.810240 IP (tos 0x0, ttl 64, id 63325, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10b (incorrect -> 0xe111), seq 0:2, ack 2, win 126, length 2
17:24:49.002216 IP (tos 0x0, ttl 64, id 61365, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:55.658433 IP (tos 0x0, ttl 64, id 61366, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
17:24:55.658456 IP (tos 0x0, ttl 64, id 63326, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55238 > 192.168.36.5.3551: Flags [P.], cksum 0xe10b (incorrect -> 0xe111), seq 0:2, ack 2, win 126, length 2
17:24:57.445665 IP (tos 0x0, ttl 64, id 61367, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [F.], cksum 0x1765 (correct), seq 8, ack 1, win 126, length 0
17:24:57.445884 IP (tos 0x0, ttl 64, id 61368, offset 0, flags [DF], proto TCP (6), length 42)
    192.168.36.14.55240 > 192.168.36.5.3551: Flags [P.], cksum 0x1758 (incorrect -> 0x175e), seq 0:2, ack 1, win 126, length 2
pelwell commented 4 years ago

The bad checksums are for packets which aren't a multiple of 4 in length.

popcornmix commented 4 years ago

You mention testing on latest 5.6 64-bit kernel. Does checksum test work in other scenarios (e.g. 32-bit kernel, previous 5.6 kernel, 4.19 or 5.4 kernel)

asavah commented 4 years ago

5.5 is ok, I didn't have time to play with 5.6-rcX. Sadly I don't have a 32bit environment.

pelwell commented 4 years ago

https://www.raspberrypi.org/downloads/raspbian/ ;-)

asavah commented 4 years ago

This is from 5.5.y

ethtool -k eth0
ethtool -k eth0
Features for eth0:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]

This is from 5.6.y

Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]

https://github.com/raspberrypi/linux/commit/ae895c49905cd99aca24d23361b72ed559b30f4f

So it was not actually working on 5.5.y because offloads were disabled by default.

However enabling tx offload on 5.5.y seems to work.

Features for eth0:
rx-checksumming: off
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]

I will confirm whether reverting the commit mentioned above restores the behavior of 5.5.y.

asavah commented 4 years ago

Yes, indeed reverting https://github.com/raspberrypi/linux/commit/ae895c49905cd99aca24d23361b72ed559b30f4f restores working behavior.

But it might be something else also affecting tx checksums because enabling ethtool -K eth0 tx on with the commit mentioned above reverted also works ...

asavah commented 4 years ago

Yep, it's scatter-gather offload + tx csum produces this effect.

ethtool -K eth0 tx on
ethtool -K eth0 sg on

Results in wrong checksums, disabling any of them results in good checksums

pelwell commented 4 years ago

Thanks - that's a useful pointer.