Closed by russagit 6 years ago
Are you able to reproduce this deterministically, and can you try the latest code from GitHub? Version 6.4.1 is old, and this issue may already have been fixed. Thank you.
I went through all the commits to igb_main.c since version 6.4.1 and, frankly, haven't seen any change that would prevent the condition skb->len < skb->data_len from occurring. Any ideas?
Hi @russagit, I assume you are using the igb driver distributed with pf_ring. Since I am not able to reproduce this, I am statically analysing the code to figure out what happened. You mentioned a "broken packet". Are you able to provide it? Any additional info, including the driver configuration (ethtool -k), would be helpful. Thank you.
Sorry, I was wrong. We use a vanilla igb driver. All we currently have is a kernel core dump and the fact that the switch failure coincided with the kernel panic (hence the guess about a broken packet, but it is only a guess). Do you think it is possible to recover the packet from the core dump? Thanks
# ethtool -k ipbb02
Features for ipbb02:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
busy-poll: off [fixed]
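To compare offload settings between hosts, the `ethtool -k` listing above can be reduced to the features that are enabled and still changeable. The following is a minimal sketch in Python, assuming the output format shown above. `parse_features` is a hypothetical helper, and the sample holds only a few lines copied from the listing.

```python
# A few lines copied from the `ethtool -k ipbb02` output above.
SAMPLE = """\
Features for ipbb02:
rx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-all: off
"""


def parse_features(text):
    """Map feature name -> (enabled, fixed). Hypothetical helper."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        # Skip the "Features for <iface>:" header and any blank lines.
        if not sep or not tokens or tokens[0] not in ("on", "off"):
            continue
        features[name.strip()] = (tokens[0] == "on", "[fixed]" in tokens)
    return features


features = parse_features(SAMPLE)
# Features that are on and not marked [fixed] can still be switched off.
tunable_on = sorted(n for n, (on, fixed) in features.items() if on and not fixed)
print(tunable_on)
```

Features that are on and not marked `[fixed]` can be toggled with `ethtool -K`, which may help narrow down whether a particular offload is involved.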
@russagit are you able to reproduce this "switch failure" to trigger the crash? You should be able to access skb->data from the dump.
No, the switch failure is not reproducible. Here is a hex dump of skb->data:
> x/272bx 0xffff88000ebcd5ce
0xffff88000ebcd5ce: 0x08 0x03 0x00 0xb8 0x46 0xdf 0x00 0x00
0xffff88000ebcd5d6: 0xff 0x84 0xf5 0x70 0x0a 0x32 0x97 0x02
0xffff88000ebcd5de: 0x0a 0x32 0xd3 0x09 0x17 0xd6 0x6a 0xa5
0xffff88000ebcd5e6: 0x49 0x67 0xec 0xfd 0x68 0xa8 0x73 0x11
0xffff88000ebcd5ee: 0x03 0x00 0x00 0x10 0x3e 0xb9 0x6c 0x51
0xffff88000ebcd5f6: 0x00 0x03 0x1c 0xe0 0x00 0x00 0x00 0x00
0xffff88000ebcd5fe: 0x00 0x03 0x00 0x44 0xa1 0xb1 0x82 0xac
0xffff88000ebcd606: 0x00 0x04 0x4c 0xd1 0x00 0x00 0x00 0x03
0xffff88000ebcd60e: 0x01 0x00 0x01 0x01 0x00 0x00 0x00 0x34
0xffff88000ebcd616: 0x02 0x10 0x00 0x29 0x00 0x00 0x38 0xa4
0xffff88000ebcd61e: 0x00 0x00 0x38 0xab 0x03 0x03 0x00 0x04
0xffff88000ebcd626: 0x06 0x07 0x2b 0x4a 0x00 0x01 0x12 0x00
0xffff88000ebcd62e: 0x10 0x53 0x07 0x02 0x06 0x00 0x0a 0x09
0xffff88000ebcd636: 0xff 0xd4 0x75 0xe2 0xac 0xf6 0xb7 0xd7
0xffff88000ebcd63e: 0x22 0x00 0x00 0x00 0x00 0x03 0x00 0x44
0xffff88000ebcd646: 0xa1 0xb1 0x82 0xad 0x00 0x04 0x4c 0xd2
0xffff88000ebcd64e: 0x00 0x00 0x00 0x03 0x01 0x00 0x01 0x01
0xffff88000ebcd656: 0x00 0x00 0x00 0x34 0x02 0x10 0x00 0x29
0xffff88000ebcd65e: 0x00 0x00 0x38 0xa4 0x00 0x00 0x38 0xab
0xffff88000ebcd666: 0x03 0x03 0x00 0x04 0x06 0x06 0xdb 0xc5
0xffff88000ebcd66e: 0x00 0x01 0x12 0x00 0x10 0x53 0x07 0x02
0xffff88000ebcd676: 0x06 0x00 0x0a 0x09 0xff 0x8e 0xf2 0xc5
0xffff88000ebcd67e: 0x57 0xe3 0xa4 0x24 0xd4 0x00 0x00 0x00
0xffff88000ebcd686: 0xa1 0x81 0x8e 0x01 0x70 0x2e 0x70 0x72
0xffff88000ebcd68e: 0x6f 0x63 0x65 0x64 0x75 0x72 0x65 0x43
0xffff88000ebcd696: 0x6c 0x61 0x73 0x73 0x02 0x00 0x00 0x00
0xffff88000ebcd69e: 0x00 0x00 0x02 0x18 0x1e 0x73 0x63 0x74
0xffff88000ebcd6a6: 0x70 0x2e 0x63 0x68 0x75 0x6e 0x6b 0x5f
0xffff88000ebcd6ae: 0x74 0x79 0x70 0x65 0x08 0x64 0x61 0x74
0xffff88000ebcd6b6: 0x61 0x12 0x6d 0x33 0x75 0x61 0x2e 0x74
0xffff88000ebcd6be: 0x79 0x70 0x01 0x00 0x00 0x00 0x00 0x00
0xffff88000ebcd6c6: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88000ebcd6ce: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88000ebcd6d6: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
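The dump above can be turned back into bytes and inspected offline. The following is a minimal sketch in Python that parses the `x/bx` lines and decodes the first 20 bytes as an IPv4-style header. That `skb->data` points at the network header is an assumption, not something the thread confirms. `parse_gdb_bytes` and `decode_ipv4_like` are hypothetical helper names, and only the first three dump lines are included.

```python
import socket
import struct

# First three lines of the dump above, copied verbatim.
DUMP = """\
0xffff88000ebcd5ce: 0x08 0x03 0x00 0xb8 0x46 0xdf 0x00 0x00
0xffff88000ebcd5d6: 0xff 0x84 0xf5 0x70 0x0a 0x32 0x97 0x02
0xffff88000ebcd5de: 0x0a 0x32 0xd3 0x09 0x17 0xd6 0x6a 0xa5
"""


def parse_gdb_bytes(text):
    """Collect the byte values from 'address: 0x.. 0x..' lines."""
    out = bytearray()
    for line in text.splitlines():
        _, sep, values = line.partition(":")
        if sep:
            out.extend(int(tok, 16) for tok in values.split())
    return bytes(out)


def decode_ipv4_like(buf):
    """Decode 20 bytes using the IPv4 header layout (assumed, see lead-in)."""
    (ver_ihl, tos, total_length, ident, frag, ttl, protocol, checksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", buf[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,
        "tos": tos,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


data = parse_gdb_bytes(DUMP)
header = decode_ipv4_like(data)
for key, value in header.items():
    print(f"{key}: {value}")
```

Under that assumption, the protocol field decodes to 132, the IANA number for SCTP, which is consistent with the `sctp.chunk_type` string visible later in the dump. The first byte, 0x08, does not decode to IP version 4 under the same assumption, so the header may be damaged or the pointer may sit at a different offset.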
@russagit I tried to reproduce this with the corrupted packet, with no luck. However, it seems that the driver is creating a malformed skbuff, which causes the skb_pull assert failure.
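For readers unfamiliar with the assertion: in the kernel, `skb->len` counts all bytes of the packet, `skb->data_len` counts only the bytes held in page fragments, and `__skb_pull()` subtracts from `len` and then asserts that `len` has not dropped below `data_len`. The following is a toy model of that check in Python, not kernel code. `SkbModel` is a hypothetical class, and it uses plain integers, so it does not reproduce unsigned wraparound.

```python
class SkbModel:
    """Toy model of the two length fields of a struct sk_buff."""

    def __init__(self, length, data_len):
        self.len = length          # total bytes: linear area plus fragments
        self.data_len = data_len   # bytes held in fragments only

    def headlen(self):
        # Mirrors skb_headlen(): bytes available in the linear area.
        return self.len - self.data_len

    def pull(self, n):
        # Mirrors __skb_pull(): shrink len first, then check the invariant.
        self.len -= n
        if self.len < self.data_len:
            raise AssertionError("skb->len < skb->data_len")
        return self.headlen()


# Fully linear buffer: pulling a 14-byte header is fine.
ok = SkbModel(length=60, data_len=0)
print(ok.pull(14))

# Only 10 linear bytes (hypothetical values): pulling 14 trips the check.
bad = SkbModel(length=60, data_len=50)
try:
    bad.pull(14)
except AssertionError as exc:
    print("assert:", exc)
```

In this model the check fires whenever more bytes are pulled than the linear area holds, which is the kind of inconsistency a malformed skbuff would produce.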
Thanks a lot for the analysis. I think we can close this issue as not relevant to pf_ring.
It looks like a broken packet (?) may cause a kernel panic with pf_ring. We are receiving mirrored traffic from a switch. At the same time that the switch experienced a failure, we got the following panic.