silicann / blickwerk-boot

linux kernel and u-boot sources for blickwerk sensors

Network connection failure with some direct connection peers #7

Open sumpfralle opened 5 years ago

sumpfralle commented 5 years ago

There seems to be a problem with the sensor when it is connected to a TP-Link 1043NDv1 device.

The router sees a link from the device (indicated by its LEDs per port).

But no packets from the device (e.g. DHCP lease requests) are visible on the network.

sumpfralle commented 5 years ago

Another device with network connection issues: Thinkpad X250 (Intel Corporation Ethernet Connection (3) I218-LM (rev 03)). Here the link status on the Thinkpad side stays down.

sumpfralle commented 5 years ago

As mentioned above, the issue can be seen with a Thinkpad X250 (Intel Corporation Ethernet Connection (3) I218-LM (rev 03)). In this case, the following details may be relevant.

Connection with a switch

The sensor's NIC shows the following status when connected with a switch (i.e. it works).

$ ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                             100baseT/Half 100baseT/Full 
        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes

In this case success is indicated by a permanent UP state of the link.

Connection with a problematic peer

The sensor's NIC shows the following status when connected to a problematic peer:

$ ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                             100baseT/Half 100baseT/Full 
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: no

In this case failure is indicated by repeated up/down status changes of the NIC (visible via dmesg):

[ 8496.558372] fec 800f0000.ethernet eth0: Link is Down
[ 8497.558783] fec 800f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 8499.559514] fec 800f0000.ethernet eth0: Link is Down
[ 8500.558780] fec 800f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 8502.558370] fec 800f0000.ethernet eth0: Link is Down
[ 8503.558857] fec 800f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 8505.558337] fec 800f0000.ethernet eth0: Link is Down
[ 8506.558827] fec 800f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 8508.558349] fec 800f0000.ethernet eth0: Link is Down
[ 8509.558775] fec 800f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 9595.558383] fec 800f0000.ethernet eth0: Link is Down
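To put a number on the flapping, the "Link is Down" messages can be counted in the kernel log. The following is only a minimal sketch: the function name count_link_down is made up for illustration, and it assumes the log lines have the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: count "Link is Down" events for one interface.
# Kernel log text is read from stdin, e.g.:  dmesg | count_link_down eth0
count_link_down() {
    iface="$1"
    # -c prints the number of matching lines, -F matches a fixed string.
    # grep exits non-zero when nothing matches, hence the "|| true".
    grep -c -F "${iface}: Link is Down" || true
}
```

Note that dmesg shows the whole kernel ring buffer, so the count also includes link losses from earlier cable disconnects.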

After a while the situation may settle (resulting in a working network connection). The failure can be triggered again by running ethtool -r NIC (restart autonegotiation) on the peer or by disconnecting the cable for a moment.

In this failure situation, the peer's (Thinkpad X250) NIC shows the following status:

$ ethtool enp0s25 
Settings for enp0s25:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: on (forced)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes

Attempted autonegotiation settings

The following settings on the sensor's NIC were tested:

sumpfralle commented 5 years ago

Just for the record: the problem can be reproduced with the following commands on the connected peer:

ip link set eth0 up
while sleep 1; do ip --oneline l show eth0; done

The failure situation is visible as a repeated toggling between BROADCAST,MULTICAST,UP and NO-CARRIER,BROADCAST,MULTICAST,UP.
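The loop above prints one line per second. A variant that only reports changes makes the toggling easier to spot. This is a rough sketch, not what was actually used here: carrier_state and watch_carrier are illustrative names, and it assumes the interface has already been set up (otherwise a missing NO-CARRIER flag does not imply a carrier).

```shell
#!/bin/sh
# Hypothetical helper: classify one line of `ip --oneline link show` output.
# Assumes the interface is administratively up.
carrier_state() {
    case "$1" in
        *NO-CARRIER*) echo "no-carrier" ;;
        *) echo "carrier" ;;
    esac
}

# Hypothetical helper: poll once per second, print a line on every state change.
# Runs until interrupted (Ctrl-C).
watch_carrier() {
    iface="$1"
    previous=""
    while sleep 1; do
        current=$(carrier_state "$(ip --oneline link show "$iface")")
        if [ "$current" != "$previous" ]; then
            echo "$(date +%T) $iface: $current"
            previous="$current"
        fi
    done
}
```

Usage on the peer would be: ip link set eth0 up; watch_carrier eth0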

After a while (minutes?) the situation may settle with the following settings on the peer's side:

Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off (auto)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes

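The only difference between the two peer-side outputs (apart from the interface name) seems to be the MDI-X line: on (forced) during the failure, off (auto) once the link has settled.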
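To compare the MDI-X field across several captures, a small filter can pull that line out of saved ethtool output. This is just a sketch. The name mdix_state is made up, and it assumes the ethtool output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "MDI-X:" line from ethtool
# output read on stdin, e.g.:  ethtool eth1 | mdix_state
mdix_state() {
    # -n suppresses default output; the p flag prints only lines where the
    # substitution (stripping the "MDI-X:" label) succeeded.
    sed -n 's/^[[:space:]]*MDI-X:[[:space:]]*//p'
}
```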

sumpfralle commented 5 years ago

We noticed that the problematic network interface (I218-LM (v3)) fails to complete autonegotiation when connected to itself (loopback: RX and TX are connected). Thus it seems that the chipset (or PHY or something?) works outside of the specification and therefore requires very tolerant peers.

sumpfralle commented 3 years ago

Another problematic peer NIC: Intel (6) I219-LM.

sumpfralle commented 3 years ago

A customer reported that the autonegotiation works if the problematic peer is configured as half duplex.

sumpfralle commented 3 years ago

For now we hide this problem by detecting failed negotiation attempts (listening to the kernel log) and restricting the announced speeds to 10 MBit/s:

ethtool -s eth0 advertise 0x3

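(0x3 advertises only 10baseT/Half and 10baseT/Full; the default 0xf additionally includes 100baseT/Half and 100baseT/Full)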
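The idea of this workaround can be sketched roughly as follows. This is not the actual implementation used on the sensor. The function names and the threshold of 3 are assumptions for illustration only. The bit values are the ones documented for the advertise option in the ethtool man page.

```shell
#!/bin/sh
# Hypothetical helper: build an ethtool "advertise" mask from mode names.
advertise_mask() {
    mask=0
    for mode in "$@"; do
        case "$mode" in
            10baseT/Half)  bit=1 ;;
            10baseT/Full)  bit=2 ;;
            100baseT/Half) bit=4 ;;
            100baseT/Full) bit=8 ;;
            *) echo "unknown mode: $mode" >&2; return 1 ;;
        esac
        mask=$((mask | bit))
    done
    printf '0x%x\n' "$mask"
}

# Hypothetical helper: read kernel log text from stdin and succeed if
# "Link is Down" was logged for the interface at least <threshold> times.
should_restrict() {
    iface="$1"
    threshold="$2"
    count=$(grep -c -F "${iface}: Link is Down" || true)
    [ "$count" -ge "$threshold" ]
}

# Example (needs root; the threshold 3 is an arbitrary assumption):
#   if dmesg | should_restrict eth0 3; then
#       ethtool -s eth0 advertise "$(advertise_mask 10baseT/Half 10baseT/Full)"
#   fi
```

A one-shot dmesg call counts every link loss since boot, so a real watcher would have to look only at recent messages.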