Open GeorgeSapkin opened 5 years ago
In order to exclude a possible hardware issue with your custom carrier board, it would be helpful to reproduce this issue on a Raspberry Pi board.
I cannot reproduce it with the official I/O board since it doesn't have Ethernet. And reproducing this on a full Pi doesn't seem relevant (different hardware). What else can I do to debug this?
What's the difference between your custom IO board and a Raspberry Pi 3 regarding SMSC95xx wiring? How about providing your device tree source?
I too am seeing this on a CM3 with an SMSC95xx. I too have looked at the other bug reports and was not able to find a solution to this issue.
I am confused about what in the device tree source you are looking for?
Jul 22 16:21:58 RevPi11734 kernel: [ 0.000000] Linux version 4.9.76-rt60-v7+ (admin@kunbus.de) (gcc version 4.8.3 20140303 (prerelease) (crosstool-NG linaro-1.13.1+bzr2650 - Linaro GCC 2014.03) ) #1 SMP PREEMPT RT Tue, 17 Jul 2018 14:20:12 +0200 ... Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344611] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344620] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344626] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344632] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR ... Jul 18 23:19:11 RevPi11734 kernel: [43053.607537] smsc95xx 1-1.5.1:1.0 eth1: Failed to read reg index 0x00000114: -110 Jul 18 23:19:11 RevPi11734 kernel: [43053.607547] smsc95xx 1-1.5.1:1.0 eth1: Error reading MII_ACCESS Jul 18 23:19:11 RevPi11734 kernel: [43053.607553] smsc95xx 1-1.5.1:1.0 eth1: MII is busy in smsc95xx_mdio_read Jul 18 23:19:11 RevPi11734 kernel: [43053.607559] smsc95xx 1-1.5.1:1.0 eth1: Failed to read MII_BMSR
-110 is a timeout, but I don't think I can help much more than that. Raspberry Pi boards don't exhibit this problem, and the rare and random nature of the errors makes it look much more like a hardware issue than software (unless it is a very subtle configuration error).
I found a quick test for this issue using iperf. I have a custom board PI CM3L with dual ethernet. I can use iperf on one as a master and one as a slave. See here: https://serverfault.com/questions/127636/force-local-ip-traffic-to-an-external-interface/861465#861465
Using this the kernel/driver from Rasbian with mods by Kunbus errors out very quickly (<15 mins) with the timeout error as described above, but the version I built with Yocto has been running for several days now with no errors.
The good Yocto version of the kernel is: Linux raspberrypi-cm3 4.19.57 #1 SMP Thu Jul 11 23:40:23 UTC 2019 armv7l armv7l armv7l GNU/Linux
The Rasbian/Kunbus version is: Linux RevPi8592 4.9.76-rt60-v7+ #1 SMP PREEMPT RT Tue, 17 Jul 2018 14:20:12 +0200 armv7l GNU/Linux
We've checked the wiring on our board. We don't have full schematic for RPi3B (AFAIK they are not open-sourced), so we're basing this on Pi1B 2.1 schematic. The difference is that nRESET pin on LAN9512 (we don't need 4 USB ports so we're not using LAN9514) is not wired to any GPIO pin and is pulled high instead. We haven't changed anything in the Devicetree regarding this pin (i.e. LAN_RUN) either. This seems in accordance with LAN9512 specs, Table 2-3. However, according to @pelwell this is an issue. Is this the case and are we misreading LAN9512 specs?
In order to answer your question, we need to know how do you provide the 25 MHz clock source (e.g. via GPIO of BCM2835).
XI and XO pins of LAN9512 are wired same as in RPiB 2.1 schematic to an external 25 MHz clock source. I'm going to try to connect an unused GPIO pin from CM3 to nRESET on LAN9512, configure it as LAN_RUN in DT and see if I can still reproduce the issue.
We've made a few test boards with nRESET connected to GPIO pin 13 and configured that pin as LAN_RUN
in dt-blob.bin
. The issue still occurs. Interestingly, when this happens, Wi-Fi also becomes unresponsive. Wi-Fi chip is connected over SDIO so it doesn't go through LAN9512.
We're currently investigating whether this issue could be related to crosstalk from LED traces.
@GeorgeSapkin I'm having this same issue, I'm building my self the Raspbian operating system with pi-gen with some custom configurations, my last build was by Oct 2, my entire setup is the RPi connected to a ST32 board (it has a serial connection and a storage drive inside it), for testing there is no sensors/gpio connections to the ST32 boards, and for production it has several others devices connected to it.
Testing case works ok, no issues, I am able to mount the storage into linux and all the applications that interact with it just works.
On the other side, production does not works, when I plug in the ST32 board to RPi it gives me this:
[ 163.934268] usb 1-1.2: new full-speed USB device number 5 using dwc_otg
[ 164.067904] usb 1-1.2: New USB device found, idVendor=0483, idProduct=374b, bcdDevice= 1.00
[ 164.067923] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 164.067932] usb 1-1.2: Product: STM32 STLink
[ 164.067942] usb 1-1.2: Manufacturer: STMicroelectronics
[ 164.067953] usb 1-1.2: SerialNumber: 066CFF383930434B43152720
[ 164.075944] usb-storage 1-1.2:1.1: USB Mass Storage device detected
[ 164.082453] scsi host0: usb-storage 1-1.2:1.1
[ 164.142860] usbcore: registered new interface driver uas
[ 164.163874] cdc_acm 1-1.2:1.2: ttyACM0: USB ACM device
[ 164.167005] usbcore: registered new interface driver cdc_acm
[ 164.167017] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[ 165.124578] ERROR::assign_and_init_hc:1408: assign_and_init_hc: Failed to allocate memory to handle non-dword aligned buffer case
[ 170.714285] cdc_acm 1-1.2:1.2: failed to set dtr/rts
[ 175.195608] scsi 0:0:0:0: Direct-Access MBED microcontroller 1.0 PQ: 0 ANSI: 2
[ 175.197179] sd 0:0:0:0: [sda] 4168 512-byte logical blocks: (2.13 MB/2.04 MiB)
[ 175.197653] sd 0:0:0:0: [sda] Write Protect is off
[ 175.197667] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 175.198153] sd 0:0:0:0: [sda] No Caching mode page found
[ 175.198164] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 175.234518] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 175.286560] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 200.074285] cdc_acm 1-1.2:1.2: failed to set dtr/rts
[ 205.114283] cdc_acm 1-1.2:1.2: failed to set dtr/rts
[ 212.074314] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
[ 212.074331] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[ 212.074341] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[ 212.074351] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
Two ST32 boards has the same firmware, RPi is the same, I tested different power sources (thinking it was that because prod ST32 board has other devices connected).
I have a previous RPi image version that is working on prod though, and it is almost one year old.
This is my report of the situation, I will see if I can rollback to another kernel version and see how it goes.
Looks like it might be related to https://github.com/raspberrypi/linux/issues/2924 We do have multiple Ethernet adapters: 1 from LAN9512 and 2 from the modem.
@GeorgeSapkin thanks for point me out that issue, my solution right now was to compile a 4.14.107 kernel and modules and that solved the issue I mentioned and another issue I had with the HDMI screen being yellow when the kernel starts. But for now this was my workaround.
@xyklex I'm glad to hear it solved your issue. We're stress-testing 4.14.x branch as well to see if it's an improvement.
@GeorgeSapkin could you tell me if you solved the problem with version 4.14.x?
@ckamas We are facing this or very similar issue on our kunbus revpi as well. Did you manage to find a solution?
Describe the bug Network connectivity drops out at random on CM3 with a custom carrier board with LAN9512. Device becomes completely unreachable over any network going through that chip (e.g. Ethernet, LTE).
To reproduce Unknown.
Expected behaviour Network connectivity doesn't randomly drop out.
Actual behaviour Network connectivity drops out.
System
cat /etc/rpi-issue
)? Raspberry Pi reference 2018-06-27vcgencmd version
)? VCHI initialization faileduname -a
)? 4.19.42-v7+ #1219 SMP Tue May 14 21:20:58 BST 2019 armv7l GNU/LinuxLogs
Additional context I've already checked issues #60, #149, #552 and #746 but none of them really contain anything useful besides mentioning that the issue is already fixed.
nRESET pin on LAN9512 is pulled high and not connected to a GPIO pin.