neilbrown / gnubee-tools

Tools for building gnubee firmware - and maybe more.
Other
31 stars 14 forks source link

Boot hangs after power cycle #19

Closed jjakob closed 4 years ago

jjakob commented 4 years ago

Boot hangs and requires another hard power cycle:

U-Boot 1.1.3 (Aug  7 2017 - 16:23:41)

Board: MediaTek APSoC DRAM: 512 MB

Config XHCI 40M PLL
MediaTek SPI flash driver, SPI clock: 45MHz
spi device id: 1 2 19 4d
find flash: S25FL256S
============================================
MediaTek U-Boot Version: 5.0.1.0-6
--------------------------------------------
ASIC MT7621A DualCore (MAC to MT7530 Mode)
DRAM_CONF_FROM: Auto-Detection
DRAM_TYPE: DDR3
DRAM bus: 16 bit
Xtal Mode=3 OCP Ratio=1/3
Flash component: SPI Flash
Date:Aug  7 2017  Time:16:23:41
============================================
icache: sets:256, ways:4, linesz:32, total:32768
dcache: sets:256, ways:4, linesz:32, total:32768

 #### The CPU freq = 900 MHZ ####
 estimate memory size = 512 Mbytes

 Reset MT7530
set LAN/WAN WLLLL
(Re)start USB...
USB0:   mtk-xhci: init hccr be1c0000 and hcor be1c0020 hc_length 32
Register 300010f NbrPorts 3
Starting the controller
USB XHCI 0.96
scanning bus 0 for devices... 2 USB Device(s) found
       scanning bus for storage devices... 0 Storage Device(s) found

 No USB Storage found. Upgrade FW failed!

Please choose the operation:
   1: Load system code to SDRAM via TFTP.
   2: Load system code then write to Flash via TFTP.
   3: Boot system code via Flash (default).
   4: Enter boot command line interface.
   5: Load system code then write to Flash via USB Storage.
   6: Load system code then write to Flash via Httpd.
   9: Load U-Boot code then write to Flash via TFTP.                                                                                                                                                     0

3: System Boot system code via Flash.
## Checking image at bc050000 ...
   Image Name:   Linux-5.4.6+
   Image Type:   MIPS Linux Kernel Image (uncompressed)
   Data Size:    20723952 Bytes = 19.8 MB
   Load Address: 80001000
   Entry Point:  80686c20
   Verifying Checksum ... OK
OK
No initrd
## Transferring control to Linux (at address 80686c20) ...
## Giving linux memsize in MB, 512

Starting kernel ...

[    0.000000] Linux version 5.4.6+ (jernej@librem) (gcc version 8.3.0 (Debian 8.3.0-2)) #12 SMP Thu Jan 9 01:00:53 CET 2020
[    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[    0.000000] MIPS: machine is GB-PC2
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] VPE topology {2,2} total 4
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   HighMem  [mem 0x000000001c000000-0x0000000023ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000023ffffff]
[    0.000000] percpu: Embedded 14 pages/cpu s26864 r8192 d22288 u57344
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130176
[    0.000000] Kernel command line: console=ttyS0,57600 rootfstype=squashfs,jffs2
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Writing ErrCtl register=0000d002
[    0.000000] Readback ErrCtl register=0000d002
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 497448K/524288K available (6711K kernel code, 241K rwdata, 1284K rodata, 13064K init, 241K bss, 26840K reserved, 0K cma-reserved, 65536K highmem)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] random: get_random_bytes called from start_kernel+0x344/0x54c with crng_init=0
[    0.000000] CPU Clock: 900MHz
[    0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcf914c9718, max_idle_ns: 440795231327 ns
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 4247245437 ns
[    0.000009] sched_clock: 32 bits at 450MHz, resolution 2ns, wraps every 4772186110ns
[    0.015497] Calibrating delay loop... 597.60 BogoMIPS (lpj=2988032)
[    0.087779] pid_max: default: 32768 minimum: 301
[    0.097081] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.111486] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.126931] *** VALIDATE tmpfs ***
[    0.134323] *** VALIDATE proc ***
[    0.140987] *** VALIDATE cgroup1 ***
[    0.147933] *** VALIDATE cgroup2 ***
[    0.156554] rcu: Hierarchical SRCU implementation.
[    0.168689] smp: Bringing up secondary CPUs ...
[    0.179236] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.179247] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.179260] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.179343] CPU1 revision is: 0001992f (MIPS 1004Kc)
[    0.237938] Synchronize counters for CPU 1: done.
[    0.308496] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.308505] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.308514] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.308564] CPU2 revision is: 0001992f (MIPS 1004Kc)
[    0.358445] Synchronize counters for CPU 2: done.
[    0.419527] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.419536] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.419544] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.419596] CPU3 revision is: 0001992f (MIPS 1004Kc)
[    0.478004] Synchronize counters for CPU 3: done.
[    0.537620] smp: Brought up 1 node, 4 CPUs
[    0.546416] devtmpfs: initialized
[    0.556266] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.575738] futex hash table entries: 1024 (order: 3, 32768 bytes, linear)
[    0.589698] pinctrl core: initialized pinctrl subsystem
[    0.601477] NET: Registered protocol family 16
[    0.661901] random: fast init done
[    0.671305] SCSI subsystem initialized
[    0.682592] clocksource: Switched to clocksource GIC
[    0.692445] *** VALIDATE bpf ***
[    0.699041] VFS: Disk quotas dquot_6.6.0
[    0.706818] VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.720579] *** VALIDATE ramfs ***
[    0.729287] NET: Registered protocol family 2
[    0.738638] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.755167] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.770564] TCP bind hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.784750] TCP: Hash tables configured (established 4096 bind 4096)
[    0.797460] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.810347] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.824455] NET: Registered protocol family 1
[    0.833036] PCI: CLS 0 bytes, default 32
[   16.207026] Initialise system trusted keyrings
[   16.216033] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[   16.240180] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[   16.251672] jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
[   16.266249] *** VALIDATE jffs2 ***
[   16.273046] SGI XFS with ACLs, security attributes, no debug enabled
[   16.288281] Key type asymmetric registered
[   16.296316] Asymmetric key parser 'x509' registered
[   16.306041] bounce: pool size: 64 pages
[   16.313603] io scheduler mq-deadline registered
[   16.322574] io scheduler kyber registered
[   16.332186] mt7621_gpio 1e000600.gpio: registering 32 gpios
[   16.343638] mt7621_gpio 1e000600.gpio: registering 32 gpios
[   16.355169] mt7621_gpio 1e000600.gpio: registering 32 gpios
[   16.367342] Serial: 8250/16550 driver, 3 ports, IRQ sharing disabled
[   16.381734] printk: console [ttyS0] disabled
[   16.390186] 1e000c00.uartlite: ttyS0 at MMIO 0x1e000c00 (irq = 18, base_baud = 3125000) is a 16550A
[   16.408125] printk: console [ttyS0] enabled
[   16.408125] printk: console [ttyS0] enabled
[   16.424655] printk: bootconsole [early0] disabled
[   16.424655] printk: bootconsole [early0] disabled
[   16.460461] loop: module loaded
[   16.468225] spi-mt7621 1e000b00.spi: sys_freq: 225000000
[   16.479814] spi-nor spi0.0: s25fl256s1 (32768 Kbytes)
[   16.489957] 4 fixed-partitions partitions found on MTD device spi0.0
[   16.502608] Creating 4 MTD partitions on "spi0.0":
[   16.512136] 0x000000000000-0x000000030000 : "u-boot"
[   16.524464] 0x000000030000-0x000000040000 : "u-boot-env"
[   16.537470] 0x000000040000-0x000000050000 : "factory"
[   16.549976] 0x000000050000-0x000002000000 : "firmware"
[   16.563695] libphy: Fixed MDIO Bus: probed
[   16.572060] tun: Universal TUN/TAP device driver, 1.6
[   16.606000] libphy: mdio: probed
[   16.613661] mt7530 mdio-bus:00: MT7530 adapts as multi-chip module
[   16.626182] mtk_soc_eth 1e100000.ethernet: generated random MAC address 96:d8:e8:1f:77:43
[   16.643338] mtk_soc_eth 1e100000.ethernet eth0: mediatek frame engine at 0xbe100000, irq 21
[   16.660059] mtk_soc_eth 1e100000.ethernet: generated random MAC address 8e:c9:46:62:a4:35
[   16.677128] mtk_soc_eth 1e100000.ethernet eth1: mediatek frame engine at 0xbe100000, irq 21
[   16.694932] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=1)
[   16.712272] sdhci: Secure Digital Host Controller Interface driver
[   16.724597] sdhci: Copyright(c) Pierre Ossman
[   16.734459] rt2880-pinmux pinctrl: sdhci is already enabled
[   16.745587] mtk-msdc 1e130000.sdhci: Error applying setting, reverse things back
[   16.760315] rt2880-pinmux pinctrl: sdhci is already enabled
[   16.771411] mtk-msdc 1e130000.sdhci: Error applying setting, reverse things back
[   16.811745] ledtrig-cpu: registered to indicate activity on CPUs
[   16.824142] mt7621-pci 1e140000.pcie: Parsing DT failed
[   16.835836] exFAT: Version 1.3.0
[   16.843855] NET: Registered protocol family 10
[   16.854312] Segment Routing with IPv6
[   16.861690] NET: Registered protocol family 17
[   16.870635] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   16.896687] 8021q: 802.1Q VLAN Support v1.8
[   16.905519] Loading compiled-in X.509 certificates
[   16.915579] Key type ._fscrypt registered
[   16.923580] Key type .fscrypt registered
[   16.934522] mt7530 mdio-bus:00: MT7530 adapts as multi-chip module
[   16.954339] libphy: dsa slave smi: probed
[   16.962893] mt7530 mdio-bus:00 ethblack (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY]
[   16.982417] mt7530 mdio-bus:00 ethblue (uninitialized): PHY [dsa-0.0:04] driver [Generic PHY]
[   17.000907] mt7530 mdio-bus:00: configuring for fixed/trgmii link mode
[   17.017096] mt7530 mdio-bus:00: Link is Up - 1Gbps/Full - flow control off
[   17.031700] DSA: tree 0 setup
[   17.037977] rt2880-pinmux pinctrl: pcie is already enabled
[   17.048921] mt7621-pci 1e140000.pcie: Error applying setting, reverse things back
[   18.182590] mt7621-pci-phy 1e149000.pcie-phy: Xtal is 40MHz
[   18.193708] mt7621-pci 1e140000.pcie: Port 0 N_FTS = 1b102800
[   18.322574] mt7621-pci-phy 1e149000.pcie-phy: Xtal is 40MHz
[   18.333680] mt7621-pci 1e140000.pcie: Port 1 N_FTS = 1b102800
[   18.462575] mt7621-pci-phy 1e14a000.pcie-phy: Xtal is 40MHz
[   18.475814] mt7621-pci 1e140000.pcie: Port 2 N_FTS = 1b102800
[   19.552572] mt7621-pci 1e140000.pcie: PCIE0 enabled
[   19.562266] mt7621-pci 1e140000.pcie: PCIE1 enabled
[   19.571976] mt7621-pci 1e140000.pcie: PCIE2 enabled
[   19.581719] mt7621-pci 1e140000.pcie: PCI coherence region base: 0x60000000, mask/settings: 0xf0000002
[   19.600450] mt7621-pci 1e140000.pcie: PCI host bridge to bus 0000:00
[   19.613127] pci_bus 0000:00: root bus resource [io  0xffffffff]
[   19.624919] pci_bus 0000:00: root bus resource [mem 0x60000000-0x6fffffff]
[   19.638609] pci_bus 0000:00: root bus resource [bus 00-ff]
[   19.649578] pci 0000:00:00.0: [0e8d:0801] type 01 class 0x0d8000
[   19.661574] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x01ffffff]
[   19.674058] pci 0000:00:00.0: reg 0x14: [mem 0x00000000-0x0000ffff]
[   19.686628] pci 0000:00:00.0: supports D1
[   19.694613] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[   19.706454] pci 0000:00:01.0: [0e8d:0801] type 01 class 0x0d8000
[   19.718451] pci 0000:00:01.0: reg 0x10: [mem 0x00000000-0x01ffffff]
[   19.730933] pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x0000ffff]
[   19.743485] pci 0000:00:01.0: supports D1
[   19.751452] pci 0000:00:01.0: PME# supported from D0 D1 D3hot
[   19.763256] pci 0000:00:02.0: [0e8d:0801] type 01 class 0x060400
[   19.775253] pci 0000:00:02.0: reg 0x10: [mem 0x00000000-0x7fffffff]
[   19.787737] pci 0000:00:02.0: reg 0x14: [mem 0x00000000-0x0000ffff]
[   19.800287] pci 0000:00:02.0: supports D1
[   19.808272] pci 0000:00:02.0: PME# supported from D0 D1 D3hot
[   19.820926] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   19.836876] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   19.852814] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
neilbrown commented 4 years ago

This is a known problem introduced when the PCI code was being cleaned up. I think it started failing about Linux Commit 745eeeac68d7 ("staging: mt7621-pci: factor out 'mt7621_pcie_enable_port' function") but when I was trying to dig deeper into it, I couldn't make it fail reliably, so I found find out exactly what caused the regression.

jjakob commented 4 years ago

I could repeat this 3 times in a row once. I think it's how I do the shutdowns: poweroff from system, then as it halts or starts rebooting in u-boot, turn off power switch, wait ~2-3s, power on again. As I said, last time I did this, I had to hard power off 3 times as it hung on the pci init each time.

Looking at the commit you referenced, https://github.com/neilbrown/linux/commit/745eeeac68d7 I see an obvious difference in order and timing between the old and new code. Does reverting that commit fix the issue? I might try if it fixes the issue on my end (I need to get some time to assemble a 2nd Gnubee as the one I tested on is now in use)

neilbrown commented 4 years ago

I believe this is now fixed, as of my 5.6.3 kernels (hopefully the fix will be in mainline soon). So I'll close this bug now. If you still have problems with 5.6.3 or later, please re-open.