openbmc / linux

OpenBMC Linux kernel source tree
Other
49 stars 132 forks source link

JFFS2 corruption that results in kernel panic #59

Closed nkskjames closed 8 years ago

nkskjames commented 8 years ago

While doing power cycling to debug another issue, the rwfs got corrupted and we had to netboot to recover: No files were created in overlay since first initial flash of this system.

Linux version 4.3.6-openbmc-20160222-1 (njames@bakerstreet) (gcc version 4.9.3 (GCC) ) #1 Tue Feb 23 09:45:18 CST 2016
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=0005317f
CPU: VIVT data cache, VIVT instruction cache
Machine model: Barrelye BMC
Memory policy: Data cache writeback
SOC Rev: 02010303
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
Kernel command line: console=ttyS4,38400n8 root=/dev/ram rw
PID hash table entries: 1024 (order: 0, 4096 bytes)
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 252760K/262144K available (3634K kernel code, 174K rwdata, 1288K rodata, 180K init, 109K bss, 9384K reserved, 0K cma-reserved)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xd0800000 - 0xff000000   ( 744 MB)
    lowmem  : 0xc0000000 - 0xd0000000   ( 256 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .text : 0xc0008000 - 0xc04d6de8   (4924 kB)
      .init : 0xc04d7000 - 0xc0504000   ( 180 kB)
      .data : 0xc0504000 - 0xc052faa8   ( 175 kB)
       .bss : 0xc052faa8 - 0xc054aebc   ( 110 kB)
NR_IRQS:16 nr_irqs:16 16
Aspeed VIC Initiallized
clocksource: moxart_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 39817925974 ns
sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every 21474836475000000ns
Calibrating delay loop... 190.87 BogoMIPS (lpj=954368)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x40008400 - 0x40008458
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
clocksource: Switched to clocksource moxart_timer
NET: Registered protocol family 2
TCP established hash table entries: 2048 (order: 1, 8192 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 1636K (c0800000 - c0999000)
futex hash table entries: 256 (order: -1, 3072 bytes)
squashfs: version 4.0 (2009/01/31) Phillip Lougher
jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
io scheduler noop registered (default)
Serial: 8250/16550 driver, 6 ports, IRQ sharing enabled
1e783000.serial: ttyS0 at MMIO 0x1e783000 (irq = 23, base_baud = 115200) is a 16550A
1e78d000.serial: ttyS1 at MMIO 0x1e78d000 (irq = 24, base_baud = 115200) is a 16550A
1e78e000.serial: ttyS2 at MMIO 0x1e78e000 (irq = 25, base_baud = 115200) is a 16550A
1e78f000.serial: ttyS3 at MMIO 0x1e78f000 (irq = 26, base_baud = 115200) is a 16550A
console [ttyS4] disabled
1e784000.serial: ttyS4 at MMIO 0x1e784000 (irq = 27, base_baud = 115200) is a 16550A
console [ttyS4] enabled
1e787000.vuart: ttyS5 at MMIO 0x1e787000 (irq = 18, base_baud = 115200) is a 16550A
brd: module loaded
loop: module loaded
bt-host 1e789140.ibt: Found bt host device
bt-host 1e789140.ibt: Using IRQ 18
platform 1e620000.fmc:flash@0: w25q256 (32768 Kbytes)
6 ofpart partitions found on MTD device bmc
Creating 6 MTD partitions on "bmc":
0x000000000000-0x000000060000 : "u-boot"
0x000000060000-0x000000080000 : "u-boot-env"
0x000000080000-0x000000300000 : "kernel"
0x000000300000-0x0000004c0000 : "initramfs"
0x0000004c0000-0x000001c00000 : "rofs"
0x000001c00000-0x000002000000 : "rwfs"
platform 1e630000.spi:flash: n25q512ax3 (65536 Kbytes)
ftgmac100 1e660000.ethernet: Using NCSI interface
ftgmac100 1e660000.ethernet: Read MAC address from chip 52:a6:f6:0e:6c:a6
ftmac100: Loading version 0.2 ...
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
usbcore: registered new interface driver cdc_acm
cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
usbcore: registered new interface driver usb-storage
usbcore: registered new interface driver usbserial
usbcore: registered new interface driver usbserial_generic
usbserial: USB Serial support registered for generic
mousedev: PS/2 mouse device common for all mice
i2c /dev entries driver
i2c_aspeed 1e78a000.i2c: i2c controller registered, irq 19
at24 0-0050: 32768 byte 24c256 EEPROM, writable, 64 bytes/write
rtc-ds1307 0-0068: rtc core: registered ds3231 as rtc0
i2c_aspeed i2c-0: i2c bus 0 registered, irq 220
i2c_aspeed i2c-1: i2c bus 1 registered, irq 221
i2c_aspeed i2c-2: i2c bus 2 registered, irq 222
i2c_aspeed i2c-3: i2c bus 3 registered, irq 223
i2c_aspeed i2c-4: i2c bus 4 registered, irq 224
i2c_aspeed i2c-5: i2c bus 5 registered, irq 225
at24 6-0051: 256 byte 24c02 EEPROM, writable, 8 bytes/write
at24 6-0055: 256 byte 24c02 EEPROM, writable, 8 bytes/write
i2c_aspeed i2c-6: i2c bus 6 registered, irq 226
i2c_aspeed i2c-7: i2c bus 7 registered, irq 227
i2c_aspeed i2c-8: i2c bus 8 registered, irq 228
lm75 0-004a: hwmon0: sensor 'lm75'
occ-i2c 3-0050: occ i2c driver ready: i2c addr@0x50
occ-i2c 3-0051: occ i2c driver ready: i2c addr@0x51
aspeed_wdt 1e785000.wdt: rate 48000000, max timeout 5, timeout 5
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
ip_tables: (C) 2000-2006 Netfilter Core Team
NET: Registered protocol family 17
rtc-ds1307 0-0068: setting system clock to 2000-08-20 03:12:47 UTC (966741167)
Freeing unused kernel memory: 180K (c04d7000 - c0504000)
rofs = mtd5 squashfs rwfs = mtd6 jffs2
jffs2: error: (486) jffs2_build_inode_pass1: child dir "lib" (ino #30) of dir ino #29 appears to be a hard link
jffs2: notice: (486) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
jffs2: warning: (1) jffs2_get_inode_nodes: Eep. No valid nodes for ino #30.
jffs2: warning: (1) jffs2_do_read_inode_internal: no data nodes found for ino #30
jffs2: iget() failed for ino #30
/sbin/init: error while loading shared libraries: libkmod.so.2: cannot open shared object file: Input/output error
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00

---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
shenki commented 8 years ago

This is userspace dying, not a kernel bug:

/sbin/init: error while loading shared libraries: libkmod.so.2: cannot open shared object file: Input/output error
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
nkskjames commented 8 years ago

userspace dying because of a corrupted shared library that was in rofs. How could that shared library get in rwfs? How could we debug that?

shenki commented 8 years ago

Your rofs is corrupt by the looks?

You could try booting with init=/bin/bash? But I suspect you will come across similar issues.

nkskjames commented 8 years ago

i'm sure the rwfs is corrupt. i just don't understand how the .so got affected. the rwfs should be empty or mostly empty.

gwshan commented 8 years ago

I guess it would have same root cause to issue 53.

mdmillerii commented 8 years ago

After a new report I searched for the error and found several reports, but also a proposed patch. That patch highlighted the issue and lead to a patch being merged in 4.5.

The library isn't in rwfs, but the directory is because it was modified. The bug is reading the directories when mounting the file system gets confused by stale copies still in the flash and marks child directories as having unallowable hard links.

See https://lists.ozlabs.org/pipermail/openbmc/2016-March/002340.html for this and a few other patchs including a deadlock fix marked for stable to be picked up.

shenki commented 8 years ago

I think we have this under control as of https://github.com/openbmc/linux/commit/aeb4718beca0d07ff232341cdd544008e17f1fdc. Closing for now, please re-open if you see this again.