google / novm

Experimental KVM-based VMM for containers, written in Go.
Apache License 2.0

Kernel crash (in ext4?) #27

Closed by pwaller 9 years ago

pwaller commented 9 years ago

Please let me know if you want to receive this kind of report or not.

novm 0.0-187.g62366c6, i7-3770, Debian 8 x64, go 1.4, python 2.7.8, linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08).

I experienced a kernel crash after leaving this running idle for some time (>45 minutes):

$ novm create --name test --nofork --terminal --com1

The command froze, became a zombie, and had a child process stuck in an uninterruptible sleep. I am unsure whether the kernel crash is the cause or an effect of a novm problem.
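For anyone trying to confirm the same symptoms, here is a minimal Python sketch that lists zombie (`Z`) and uninterruptible-sleep (`D`) processes by reading `/proc/<pid>/stat`. It is not part of novm. The helper names (`parse_proc_stat`, `find_stuck`, `read_all_stat_lines`) are made up for illustration, and it assumes a Linux host with `/proc` mounted.

```python
import glob


def parse_proc_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_stuck(stat_lines):
    """Keep only zombies (Z) and uninterruptible sleepers (D)."""
    stuck = []
    for line in stat_lines:
        pid, comm, state = parse_proc_stat(line)
        if state in ("D", "Z"):
            stuck.append((pid, comm, state))
    return stuck


def read_all_stat_lines():
    """Read every /proc/<pid>/stat that is still readable."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if text:
            lines.append(text)
    return lines


if __name__ == "__main__":
    for pid, comm, state in find_stuck(read_all_stat_lines()):
        print(pid, comm, state)
```

A process that stays in state `D` for a long time is usually blocked inside the kernel, which would be consistent with the crash below having happened first, though that is only a guess from the symptoms.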

[435628.006811] ------------[ cut here ]------------
[435628.007149] kernel BUG at /build/linux-CMiYW9/linux-3.16.7-ckt2/fs/ext4/inode.c:1842!
[435628.007507] invalid opcode: 0000 [#1] SMP 
[435628.007872] Modules linked in: iTCO_wdt iTCO_vendor_support eeepc_wmi asus_wmi sparse_keymap rfkill ppdev x86_pkg_temp_thermal evdev intel_powerclamp intel_rapl i915 tpm_infineon tpm_tis tpm mei_me drm_kms_helper drm mei parport_pc coretemp lpc_ich i2c_algo_bit parport shpchp battery mfd_core video processor kvm_intel kvm i2c_i801 serio_raw pcspkr i2c_core wmi button autofs4 ext4 crc16 mbcache jbd2 btrfs algif_skcipher af_alg dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid10 xor raid1 raid6_pq dm_crypt dm_mod md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw ahci gf128mul glue_helper ablk_helper libahci cryptd libata ehci_pci xhci_hcd ehci_hcd scsi_mod r8169 mii usbcore
[435628.011624]  usb_common thermal fan thermal_sys
[435628.012361] CPU: 3 PID: 58 Comm: kswapd0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt2-1
[435628.013127] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 9002 05/30/2014
[435628.013927] task: ffff88040f382050 ti: ffff88040b88c000 task.ti: ffff88040b88c000
[435628.014689] RIP: 0010:[<ffffffffa039fb37>]  [<ffffffffa039fb37>] ext4_writepage+0x407/0x450 [ext4]
[435628.015494] RSP: 0018:ffff88040b88fa10  EFLAGS: 00010246
[435628.016405] RAX: 02ffff8000040009 RBX: 0000000000001000 RCX: 0000000000000020
[435628.017379] RDX: 0000000000040000 RSI: ffff88040b88fb18 RDI: ffffea000de16930
[435628.018327] RBP: ffffea000de16930 R08: ffff880370795600 R09: 0000000000000000
[435628.019317] R10: 0000000000016258 R11: ffff88041fde8e00 R12: ffff8803707954b0
[435628.020337] R13: ffff88040b88fb18 R14: ffff88040a8671a0 R15: ffff88040b88faf8
[435628.021361] FS:  0000000000000000(0000) GS:ffff88041fac0000(0000) knlGS:0000000000000000
[435628.022321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[435628.023382] CR2: 00007f7823f24000 CR3: 0000000001813000 CR4: 00000000001427e0
[435628.024553] Stack:
[435628.025638]  ffff88040b88fa0c ffffffff81170c70 0000000000000000 0000000000000000
[435628.026860]  0000000000000000 ffff88040b88fdc0 ffffea000de16950 ffff88040b88fbe8
[435628.027934]  ffffea000de16930 ffff88040a8671a0 ffff88040b88faf8 ffffffff8114d489
[435628.029027] Call Trace:
[435628.030114]  [<ffffffff81170c70>] ? page_referenced_one+0x160/0x160
[435628.031233]  [<ffffffff8114d489>] ? shrink_page_list+0x7d9/0xa60
[435628.032363]  [<ffffffff8114dd42>] ? shrink_inactive_list+0x192/0x500
[435628.033502]  [<ffffffff8114e951>] ? shrink_lruvec+0x511/0x6a0
[435628.034651]  [<ffffffff8114eb54>] ? shrink_zone+0x74/0x1b0
[435628.035798]  [<ffffffff8114fc6e>] ? balance_pgdat+0x38e/0x5c0
[435628.036954]  [<ffffffff8114ffff>] ? kswapd+0x15f/0x400
[435628.038118]  [<ffffffff810a5940>] ? prepare_to_wait_event+0xf0/0xf0
[435628.039324]  [<ffffffff8114fea0>] ? balance_pgdat+0x5c0/0x5c0
[435628.040513]  [<ffffffff81085f1d>] ? kthread+0xbd/0xe0
[435628.041703]  [<ffffffff81085e60>] ? kthread_create_on_node+0x180/0x180
[435628.042929]  [<ffffffff8150d27c>] ? ret_from_fork+0x7c/0xb0
[435628.044147]  [<ffffffff81085e60>] ? kthread_create_on_node+0x180/0x180
[435628.045380] Code: ff e9 66 ff ff ff 45 31 ff e9 72 ff ff ff 48 89 ee 4c 89 ef e8 cb 5b da e0 48 89 ef e8 63 b0 d9 e0 b8 f4 ff ff ff e9 d2 fc ff ff <0f> 0b 0f 0b 0f 0b 41 89 c7 e9 3a ff ff ff 80 3d 08 a9 05 00 00 
[435628.048172] RIP  [<ffffffffa039fb37>] ext4_writepage+0x407/0x450 [ext4]
[435628.049550]  RSP <ffff88040b88fa10>
[435628.055612] ---[ end trace c943039a6a6ac28b ]---

pwaller commented 9 years ago

For anyone following by email: I forgot to mention the novm version, which is 0.0-187.g62366c6. (I've also edited the report above.)

amscanne commented 9 years ago

Thanks for the report!

To confirm: is this a host crash or a guest crash? Looks like a host crash (all those crazy modules).

It doesn't look like anything related to the KVM module (but it might be). In any case, some random userspace program shouldn't be capable of crashing the host kernel like this. Looks like you've hit a kernel bug. You could try submitting this to upstream Linux, but if you've got third-party drivers they may not be sympathetic (I'm not sure about iTCO_*, asus_wmi, etc.).
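When triaging a trace like the one above before sending it upstream, it can help to pull out the few fields a maintainer looks at first. The following is a rough Python sketch, not an official tool: the function name `summarize_oops` and the regular expressions are assumptions written against the 3.16-era oops format shown in this report, and other kernel versions may format these lines differently.

```python
import re

# Patterns are guesses fitted to the log in this report.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
CPU_RE = re.compile(r"CPU: \d+ PID: (\d+) Comm: (.+?) (Not tainted|Tainted:)")
RIP_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_oops(text):
    """Extract BUG location, task, taint status and faulting symbol."""
    summary = {}
    for line in text.splitlines():
        m = BUG_RE.search(line)
        if m:
            summary["bug_file"] = m.group(1)
            summary["bug_line"] = int(m.group(2))
        m = CPU_RE.search(line)
        if m:
            summary["pid"] = int(m.group(1))
            summary["comm"] = m.group(2)
            summary["tainted"] = m.group(3) != "Not tainted"
        # Only the first RIP line is needed; later ones repeat it.
        if "RIP" in line and "symbol" not in summary:
            m = RIP_RE.search(line)
            if m:
                summary["symbol"] = m.group(1)
                summary["module"] = m.group(2)
    return summary


# Three lines copied from the report above.
SAMPLE = """\
[435628.007149] kernel BUG at /build/linux-CMiYW9/linux-3.16.7-ckt2/fs/ext4/inode.c:1842!
[435628.012361] CPU: 3 PID: 58 Comm: kswapd0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt2-1
[435628.014689] RIP: 0010:[<ffffffffa039fb37>]  [<ffffffffa039fb37>] ext4_writepage+0x407/0x450 [ext4]
"""

if __name__ == "__main__":
    for key, value in sorted(summarize_oops(SAMPLE).items()):
        print(key, "=", value)
```

On this report the summary shows the BUG firing in `ext4_writepage` from the `kswapd0` thread, with the kernel reporting "Not tainted". That status normally means no taint flags (such as those for proprietary or out-of-tree modules) were set at the time of the crash, which may make an upstream report easier.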

pwaller commented 9 years ago

It's a host crash. Closing for now. I'll do some more investigating. At least it's here in case someone happens to search for it.