zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

Grub 2.02-beta2.9-ZOL11-7aa9f6 fails to install #212

Open markdesouza opened 8 years ago

markdesouza commented 8 years ago

In Jessie (I haven't tried Wheezy), Grub 2.02-beta2.9-ZOL11-7aa9f6 fails to install on machines where the root pool has more than a single drive. I have tested this with a single drive in the zpool (works), a mirrored pair (fails), and a striped pair of mirrors, i.e. 4 drives (fails).

Single drive layout: sda1 -> ext2; sda2 -> zpool vdev

2 / 4 drive layout: sdX1 -> md0 -> ext2; sdX2 -> zpool vdev

From the console I see this:

Setting up grub-pc-bin (2.02-beta2.9-ZOL11-7aa9f6) ...
Setting up grub-pc (2.02-beta2.9-ZOL11-7aa9f6) ...

Creating config file /etc/default/grub with new version

Message from syslogd@Test-Node3 at Jun 27 17:38:13 ...
 kernel:[ 5086.502087] ------------[ cut here ]------------

Message from syslogd@Test-Node3 at Jun 27 17:38:13 ...
 kernel:[ 5086.502204] invalid opcode: 0000 [#1] SMP 

Message from syslogd@Test-Node3 at Jun 27 17:38:13 ...
 kernel:[ 5086.503913] Stack:

Message from syslogd@Test-Node3 at Jun 27 17:38:13 ...
 kernel:[ 5086.504229] Call Trace:

Message from syslogd@Test-Node3 at Jun 27 17:38:13 ...
 kernel:[ 5086.504791] Code: ff b8 01 00 00 00 eb 02 31 c0 5a 5b 5d 41 5c 41 5d c3 41 54 55 89 fd 53 48 8b 06 48 89 f3 a8 04 75 02 0f 0b 48 8b 06 a8 20 75 02 <0f> 0b 48 83 7e 38 00 75 02 0f 0b 48 8b 06 f6 c4 02 74 02 0f 0b 
device node not found
device node not found
device node not found
device node not found

Dmesg contains the following information:

[ 2167.358002] ------------[ cut here ]------------
[ 2167.358154] kernel BUG at /build/linux-yNyu62/linux-3.2.81/fs/buffer.c:2960!
[ 2167.358281] invalid opcode: 0000 [#1] SMP 
[ 2167.358570] CPU 1 
[ 2167.358662] Modules linked in: fuse ext2 raid10 xts gf128mul nls_utf8 nls_cp437 vfat fat ipmi_devintf ipmi_si ipmi_msghandler loop dm_crypt zfs(P) snd_pcm snd_page_alloc snd_timer snd sb_edac joydev evdev coretemp zunicode(P) acpi_cpufreq zavl(P) soundcore edac_core pcspkr zcommon(P) znvpair(P) spl(O) zlib_deflate iTCO_wdt iTCO_vendor_support shpchp dcdbas mperf processor button wmi acpi_power_meter thermal_sys ext4 crc16 jbd2 mbcache dm_mod usbhid hid raid1 md_mod usb_storage sg sr_mod cdrom sd_mod crc_t10dif crc32c_intel aesni_intel aes_x86_64 aes_generic cryptd ahci ehci_hcd libahci libata usbcore igb megaraid_sas i2c_algo_bit usb_common i2c_core scsi_mod dca [last unloaded: scsi_wait_scan]
[ 2167.365931] 
[ 2167.366046] Pid: 155, comm: sync_supers Tainted: P        W  O 3.2.0-4-amd64 #1 Debian 3.2.81-1 Dell Inc. PowerEdge R620/0PXXHP
[ 2167.366460] RIP: 0010:[<ffffffff8111ee57>]  [<ffffffff8111ee57>] submit_bh+0x19/0xff
[ 2167.366705] RSP: 0018:ffff8810389a3e10  EFLAGS: 00010246
[ 2167.366827] RAX: 0000000000000005 RBX: ffff880ffd824e98 RCX: ffff880ffd8248e8
[ 2167.366952] RDX: 0000000000000000 RSI: ffff880ffd824e98 RDI: 0000000000000211
[ 2167.367078] RBP: 0000000000000211 R08: 0000000000000000 R09: 0000000000000180
[ 2167.367204] R10: ffff881038ca2000 R11: ffff881038ca2000 R12: ffff880fd38c6400
[ 2167.367329] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 2167.367456] FS:  0000000000000000(0000) GS:ffff88203f200000(0000) knlGS:0000000000000000
[ 2167.367605] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2167.367727] CR2: 00007f17db995970 CR3: 0000000001605000 CR4: 00000000000406e0
[ 2167.367854] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2167.367979] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2167.368105] Process sync_supers (pid: 155, threadinfo ffff8810389a2000, task ffff881038825740)
[ 2167.368255] Stack:
[ 2167.368367]  ffff880ffd824e98 0000000000000211 ffff880fd38c6400 ffffffff811217c3
[ 2167.368866]  0000000000000000 ffffffffa046e6c2 ffff880fd9986000 ffff880fd9986000
[ 2167.369356]  ffff880ffd824e98 ffffffffa0471f7e ffff8810389a2000 ffff880fd9986000
[ 2167.369925] Call Trace:
[ 2167.370046]  [<ffffffff811217c3>] ? __sync_dirty_buffer+0x52/0x87
[ 2167.370177]  [<ffffffffa046e6c2>] ? ext2_count_free_inodes+0x19/0x40 [ext2]
[ 2167.370308]  [<ffffffffa0471f7e>] ? ext2_sync_super+0xae/0xba [ext2]
[ 2167.370435]  [<ffffffffa0471fd7>] ? ext2_sync_fs+0x4d/0x58 [ext2]
[ 2167.370566]  [<ffffffff810fe69e>] ? sync_supers+0x6c/0xb9
[ 2167.370692]  [<ffffffff810cc3d2>] ? bdi_sched_wait+0xa/0xa
[ 2167.370815]  [<ffffffff810cc411>] ? bdi_sync_supers+0x3f/0x50
[ 2167.370945]  [<ffffffff8105fb69>] ? kthread+0x76/0x7e
[ 2167.371074]  [<ffffffff8135ac34>] ? kernel_thread_helper+0x4/0x10
[ 2167.371200]  [<ffffffff8105faf3>] ? kthread_worker_fn+0x139/0x139
[ 2167.371325]  [<ffffffff8135ac30>] ? gs_change+0x13/0x13
[ 2167.371446] Code: ff b8 01 00 00 00 eb 02 31 c0 5a 5b 5d 41 5c 41 5d c3 41 54 55 89 fd 53 48 8b 06 48 89 f3 a8 04 75 02 0f 0b 48 8b 06 a8 20 75 02 <0f> 0b 48 83 7e 38 00 75 02 0f 0b 48 8b 06 f6 c4 02 74 02 0f 0b 
[ 2167.377220] RIP  [<ffffffff8111ee57>] submit_bh+0x19/0xff
[ 2167.377432]  RSP <ffff8810389a3e10>
[ 2167.377607] ---[ end trace f207cc71baf06877 ]---

The above may indicate that the issue lies with mdraid and grub, but given that it does seem to present on my ZFS machines, I thought I'd raise it here.
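To make the call trace in the dmesg output easier to scan, here is a minimal sketch that pulls the symbol names and owning modules out of oops text. The function name `extract_frames` and the regular expression are my own assumptions about the line layout shown above, not part of any kernel tooling, and the sample is three lines copied from the trace.

```python
import re

# Assumed layout of a frame: "[<address>] ? symbol+0xOFF/0xSIZE [module]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"        # address, optional '?' marker
    r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"    # symbol+offset/size
    r"(?:[ \t]+\[(\w+)\])?"                 # optional [module] on the same line
)

def extract_frames(oops_text):
    """Return (symbol, module) pairs; module is None for core kernel symbols."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(oops_text)]

# Three lines taken from the dmesg output in this report.
sample = """\
[ 2167.370046]  [<ffffffff811217c3>] ? __sync_dirty_buffer+0x52/0x87
[ 2167.370308]  [<ffffffffa0471f7e>] ? ext2_sync_super+0xae/0xba [ext2]
[ 2167.370566]  [<ffffffff810fe69e>] ? sync_supers+0x6c/0xb9
"""

frames = extract_frames(sample)
# Modules that appear in the trace, de-duplicated.
modules = sorted({mod for _, mod in frames if mod})

for symbol, module in frames:
    print(symbol, module or "(core kernel)")
print("modules in trace:", modules)
```

For the sample, the only module that shows up is `ext2`, which matches the ext2 filesystem on md0 in the multi-drive layout.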

markdesouza commented 8 years ago

I should mention that I am installing this from a Wheezy chroot environment, which is why the kernel is: 3.2.0-4-amd64 #1 SMP Debian 3.2.81-1 x86_64 GNU/Linux

markdesouza commented 8 years ago

Confirmed: this problem only occurs when using ZFS. If I use only MD RAID, and hence the standard version of grub, the problem does not occur.

This indicates that the problem is either with the ZFS module or with the ZFS version of grub.

markdesouza commented 8 years ago

This problem seems to occur when installing Grub 2.02-beta2.9-ZOL11 on a new Jessie install located inside a Wheezy chroot.

Installing Grub 2.02-beta2.9-ZOL11 on a new Wheezy install located inside a Wheezy chroot works.

I assume this has to do with Grub getting confused between the different kernel versions.
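One quick way to see the kind of version mismatch suspected here is to compare the kernel the host is running with the kernel module directories present inside the chroot. This is only a sketch: the chroot path `/mnt/jessie` and the helper names are hypothetical, and it assumes the chroot keeps its kernels under `lib/modules`. It shows whether the versions differ; it does not confirm that the difference causes the failure.

```python
import os
import platform

def installed_kernels(chroot_root):
    """List kernel releases that have a module directory inside the chroot."""
    modules_dir = os.path.join(chroot_root, "lib", "modules")
    if not os.path.isdir(modules_dir):
        return []
    return sorted(os.listdir(modules_dir))

def is_mismatch(running, installed):
    """True when the running kernel has no module directory in the chroot."""
    return running not in installed

if __name__ == "__main__":
    chroot_root = "/mnt/jessie"    # hypothetical chroot location
    running = platform.release()   # e.g. "3.2.0-4-amd64" on the host in this report
    installed = installed_kernels(chroot_root)
    print("running kernel:   ", running)
    print("kernels in chroot:", installed or "(none found)")
    print("mismatch:         ", is_mismatch(running, installed))
```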