openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Illumos 3875 panic after failed rollback #1609

Closed edillmann closed 10 years ago

edillmann commented 11 years ago

Hi,

I get several kernel oopses while running a zpool scrub. The system remains responsive.

Regards, Eric

[171052.370029] init: zabbix-agent main process ended, respawning
[171065.689697] BUG: unable to handle kernel NULL pointer dereference at           (null)
[171065.689985] IP: [<ffffffffa01cc8b6>] dmu_objset_space+0x6/0x20 [zfs]
[171065.690175] PGD 323c59067 PUD 15c264067 PMD 0 
[171065.690414] Oops: 0000 [#802] SMP 
[171065.690601] Modules linked in: ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) veth(F) lru_cache(F) libcrc32c(F) ipmi_devintf(F) xt_state(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ipt_MASQUERADE(F) iptable_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) ip_tables(F) x_tables(F) parport_pc(F) ppdev(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) lockd(F) sunrpc(F) fscache(F) dm_crypt(F) bridge(F) stp(F) llc(F) gpio_ich(F) adm1021(F) i2c_i801(F) dm_multipath(F) scsi_dh(F) microcode(F) coretemp(F) joydev(F) lpc_ich(F) ioatdma(F) i7core_edac(F) edac_core(F) dca(F) ipmi_si(F) ipmi_msghandler(F) lp(F) parport(F) kvm_intel(F) kvm(F) ext2(F) zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate(F) raid10(F) raid456(F) async_memcpy(F) async_raid6_recov(F) async_pq(F) async_xor(F) xor(F) async_tx(F) raid6_pq(F) raid0(F) multipath(F) linear(F) hid_generic(F) usbhid(F) hid(F) raid1(F) ahci(F) libahci(F) e1000e(F) ptp(F) pps_core(F)
[171065.695639] CPU 1 
[171065.695700] Pid: 6092, comm: zabbix_agentd Tainted: PF     D    O 3.9.2-lxc2 #2 Intel Corporation S5500BC/S5500BC
[171065.695974] RIP: 0010:[<ffffffffa01cc8b6>]  [<ffffffffa01cc8b6>] dmu_objset_space+0x6/0x20 [zfs]
[171065.696181] RSP: 0018:ffff880365411e10  EFLAGS: 00010246
[171065.696276] RAX: 0000000000000000 RBX: ffff880365411ef8 RCX: ffff880365411e30
[171065.696453] RDX: ffff880365411e28 RSI: ffff880365411e20 RDI: 0000000000000000
[171065.696632] RBP: ffff880365411e58 R08: ffff880365411e38 R09: ffffffff811a18a2
[171065.696758] R10: 0000000000000000 R11: ffff9c96939d8a8f R12: ffff880856560000
[171065.696884] R13: ffff880856560378 R14: ffff880365411ef8 R15: 00007fff55d3a8f0
[171065.697011] FS:  00007f559c99d740(0000) GS:ffff88086fc00000(0000) knlGS:0000000000000000
[171065.697140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171065.697236] CR2: 0000000000000000 CR3: 0000000323c58000 CR4: 00000000000007e0
[171065.697362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[171065.697488] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[171065.697615] Process zabbix_agentd (pid: 6092, threadinfo ffff880365410000, task ffff8803c340dcc0)
[171065.697808] Stack:
[171065.697893]  ffff880365411e58 ffffffffa02441ad 0000000000000000 00007f559b60a5a4
[171065.698217]  ffff880365411f58 ffff8801450cc600 ffff880365411ef8 ffff880855aa4c00
[171065.698599]  00007fff55d39850 ffff880365411e68 ffffffffa0261cee ffff880365411e88
[171065.698922] Call Trace:
[171065.699051]  [<ffffffffa02441ad>] ? zfs_statvfs+0x9d/0x170 [zfs]
[171065.699184]  [<ffffffffa0261cee>] zpl_statfs+0xe/0x20 [zfs]
[171065.699284]  [<ffffffff811c73c1>] statfs_by_dentry+0xa1/0x140
[171065.699382]  [<ffffffff811c747b>] vfs_statfs+0x1b/0xb0
[171065.699477]  [<ffffffff811c7556>] user_statfs+0x46/0x90
[171065.699573]  [<ffffffff811c762a>] sys_statfs+0x1a/0x40
[171065.699709]  [<ffffffff816c8f5d>] system_call_fastpath+0x1a/0x1f
[171065.699805] Code: 00 00 00 00 00 66 66 66 66 90 55 48 8b 3f 48 89 e5 e8 6f 08 01 00 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 <48> 8b 3f 48 89 e5 e8 6f 12 01 00 5d c3 66 66 66 66 2e 0f 1f 84 
[171065.703240] RIP  [<ffffffffa01cc8b6>] dmu_objset_space+0x6/0x20 [zfs]
[171065.703410]  RSP <ffff880365411e10>
[171065.703498] CR2: 0000000000000000
edillmann commented 11 years ago

The scrub has ended, but the oopses are still there :-(

dweeezil commented 11 years ago

@edillmann It would be interesting to know what argument zabbix_agentd is passing to statfs and how it relates to your ZFS configuration. A cursory glance at its source code makes me think the argument corresponds fairly directly to your zabbix configuration. You ought to be able to reproduce this problem simply by running df on the same argument.
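The call that crashes can be issued directly, without zabbix or df, because both end up in the same statfs(2) system call seen in the call trace (sys_statfs → user_statfs → vfs_statfs → zpl_statfs). A minimal sketch, assuming Linux and glibc's `<sys/vfs.h>`; `probe_statfs` is a hypothetical helper name, and the path to pass is whatever mount point zabbix is configured to monitor.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/vfs.h>   /* Linux statfs(2) */

/* Hypothetical helper: issue the same statfs(2) call that df and
 * zabbix_agentd make. On the affected dataset this is the call that
 * reaches zpl_statfs()/zfs_statvfs() in the kernel.
 * Returns 0 on success (filling *out), -1 on failure. */
int probe_statfs(const char *path, struct statfs *out)
{
    assert(path != NULL && out != NULL);

    if (statfs(path, out) != 0) {
        perror("statfs");
        return -1;
    }

    /* Print the raw fields rather than derived sizes. */
    printf("%s: f_type=0x%lx f_bsize=%ld blocks=%llu avail=%llu\n",
           path,
           (unsigned long)out->f_type,
           (long)out->f_bsize,
           (unsigned long long)out->f_blocks,
           (unsigned long long)out->f_bavail);
    return 0;
}
```

Calling `probe_statfs()` on each path from the zabbix configuration in turn narrows down which mount triggers the oops, much like running strace on df.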

edillmann commented 11 years ago

Running strace on df -h identified a dataset that is the target of regular zfs receive operations. This dataset was mounted (which was wrong). I unmounted the dataset and the problem disappeared.

behlendorf commented 11 years ago

@edillmann Still you shouldn't have been able to cause a BUG. Can you clearly describe the incorrect configuration which was able to cause this problem?

edillmann commented 11 years ago

@behlendorf the BUG appears in the following situation

behlendorf commented 11 years ago

@edillmann In zfs_statvfs() the variable zsb->z_os is NULL because we're doing an online receive and encountered an error during rollback. That's causing your crash. For the moment don't do that.

It appears the Illumos folks just fixed a variant of this exact bug under issue https://www.illumos.org/issues/3875. We'll want to port and verify this fix illumos/illumos-gate@91948b51.
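The oops is consistent with this diagnosis: dmu_objset_space() faults on its first real instruction, `48 8b 3f` (`mov (%rdi),%rdi`, the bytes at the `<48>` marker), with RDI = 0 and CR2 = 0, i.e. it loaded through a NULL objset pointer handed down from zfs_statvfs(). The sketch below is a simplified, hypothetical userspace model, not the ZFS source and not the actual illumos 3875 patch: `struct objset`, `struct fs_state`, `objset_space()` and `fs_statvfs()` are invented stand-ins. It only illustrates the general defensive pattern of having the statfs entry point reject a filesystem whose objset could not be reopened, returning EIO instead of dereferencing NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an objset: just the space counters. */
struct objset {
    uint64_t used_bytes;
    uint64_t avail_bytes;
};

/* Hypothetical per-mount state, loosely modelled on zsb. */
struct fs_state {
    struct objset *os;  /* NULL if it could not be reopened after a failed rollback */
    bool unmounted;     /* set when the objset is known to be unusable */
};

/* Like dmu_objset_space(): dereferences os unconditionally, so a NULL
 * os here reproduces the NULL pointer dereference from the oops. */
static void objset_space(const struct objset *os, uint64_t *used, uint64_t *avail)
{
    *used = os->used_bytes;
    *avail = os->avail_bytes;
}

/* statfs entry point with a guard: refuse to touch a filesystem whose
 * objset is gone and report EIO instead of crashing. */
int fs_statvfs(const struct fs_state *fs, uint64_t *used, uint64_t *avail)
{
    assert(fs != NULL && used != NULL && avail != NULL);

    if (fs->unmounted || fs->os == NULL)
        return EIO;

    objset_space(fs->os, used, avail);
    return 0;
}
```

The sketch is single-threaded. In the kernel, a check like this would only be safe if performed under whatever lock serializes filesystem teardown and resume; otherwise the pointer could still change between the check and the dereference.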

ryao commented 11 years ago

zfsonlinux/zfs#1775 includes Illumos 3875.

behlendorf commented 10 years ago

The illumos fix in https://www.illumos.org/issues/3875 has been merged as commit 831baf06efb3023ddee7ed41800d3b44521bf2ee. That is expected to resolve this issue.

edillmann commented 10 years ago

Thanks a lot, I confirm that the issue is resolved :-)