openzfs / spl

A shim layer which adds the core interfaces required for OpenZFS.
https://zfsonlinux.org/
GNU General Public License v2.0
282 stars 181 forks source link

SPL Panic - during zfs recv stream #365

Closed pongraczi closed 10 years ago

pongraczi commented 10 years ago

I upgraded my zfs/spl to 0.6.3 using a simple apt-get update/upgrade. It is a linuxmint 16 box. Every day this box receives a backup from a remote system via openvpn channel. Generally it is working without problem (in fact, I never experienced problems until this one). The last backup completed 31 send/recv processess, only 3 left, but it hung. The whole pool hung of course. Restart (hard reset) brings up the pool, so, it is working now. The system was responsive during the hung, but my home folder is on the pool, so, every tasks accessing to the pool hang, too. The sender is

The full dmesg message inserted to the end of this ticket.

Here is the ps ax | grep D output at the time of the problem happened: cat zfs_error-ps-D.log PID TTY STAT TIME COMMAND 1518 ? Ss 0:00 /usr/sbin/sshd -D 2165 ? D< 0:00 [txg_quiesce] 2497 ? Ss 0:00 nmbd -D 3712 pts/4 D+ 0:50 mc 13378 ? Ds 0:03 zfs receive -Fuv tank/backup/hn1/zfsstorage1/vz/private/202@zfs-auto-snap_daily-2014-06-16-0511 14872 ? D 0:54 /usr/bin/updatedb.mlocate 15328 pts/6 D+ 0:00 bash

Here is a dmesg message from SPL after restarting the system: [ 19.769902] SPLError: 784:0:(spl-err.c:67:vcmn_err()) WARNING: can't open objset for tank/backup/hn1/zfsstorage1/vz/private/202/%recv

dmesg: [h jún 16 05:54:22 2014] VERIFY3(sa.sa_magic == 0x2F505A) failed (2576474 == 3100762) [h jún 16 05:54:22 2014] SPLError: 13378:0:(zfs_vfsops.c:400:zfs_space_delta_cb()) SPL PANIC [h jún 16 05:54:22 2014] SPL: Showing stack for process 13378 [h jún 16 05:54:22 2014] CPU: 2 PID: 13378 Comm: zfs Tainted: PF W O 3.11.0-12-generic #19-Ubuntu [h jún 16 05:54:22 2014] Hardware name: CompuLab Intense-PC/Intense-PC, BIOS CR_2.2.0.400 X64 12/12/2013 [h jún 16 05:54:22 2014] ffff88018dc856d8 ffff8801a20c7960 ffffffff816e547a 0000000000000000 [h jún 16 05:54:22 2014] ffff8801a20c7970 ffffffffa01da507 ffff8801a20c7998 ffffffffa01db5a1 [h jún 16 05:54:22 2014] ffffffffa01efd21 ffffc90173a79ac0 ffffc90173a79ac0 ffff8801a20c79d8 [h jún 16 05:54:22 2014] Call Trace: [h jún 16 05:54:22 2014] [] dump_stack+0x45/0x56 [h jún 16 05:54:22 2014] [] spl_debug_dumpstack+0x27/0x40 [spl] [h jún 16 05:54:22 2014] [] spl_debug_bug+0x81/0xe0 [spl] [h jún 16 05:54:22 2014] [] zfs_space_delta_cb+0x186/0x190 [zfs] [h jún 16 05:54:22 2014] [] dmu_objset_userquota_get_ids+0x12e/0x3f0 [zfs] [h jún 16 05:54:22 2014] [] dnode_setdirty+0x3f/0x140 [zfs] [h jún 16 05:54:22 2014] [] dmu_object_set_checksum+0x36/0x50 [zfs] [h jún 16 05:54:22 2014] [] restore_object+0x116/0x2a0 [zfs] [h jún 16 05:54:22 2014] [] dmu_recv_stream+0x661/0xa80 [zfs] [h jún 16 05:54:22 2014] [] zfs_ioc_recv+0x212/0xc40 [zfs] [h jún 16 05:54:22 2014] [] ? zprop_iter_common+0xd9/0x130 [zcommon] [h jún 16 05:54:22 2014] [] ? zprop_string_to_index+0xb0/0xb0 [zcommon] [h jún 16 05:54:22 2014] [] ? zprop_name_to_prop+0x3c/0x60 [zcommon] [h jún 16 05:54:22 2014] [] ? kmem_free_debug+0x4b/0x150 [spl] [h jún 16 05:54:22 2014] [] ? rrw_exit+0x39/0x120 [zfs] [h jún 16 05:54:22 2014] [] ? kmalloc+0x55/0x180 [h jún 16 05:54:22 2014] [] ? strdup+0x48/0x90 [spl] [h jún 16 05:54:22 2014] [] zfsdev_ioctl+0x466/0x4e0 [zfs] [h jún 16 05:54:22 2014] [] do_vfs_ioctl+0x2e5/0x4d0 [h jún 16 05:54:22 2014] [] ? vfs_write+0x15c/0x1e0 [h jún 16 05:54:22 2014] [] SyS_ioctl+0x81/0xa0 [h jún 16 05:54:22 2014] [] ? SyS_write+0x49/0xa0 [h jún 16 05:54:22 2014] [] system_call_fastpath+0x1a/0x1f [h jún 16 05:58:11 2014] INFO: task txg_quiesce:2165 blocked for more than 120 seconds. [h jún 16 05:58:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 05:58:11 2014] txg_quiesce D ffff880440254580 0 2165 2 0x00000000 [h jún 16 05:58:11 2014] ffff8803f3fafd78 0000000000000046 ffff8803f3faffd8 0000000000014580 [h jún 16 05:58:11 2014] ffff8803f3faffd8 0000000000014580 ffff880407400000 ffffc90005cb24c8 [h jún 16 05:58:11 2014] ffffc90005cb2458 ffffc90005cb24d0 0000000000000000 0000000000000002 [h jún 16 05:58:11 2014] Call Trace: [h jún 16 05:58:11 2014] [] schedule+0x29/0x70 [h jún 16 05:58:11 2014] [] cv_wait_common+0xed/0x1a0 [spl] [h jún 16 05:58:11 2014] [] ? wake_up_atomic_t+0x30/0x30 [h jún 16 05:58:11 2014] [] cv_wait+0x15/0x20 [spl] [h jún 16 05:58:11 2014] [] txg_quiesce_thread+0x1fb/0x340 [zfs] [h jún 16 05:58:11 2014] [] ? txg_fini+0x2b0/0x2b0 [zfs] [h jún 16 05:58:11 2014] [] thread_generic_wrapper+0x7a/0x90 [spl] [h jún 16 05:58:11 2014] [] ? thread_exit+0xa0/0xa0 [spl] [h jún 16 05:58:11 2014] [] kthread+0xc0/0xd0 [h jún 16 05:58:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 05:58:11 2014] [] ret_from_fork+0x7c/0xb0 [h jún 16 05:58:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 05:58:11 2014] INFO: task zfs:13378 blocked for more than 120 seconds. [h jún 16 05:58:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 05:58:11 2014] zfs D ffff880440294580 0 13378 13348 0x00000000 [h jún 16 05:58:11 2014] ffff8801a20c7960 0000000000000046 ffff8801a20c7fd8 0000000000014580 [h jún 16 05:58:11 2014] ffff8801a20c7fd8 0000000000014580 ffff880374a79770 0000000000000000 [h jún 16 05:58:11 2014] ffff88018dc856d8 ffff88018dc856d0 0000000000000000 ffff88018dc853a0 [h jún 16 05:58:11 2014] Call Trace: [h jún 16 05:58:11 2014] [] schedule+0x29/0x70 [h jún 16 05:58:11 2014] [] spl_debug_bug+0xb5/0xe0 [spl] [h jún 16 05:58:11 2014] [] zfs_space_delta_cb+0x186/0x190 [zfs] [h jún 16 05:58:11 2014] [] dmu_objset_userquota_get_ids+0x12e/0x3f0 [zfs] [h jún 16 05:58:11 2014] [] dnode_setdirty+0x3f/0x140 [zfs] [h jún 16 05:58:11 2014] [] dmu_object_set_checksum+0x36/0x50 [zfs] [h jún 16 05:58:11 2014] [] restore_object+0x116/0x2a0 [zfs] [h jún 16 05:58:11 2014] [] dmu_recv_stream+0x661/0xa80 [zfs] [h jún 16 05:58:11 2014] [] zfs_ioc_recv+0x212/0xc40 [zfs] [h jún 16 05:58:11 2014] [] ? zprop_iter_common+0xd9/0x130 [zcommon] [h jún 16 05:58:11 2014] [] ? zprop_string_to_index+0xb0/0xb0 [zcommon] [h jún 16 05:58:11 2014] [] ? zprop_name_to_prop+0x3c/0x60 [zcommon] [h jún 16 05:58:11 2014] [] ? kmem_free_debug+0x4b/0x150 [spl] [h jún 16 05:58:11 2014] [] ? rrw_exit+0x39/0x120 [zfs] [h jún 16 05:58:11 2014] [] ? kmalloc+0x55/0x180 [h jún 16 05:58:11 2014] [] ? strdup+0x48/0x90 [spl] [h jún 16 05:58:11 2014] [] zfsdev_ioctl+0x466/0x4e0 [zfs] [h jún 16 05:58:11 2014] [] do_vfs_ioctl+0x2e5/0x4d0 [h jún 16 05:58:11 2014] [] ? vfs_write+0x15c/0x1e0 [h jún 16 05:58:11 2014] [] SyS_ioctl+0x81/0xa0 [h jún 16 05:58:11 2014] [] ? SyS_write+0x49/0xa0 [h jún 16 05:58:11 2014] [] system_call_fastpath+0x1a/0x1f [h jún 16 06:00:11 2014] INFO: task txg_quiesce:2165 blocked for more than 120 seconds. [h jún 16 06:00:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:00:11 2014] txg_quiesce D ffff880440254580 0 2165 2 0x00000000 [h jún 16 06:00:11 2014] ffff8803f3fafd78 0000000000000046 ffff8803f3faffd8 0000000000014580 [h jún 16 06:00:11 2014] ffff8803f3faffd8 0000000000014580 ffff880407400000 ffffc90005cb24c8 [h jún 16 06:00:11 2014] ffffc90005cb2458 ffffc90005cb24d0 0000000000000000 0000000000000002 [h jún 16 06:00:11 2014] Call Trace: [h jún 16 06:00:11 2014] [] schedule+0x29/0x70 [h jún 16 06:00:11 2014] [] cv_wait_common+0xed/0x1a0 [spl] [h jún 16 06:00:11 2014] [] ? wake_up_atomic_t+0x30/0x30 [h jún 16 06:00:11 2014] [] cv_wait+0x15/0x20 [spl] [h jún 16 06:00:11 2014] [] txg_quiesce_thread+0x1fb/0x340 [zfs] [h jún 16 06:00:11 2014] [] ? txg_fini+0x2b0/0x2b0 [zfs] [h jún 16 06:00:11 2014] [] thread_generic_wrapper+0x7a/0x90 [spl] [h jún 16 06:00:11 2014] [] ? __thread_exit+0xa0/0xa0 [spl] [h jún 16 06:00:11 2014] [] kthread+0xc0/0xd0 [h jún 16 06:00:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:00:11 2014] [] ret_from_fork+0x7c/0xb0 [h jún 16 06:00:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:00:11 2014] INFO: task zfs:13378 blocked for more than 120 seconds. [h jún 16 06:00:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:00:11 2014] zfs D ffff880440294580 0 13378 13348 0x00000000 [h jún 16 06:00:11 2014] ffff8801a20c7960 0000000000000046 ffff8801a20c7fd8 0000000000014580 [h jún 16 06:00:11 2014] ffff8801a20c7fd8 0000000000014580 ffff880374a79770 0000000000000000 [h jún 16 06:00:11 2014] ffff88018dc856d8 ffff88018dc856d0 0000000000000000 ffff88018dc853a0 [h jún 16 06:00:11 2014] Call Trace: [h jún 16 06:00:11 2014] [] schedule+0x29/0x70 [h jún 16 06:00:11 2014] [] spl_debug_bug+0xb5/0xe0 [spl] [h jún 16 06:00:11 2014] [] zfs_space_delta_cb+0x186/0x190 [zfs] [h jún 16 06:00:11 2014] [] dmu_objset_userquota_get_ids+0x12e/0x3f0 [zfs] [h jún 16 06:00:11 2014] [] dnode_setdirty+0x3f/0x140 [zfs] [h jún 16 06:00:11 2014] [] dmu_object_set_checksum+0x36/0x50 [zfs] [h jún 16 06:00:11 2014] [] restore_object+0x116/0x2a0 [zfs] [h jún 16 06:00:11 2014] [] dmu_recv_stream+0x661/0xa80 [zfs] [h jún 16 06:00:11 2014] [] zfs_ioc_recv+0x212/0xc40 [zfs] [h jún 16 06:00:11 2014] [] ? zprop_iter_common+0xd9/0x130 [zcommon] [h jún 16 06:00:11 2014] [] ? zprop_string_to_index+0xb0/0xb0 [zcommon] [h jún 16 06:00:11 2014] [] ? zprop_name_to_prop+0x3c/0x60 [zcommon] [h jún 16 06:00:11 2014] [] ? kmem_free_debug+0x4b/0x150 [spl] [h jún 16 06:00:11 2014] [] ? rrw_exit+0x39/0x120 [zfs] [h jún 16 06:00:11 2014] [] ? kmalloc+0x55/0x180 [h jún 16 06:00:11 2014] [] ? strdup+0x48/0x90 [spl] [h jún 16 06:00:11 2014] [] zfsdev_ioctl+0x466/0x4e0 [zfs] [h jún 16 06:00:11 2014] [] do_vfs_ioctl+0x2e5/0x4d0 [h jún 16 06:00:11 2014] [] ? vfs_write+0x15c/0x1e0 [h jún 16 06:00:11 2014] [] SyS_ioctl+0x81/0xa0 [h jún 16 06:00:11 2014] [] ? SyS_write+0x49/0xa0 [h jún 16 06:00:11 2014] [] system_call_fastpath+0x1a/0x1f [h jún 16 06:02:11 2014] INFO: task txg_quiesce:2165 blocked for more than 120 seconds. [h jún 16 06:02:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:02:11 2014] txg_quiesce D ffff880440254580 0 2165 2 0x00000000 [h jún 16 06:02:11 2014] ffff8803f3fafd78 0000000000000046 ffff8803f3faffd8 0000000000014580 [h jún 16 06:02:11 2014] ffff8803f3faffd8 0000000000014580 ffff880407400000 ffffc90005cb24c8 [h jún 16 06:02:11 2014] ffffc90005cb2458 ffffc90005cb24d0 0000000000000000 0000000000000002 [h jún 16 06:02:11 2014] Call Trace: [h jún 16 06:02:11 2014] [] schedule+0x29/0x70 [h jún 16 06:02:11 2014] [] cv_wait_common+0xed/0x1a0 [spl] [h jún 16 06:02:11 2014] [] ? wake_up_atomic_t+0x30/0x30 [h jún 16 06:02:11 2014] [] cv_wait+0x15/0x20 [spl] [h jún 16 06:02:11 2014] [] txg_quiesce_thread+0x1fb/0x340 [zfs] [h jún 16 06:02:11 2014] [] ? txg_fini+0x2b0/0x2b0 [zfs] [h jún 16 06:02:11 2014] [] thread_generic_wrapper+0x7a/0x90 [spl] [h jún 16 06:02:11 2014] [] ? thread_exit+0xa0/0xa0 [spl] [h jún 16 06:02:11 2014] [] kthread+0xc0/0xd0 [h jún 16 06:02:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:02:11 2014] [] ret_from_fork+0x7c/0xb0 [h jún 16 06:02:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:02:11 2014] INFO: task zfs:13378 blocked for more than 120 seconds. [h jún 16 06:02:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:02:11 2014] zfs D ffff880440294580 0 13378 13348 0x00000000 [h jún 16 06:02:11 2014] ffff8801a20c7960 0000000000000046 ffff8801a20c7fd8 0000000000014580 [h jún 16 06:02:11 2014] ffff8801a20c7fd8 0000000000014580 ffff880374a79770 0000000000000000 [h jún 16 06:02:11 2014] ffff88018dc856d8 ffff88018dc856d0 0000000000000000 ffff88018dc853a0 [h jún 16 06:02:11 2014] Call Trace: [h jún 16 06:02:11 2014] [] schedule+0x29/0x70 [h jún 16 06:02:11 2014] [] spl_debug_bug+0xb5/0xe0 [spl] [h jún 16 06:02:11 2014] [] zfs_space_delta_cb+0x186/0x190 [zfs] [h jún 16 06:02:11 2014] [] dmu_objset_userquota_get_ids+0x12e/0x3f0 [zfs] [h jún 16 06:02:11 2014] [] dnode_setdirty+0x3f/0x140 [zfs] [h jún 16 06:02:11 2014] [] dmu_object_set_checksum+0x36/0x50 [zfs] [h jún 16 06:02:11 2014] [] restore_object+0x116/0x2a0 [zfs] [h jún 16 06:02:11 2014] [] dmu_recv_stream+0x661/0xa80 [zfs] [h jún 16 06:02:11 2014] [] zfs_ioc_recv+0x212/0xc40 [zfs] [h jún 16 06:02:11 2014] [] ? zprop_iter_common+0xd9/0x130 [zcommon] [h jún 16 06:02:11 2014] [] ? zprop_string_to_index+0xb0/0xb0 [zcommon] [h jún 16 06:02:11 2014] [] ? zprop_name_to_prop+0x3c/0x60 [zcommon] [h jún 16 06:02:11 2014] [] ? kmem_free_debug+0x4b/0x150 [spl] [h jún 16 06:02:11 2014] [] ? rrw_exit+0x39/0x120 [zfs] [h jún 16 06:02:11 2014] [] ? kmalloc+0x55/0x180 [h jún 16 06:02:11 2014] [] ? strdup+0x48/0x90 [spl] [h jún 16 06:02:11 2014] [] zfsdev_ioctl+0x466/0x4e0 [zfs] [h jún 16 06:02:11 2014] [] do_vfs_ioctl+0x2e5/0x4d0 [h jún 16 06:02:11 2014] [] ? vfs_write+0x15c/0x1e0 [h jún 16 06:02:11 2014] [] SyS_ioctl+0x81/0xa0 [h jún 16 06:02:11 2014] [] ? SyS_write+0x49/0xa0 [h jún 16 06:02:11 2014] [] system_call_fastpath+0x1a/0x1f [h jún 16 06:04:11 2014] INFO: task txg_quiesce:2165 blocked for more than 120 seconds. [h jún 16 06:04:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:04:11 2014] txg_quiesce D ffff880440254580 0 2165 2 0x00000000 [h jún 16 06:04:11 2014] ffff8803f3fafd78 0000000000000046 ffff8803f3faffd8 0000000000014580 [h jún 16 06:04:11 2014] ffff8803f3faffd8 0000000000014580 ffff880407400000 ffffc90005cb24c8 [h jún 16 06:04:11 2014] ffffc90005cb2458 ffffc90005cb24d0 0000000000000000 0000000000000002 [h jún 16 06:04:11 2014] Call Trace: [h jún 16 06:04:11 2014] [] schedule+0x29/0x70 [h jún 16 06:04:11 2014] [] cv_wait_common+0xed/0x1a0 [spl] [h jún 16 06:04:11 2014] [] ? wake_up_atomic_t+0x30/0x30 [h jún 16 06:04:11 2014] [] cv_wait+0x15/0x20 [spl] [h jún 16 06:04:11 2014] [] txg_quiesce_thread+0x1fb/0x340 [zfs] [h jún 16 06:04:11 2014] [] ? txg_fini+0x2b0/0x2b0 [zfs] [h jún 16 06:04:11 2014] [] thread_generic_wrapper+0x7a/0x90 [spl] [h jún 16 06:04:11 2014] [] ? thread_exit+0xa0/0xa0 [spl] [h jún 16 06:04:11 2014] [] kthread+0xc0/0xd0 [h jún 16 06:04:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:04:11 2014] [] ret_from_fork+0x7c/0xb0 [h jún 16 06:04:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:04:11 2014] INFO: task zfs:13378 blocked for more than 120 seconds. [h jún 16 06:04:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:04:11 2014] zfs D ffff880440294580 0 13378 13348 0x00000000 [h jún 16 06:04:11 2014] ffff8801a20c7960 0000000000000046 ffff8801a20c7fd8 0000000000014580 [h jún 16 06:04:11 2014] ffff8801a20c7fd8 0000000000014580 ffff880374a79770 0000000000000000 [h jún 16 06:04:11 2014] ffff88018dc856d8 ffff88018dc856d0 0000000000000000 ffff88018dc853a0 [h jún 16 06:04:11 2014] Call Trace: [h jún 16 06:04:11 2014] [] schedule+0x29/0x70 [h jún 16 06:04:11 2014] [] spl_debug_bug+0xb5/0xe0 [spl] [h jún 16 06:04:11 2014] [] zfs_space_delta_cb+0x186/0x190 [zfs] [h jún 16 06:04:11 2014] [] dmu_objset_userquota_get_ids+0x12e/0x3f0 [zfs] [h jún 16 06:04:11 2014] [] dnode_setdirty+0x3f/0x140 [zfs] [h jún 16 06:04:11 2014] [] dmu_object_set_checksum+0x36/0x50 [zfs] [h jún 16 06:04:11 2014] [] restore_object+0x116/0x2a0 [zfs] [h jún 16 06:04:11 2014] [] dmu_recv_stream+0x661/0xa80 [zfs] [h jún 16 06:04:11 2014] [] zfs_ioc_recv+0x212/0xc40 [zfs] [h jún 16 06:04:11 2014] [] ? zprop_iter_common+0xd9/0x130 [zcommon] [h jún 16 06:04:11 2014] [] ? zprop_string_to_index+0xb0/0xb0 [zcommon] [h jún 16 06:04:11 2014] [] ? zprop_name_to_prop+0x3c/0x60 [zcommon] [h jún 16 06:04:11 2014] [] ? kmem_free_debug+0x4b/0x150 [spl] [h jún 16 06:04:11 2014] [] ? rrw_exit+0x39/0x120 [zfs] [h jún 16 06:04:11 2014] [] ? kmalloc+0x55/0x180 [h jún 16 06:04:11 2014] [] ? strdup+0x48/0x90 [spl] [h jún 16 06:04:11 2014] [] zfsdev_ioctl+0x466/0x4e0 [zfs] [h jún 16 06:04:11 2014] [] do_vfs_ioctl+0x2e5/0x4d0 [h jún 16 06:04:11 2014] [] ? vfs_write+0x15c/0x1e0 [h jún 16 06:04:11 2014] [] SyS_ioctl+0x81/0xa0 [h jún 16 06:04:11 2014] [] ? SyS_write+0x49/0xa0 [h jún 16 06:04:11 2014] [] system_call_fastpath+0x1a/0x1f [h jún 16 06:06:11 2014] INFO: task txg_quiesce:2165 blocked for more than 120 seconds. [h jún 16 06:06:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:06:11 2014] txg_quiesce D ffff880440254580 0 2165 2 0x00000000 [h jún 16 06:06:11 2014] ffff8803f3fafd78 0000000000000046 ffff8803f3faffd8 0000000000014580 [h jún 16 06:06:11 2014] ffff8803f3faffd8 0000000000014580 ffff880407400000 ffffc90005cb24c8 [h jún 16 06:06:11 2014] ffffc90005cb2458 ffffc90005cb24d0 0000000000000000 0000000000000002 [h jún 16 06:06:11 2014] Call Trace: [h jún 16 06:06:11 2014] [] schedule+0x29/0x70 [h jún 16 06:06:11 2014] [] cv_wait_common+0xed/0x1a0 [spl] [h jún 16 06:06:11 2014] [] ? wake_up_atomic_t+0x30/0x30 [h jún 16 06:06:11 2014] [] cv_wait+0x15/0x20 [spl] [h jún 16 06:06:11 2014] [] txg_quiesce_thread+0x1fb/0x340 [zfs] [h jún 16 06:06:11 2014] [] ? txg_fini+0x2b0/0x2b0 [zfs] [h jún 16 06:06:11 2014] [] thread_generic_wrapper+0x7a/0x90 [spl] [h jún 16 06:06:11 2014] [] ? __thread_exit+0xa0/0xa0 [spl] [h jún 16 06:06:11 2014] [] kthread+0xc0/0xd0 [h jún 16 06:06:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:06:11 2014] [] ret_from_fork+0x7c/0xb0 [h jún 16 06:06:11 2014] [] ? kthread_create_on_node+0x120/0x120 [h jún 16 06:06:11 2014] INFO: task zfs:13378 blocked for more than 120 seconds. [h jún 16 06:06:11 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [h jún 16 06:06:11 2014] zfs D ffff880440294580 0 13378 13348 0x00000000 [h jún 16 06:06:11 2014] ffff8801a20c7960 0000000000000046 ffff8801a20c7fd8 0000000000014580 [h jún 16 06:06:11 2014] ffff8801a20c7fd8 0000000000014580 ffff880374a79770 0000000000000000 [h jún 16 06:06:11 2014] ffff88018dc856d8 ffff88018dc856d0 0000000000000000 ffff88018dc853a0 [h jún 16 06:06:11 2014] Call Trace: [h jún 16 06:06:11 2014] [] schedule+0x29/0x70 [h jún 16 06:06:11 2014] [] spl_debug_bug+0xb5/0xe0 [spl] [h jún 16 06:06:11 2014] [] zfs_space_delta_cb+0x186/0x190 [zfs] [h jún 16 06:06:11 2014] [] dmu_objset_userquota_get_ids+0x12e/0x3f0 [zfs] [h jún 16 06:06:11 2014] [] dnode_setdirty+0x3f/0x140 [zfs] [h jún 16 06:06:11 2014] [] dmu_object_set_checksum+0x36/0x50 [zfs] [h jún 16 06:06:11 2014] [] restore_object+0x116/0x2a0 [zfs] [h jún 16 06:06:11 2014] [] dmu_recv_stream+0x661/0xa80 [zfs] [h jún 16 06:06:11 2014] [] zfs_ioc_recv+0x212/0xc40 [zfs] [h jún 16 06:06:11 2014] [] ? zprop_iter_common+0xd9/0x130 [zcommon] [h jún 16 06:06:11 2014] [] ? zprop_string_to_index+0xb0/0xb0 [zcommon] [h jún 16 06:06:11 2014] [] ? zprop_name_to_prop+0x3c/0x60 [zcommon] [h jún 16 06:06:11 2014] [] ? kmem_free_debug+0x4b/0x150 [spl] [h jún 16 06:06:11 2014] [] ? rrw_exit+0x39/0x120 [zfs] [h jún 16 06:06:11 2014] [] ? kmalloc+0x55/0x180 [h jún 16 06:06:11 2014] [] ? strdup+0x48/0x90 [spl] [h jún 16 06:06:11 2014] [] zfsdev_ioctl+0x466/0x4e0 [zfs] [h jún 16 06:06:11 2014] [] do_vfs_ioctl+0x2e5/0x4d0 [h jún 16 06:06:11 2014] [] ? vfs_write+0x15c/0x1e0 [h jún 16 06:06:11 2014] [] SyS_ioctl+0x81/0xa0 [h jún 16 06:06:11 2014] [] ? SyS_write+0x49/0xa0 [h jún 16 06:06:11 2014] [] system_call_fastpath+0x1a/0x1f [h jún 16 07:36:40 2014] usb 2-1.1.5.5: new low-speed USB device number 9 using ehci-pci [h jún 16 07:36:40 2014] usb 2-1.1.5.5: New USB device found, idVendor=093a, idProduct=2510 [h jún 16 07:36:40 2014] usb 2-1.1.5.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [h jún 16 07:36:40 2014] usb 2-1.1.5.5: Product: USB OPTICAL MOUSE [h jún 16 07:36:40 2014] usb 2-1.1.5.5: Manufacturer: PIXART [h jún 16 07:36:40 2014] input: PIXART USB OPTICAL MOUSE as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1.5/2-1.1.5.5/2-1.1.5.5:1.0/input/input12 [h jún 16 07:36:40 2014] hid-generic 0003:093A:2510.0004: input,hidraw0: USB HID v1.11 Mouse [PIXART USB OPTICAL MOUSE] on usb-0000:00:1d.0-1.1.5.5/input0 [h jún 16 07:36:40 2014] usb 2-1.1.5.7: new low-speed USB device number 10 using ehci-pci [h jún 16 07:36:41 2014] usb 2-1.1.5.7: New USB device found, idVendor=04d9, idProduct=1702 [h jún 16 07:36:41 2014] usb 2-1.1.5.7: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [h jún 16 07:36:41 2014] usb 2-1.1.5.7: Product: USB Keyboard [h jún 16 07:36:41 2014] usb 2-1.1.5.7: Manufacturer:
[h jún 16 07:36:41 2014] input: USB Keyboard as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1.5/2-1.1.5.7/2-1.1.5.7:1.0/input/input13 [h jún 16 07:36:41 2014] hid-generic 0003:04D9:1702.0005: input,hidraw1: USB HID v1.10 Keyboard [ USB Keyboard] on usb-0000:00:1d.0-1.1.5.7/input0 [h jún 16 07:36:41 2014] input: USB Keyboard as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1.5/2-1.1.5.7/2-1.1.5.7:1.1/input/input14 [h jún 16 07:36:41 2014] hid-generic 0003:04D9:1702.0006: input,hidraw2: USB HID v1.10 Device [ USB Keyboard] on usb-0000:00:1d.0-1.1.5.7/input1 [h jún 16 07:50:10 2014] VERIFY(BSWAP_32(sa_hdr_phys->sa_magic) == SA_MAGIC) failed [h jún 16 07:50:10 2014] SPLError: 14872:0:(sa.c:1304:sa_build_index()) SPL PANIC [h jún 16 07:50:10 2014] SPL: Showing stack for process 14872 [h jún 16 07:50:10 2014] CPU: 0 PID: 14872 Comm: updatedb.mlocat Tainted: PF W O 3.11.0-12-generic #19-Ubuntu [h jún 16 07:50:10 2014] Hardware name: CompuLab Intense-PC/Intense-PC, BIOS CR_2.2.0.400 X64 12/12/2013 [h jún 16 07:50:10 2014] 0000000000000001 ffff8801ef467790 ffffffff816e547a 0000000000000000 [h jún 16 07:50:10 2014] ffff8801ef4677a0 ffffffffa01da507 ffff8801ef4677c8 ffffffffa01db5a1 [h jún 16 07:50:10 2014] ffffffffa01efd21 ffff88039f76d180 ffff88007a8f7ec0 ffff8801ef467800 [h jún 16 07:50:10 2014] Call Trace: [h jún 16 07:50:10 2014] [] dump_stack+0x45/0x56 [h jún 16 07:50:10 2014] [] spl_debug_dumpstack+0x27/0x40 [spl] [h jún 16 07:50:10 2014] [] spl_debug_bug+0x81/0xe0 [spl] [h jún 16 07:50:10 2014] [] sa_build_index+0xf6/0x100 [zfs] [h jún 16 07:50:10 2014] [] sa_handle_get_from_db+0xc3/0x110 [zfs] [h jún 16 07:50:10 2014] [] zfs_znode_sa_init.isra.6+0x8f/0xc0 [zfs] [h jún 16 07:50:10 2014] [] zfs_znode_alloc+0xe8/0x570 [zfs] [h jún 16 07:50:10 2014] [] ? mutex_lock+0x12/0x2f [h jún 16 07:50:10 2014] [] ? zio_done+0x5e9/0xcc0 [zfs] [h jún 16 07:50:10 2014] [] ? zio_wait+0x12d/0x1c0 [zfs] [h jún 16 07:50:10 2014] [] ? dbuf_read+0x317/0x890 [zfs] [h jún 16 07:50:10 2014] [] ? mutex_lock+0x12/0x2f [h jún 16 07:50:10 2014] [] ? dnode_rele+0x44/0x80 [zfs] [h jún 16 07:50:10 2014] [] ? default_spin_lock_flags+0x9/0x10 [h jún 16 07:50:10 2014] [] ? dmu_object_info_from_dnode+0xdb/0xf0 [zfs] [h jún 16 07:50:10 2014] [] zfs_zget+0x181/0x1f0 [zfs] [h jún 16 07:50:10 2014] [] zfs_dirent_lock+0x434/0x570 [zfs] [h jún 16 07:50:10 2014] [] zfs_dirlook+0x81/0x370 [zfs] [h jún 16 07:50:10 2014] [] zfs_lookup+0x2d5/0x320 [zfs] [h jún 16 07:50:10 2014] [] zpl_lookup+0x6b/0xd0 [zfs] [h jún 16 07:50:10 2014] [] lookup_real+0x1d/0x50 [h jún 16 07:50:10 2014] [] __lookup_hash+0x33/0x40 [h jún 16 07:50:10 2014] [] lookup_slow+0x42/0xa7 [h jún 16 07:50:10 2014] [] path_lookupat+0x769/0x7e0 [h jún 16 07:50:10 2014] [] ? kmem_cache_alloc+0x31/0x120 [h jún 16 07:50:10 2014] [] ? getname_flags+0x51/0x190 [h jún 16 07:50:10 2014] [] filename_lookup+0x2b/0xd0 [h jún 16 07:50:10 2014] [] user_path_at_empty+0x54/0x90 [h jún 16 07:50:10 2014] [] ? lg_local_unlock+0x1a/0x20 [h jún 16 07:50:10 2014] [] ? cp_new_stat+0x10b/0x120 [h jún 16 07:50:10 2014] [] user_path_at+0x11/0x20 [h jún 16 07:50:10 2014] [] vfs_fstatat+0x50/0xa0 [h jún 16 07:50:10 2014] [] SYSC_newlstat+0x22/0x40 [h jún 16 07:50:10 2014] [] ? SyS_poll+0x65/0x100 [h jún 16 07:50:10 2014] [] SyS_newlstat+0xe/0x10 [h jún 16 07:50:10 2014] [] system_call_fastpath+0x1a/0x1f

pongraczi commented 10 years ago

Pool setup: zpool status -v pool: tank state: ONLINE scan: scrub repaired 0 in 4h43m with 0 errors on Fri May 23 13:26:44 2014 config:

NAME                                               STATE     READ WRITE CKSUM
tank                                               ONLINE       0     0     0
  mirror-0                                         ONLINE       0     0     0
    usb-ssk_U3_Hard_Drive_0000000000000094-0:0     ONLINE       0     0     0
    ata-WDC_WD5000BEVT-00A0RT0_WD-WXN1AB0X5490     ONLINE       0     0     0
  mirror-2                                         ONLINE       0     0     0
    ata-Hitachi_HTS545050A7E380_TEJ52139CA9VNS     ONLINE       0     0     0
    ata-Hitachi_HTS545050A7E380_TEJ52139CBYP0S     ONLINE       0     0     0
logs
  ata-KINGSTON_SMS200S330G_50026B723A061F4F-part3  ONLINE       0     0     0
cache
  ata-KINGSTON_SMS200S330G_50026B723A061F4F-part2  ONLINE       0     0     0

errors: No known data errors

Kernel: 3.11.0-12-generic

zpool history: please note, only the 202 is missing (failed during this VM backup) between 155 and 203.

2014-06-16.05:53:43 zfs receive -Fuv tank/backup/hn1/zfsstorage1/vz/private/154@zfs-auto-snap_daily-2014-06-16-0511 2014-06-16.05:53:50 zfs receive -Fuv tank/backup/hn1/zfsstorage1/vz/private/155@zfs-auto-snap_daily-2014-06-16-0511 2014-06-16.09:55:04 zfs receive -Fuv tank/backup/hn1/zfsstorage1/vz/private/203@zfs-auto-snap_daily-2014-06-16-0511 2014-06-16.09:55:15 zfs receive -Fuv tank/backup/hn1/zfsstorage1/vz/private/205@zfs-auto-snap_daily-2014-06-16-0511

dweeezil commented 10 years ago

@pongraczi Looks like a single-bit memory error when the SA was written (0x2F505A != 0x27505A). Has the pool always been written to on a system with ECC memory?

EDIT: This is a zfs issue and not an spl issue.

pongraczi commented 10 years ago

@dweeezil Thank you for your feedback! No, the target is a non-ecc system.

Sorry, first I planned to send this as a zfs issue, but the SPL PANIC mixed me. In general, I don't think an ordinary user can decide, where to send an issue :)

behlendorf commented 10 years ago

This is definitely a single bit error which could have been caused by a memory error. But since there's been another recent report of a very similar issue in zfsonlinux/zfs#2354 let's treat this as a bug until we're certain.

Let's move any further discussion to the ZFS issue and I'll close out this as a duplicate. I also agree that it's hard for a user to know exactly where to file an issue. We should do something to make that more obvious but as a general rule it's better to open issues in the ZFS tracker.