This was very likely fixed by #2884. If possible, could you try (1) setting xattr=sa on the dataset. You can use the stable zfs repository for this.
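For example, a minimal sketch (the dataset name here is taken from the reporter's later comments, not from this suggestion):
/* set SA-based xattrs on the affected dataset */
# zfs set xattr=sa data/101_srv_backup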
Okay, I will try (1) and will report back with the result.
zfs destroy data/101_srv_backup is very slow. I had to clean up my 8 drives and create the zpool from scratch.
Is this normal? I waited 1 hour, but less than 100GB of space was freed in the parent pool.
And one more strange fact: is it normal that the scrub speed in my config is approx. 10-12MB/sec on raidz2?
I have tried (1). Using the latest packages from the zfs-testing repo (see versions in the issue description) with xattr=sa, I get a kernel panic again.
The kernel panic message is too long to paste here; please see it in this gist: https://gist.github.com/umask/62b247f37107053ff791
This kernel panic occurred during the first rsync, after 6-7TB had been transferred.
I have found these messages in my logs:
[25108.622770] sd 0:2:2:0: [sdc] megasas: RESET -7968430 cmd=28 retries=0
[25108.623345] megasas: [ 0]waiting for 1 commands to complete
[25109.623501] megaraid_sas: no pending cmds after reset
[25109.624075] megasas: reset successful
Maybe it's connected with the issue described in this ticket.
For a clean experiment I erased the disks and the controller settings and created the zpool from scratch. Now the initial rsync is running again. I hope that no kernel panic will occur.
Panic again during the initial rsync: '[175802.578978] BUG: scheduling while atomic: rsync/2932/0x14010000'.
While copying the first 6-7TB of data everything was OK. Then these warnings appeared:
[165793.482413] INFO: task txg_sync:1247 blocked for more than 120 seconds.
[165793.482444] Tainted: P --------------- 2.6.32-042stab094.7 #1
[165793.482469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[165793.482494] txg_sync D ffff880811150580 0 1247 2 0 0x00000000
[165793.482522] ffff88080a629b90 0000000000000046 ffff88080a629af0 ffffffff8105fff2
[165793.482552] ffff88080a629b40 ffffffff8100beae ffff88080a629b80 000000000000007d
[165793.482581] 0000000000000000 0000000000000001 ffff880811150b48 000000000001ecc0
[165793.482610] Call Trace:
[165793.482623] [<ffffffff8105fff2>] ? default_wake_function+0x12/0x20
[165793.482645] [<ffffffff8100beae>] ? call_function_interrupt+0xe/0x20
[165793.482668] [<ffffffff8152fb53>] io_schedule+0x73/0xc0
[165793.482698] [<ffffffffa027d74c>] cv_wait_common+0x8c/0x100 [spl]
[165793.482720] [<ffffffff810a1b60>] ? autoremove_wake_function+0x0/0x40
[165793.482744] [<ffffffffa027d7d8>] __cv_wait_io+0x18/0x20 [spl]
[165793.482788] [<ffffffffa03d31db>] zio_wait+0xfb/0x1b0 [zfs]
[165793.482824] [<ffffffffa035fd03>] dsl_pool_sync+0xb3/0x430 [zfs]
[165793.482862] [<ffffffffa0374fab>] spa_sync+0x44b/0xb70 [zfs]
[165793.482881] [<ffffffff81054939>] ? __wake_up_common+0x59/0x90
[165793.482901] [<ffffffff810592c3>] ? __wake_up+0x53/0x70
[165793.482920] [<ffffffff81015029>] ? read_tsc+0x9/0x20
[165793.482956] [<ffffffffa0389365>] txg_sync_thread+0x355/0x5b0 [zfs]
[165793.482977] [<ffffffff8106dc82>] ? enqueue_entity+0x52/0x280
[165793.483013] [<ffffffffa0389010>] ? txg_sync_thread+0x0/0x5b0 [zfs]
[165793.483050] [<ffffffffa0389010>] ? txg_sync_thread+0x0/0x5b0 [zfs]
[165793.483074] [<ffffffffa0279228>] thread_generic_wrapper+0x68/0x80 [spl]
[165793.483098] [<ffffffffa02791c0>] ? thread_generic_wrapper+0x0/0x80 [spl]
[165793.483120] [<ffffffff810a1546>] kthread+0x96/0xa0
[165793.483137] [<ffffffff8100c34a>] child_rip+0xa/0x20
[165793.483154] [<ffffffff810a14b0>] ? kthread+0x0/0xa0
[165793.483170] [<ffffffff8100c340>] ? child_rip+0x0/0x20
[166033.412997] INFO: task rsync:2934 blocked for more than 120 seconds.
[166033.413047] Tainted: P --------------- 2.6.32-042stab094.7 #1
[166033.413079] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[166033.413111] rsync D ffff880811126f30 0 2934 2932 101 0x00000080
[166033.413138] ffff880800fc1a98 0000000000000086 ffff8803a5e3db70 0000000000000282
[166033.413168] ffff880800fc1a78 ffff8803a5e3db70 ffff8803a5e3db70 ffff880816926800
[166033.413211] 0000000000000000 ffff88081ac00000 ffff8808111274f8 000000000001ecc0
[166033.413240] Call Trace:
[166033.413255] [<ffffffff810a1dae>] ? prepare_to_wait_exclusive+0x4e/0x80
[166033.413303] [<ffffffffa027d7ad>] cv_wait_common+0xed/0x100 [spl]
[166033.413324] [<ffffffff810a1b60>] ? autoremove_wake_function+0x0/0x40
[166033.413354] [<ffffffffa027d815>] __cv_wait+0x15/0x20 [spl]
[166033.413401] [<ffffffffa0349e8d>] dmu_tx_wait+0xad/0x340 [zfs]
[166033.413434] [<ffffffffa034a301>] dmu_tx_assign+0x91/0x490 [zfs]
[166033.414064] [<ffffffffa03c3a9b>] zfs_write+0x40b/0xc50 [zfs]
[166033.414687] [<ffffffff811ac40a>] ? do_sync_read+0xfa/0x140
[166033.415311] [<ffffffffa03d8b85>] zpl_write+0xa5/0x140 [zfs]
[166033.415914] [<ffffffff811ac5a8>] vfs_write+0xb8/0x1a0
[166033.416512] [<ffffffff811acf71>] sys_write+0x51/0x90
[166033.417087] [<ffffffff810f4cee>] ? __audit_syscall_exit+0x25e/0x290
[166033.417657] [<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
[167233.071730] INFO: task rsync:2934 blocked for more than 120 seconds.
[167233.072310] Tainted: P --------------- 2.6.32-042stab094.7 #1
[167233.072897] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[167233.073483] rsync D ffff880811126f30 0 2934 2932 101 0x00000080
[167233.074082] ffff880800fc1a98 0000000000000086 0000000000000000 0000000000000282
[167233.074670] ffff880800fc1a78 ffff88080b92f310 ffff88080b92f310 ffff880816926800
[167233.075266] 0000000000000000 0000000109f0beb8 ffff8808111274f8 000000000001ecc0
[167233.075869] Call Trace:
[167233.076469] [<ffffffffa027d7ad>] cv_wait_common+0xed/0x100 [spl]
[167233.077070] [<ffffffff810a1b60>] ? autoremove_wake_function+0x0/0x40
[167233.077671] [<ffffffffa027d815>] __cv_wait+0x15/0x20 [spl]
[167233.078295] [<ffffffffa0349e8d>] dmu_tx_wait+0xad/0x340 [zfs]
[167233.078911] [<ffffffffa034a301>] dmu_tx_assign+0x91/0x490 [zfs]
[167233.079533] [<ffffffffa0357ffb>] ? dsl_dataset_block_freeable+0x4b/0x80 [zfs]
[167233.080155] [<ffffffffa0349b06>] ? dmu_tx_count_dnode+0x66/0xd0 [zfs]
[167233.080788] [<ffffffffa03c3a9b>] zfs_write+0x40b/0xc50 [zfs]
[167233.081420] [<ffffffffa03d8b85>] zpl_write+0xa5/0x140 [zfs]
[167233.082032] [<ffffffff811ac5a8>] vfs_write+0xb8/0x1a0
[167233.082636] [<ffffffff811acf71>] sys_write+0x51/0x90
[167233.083246] [<ffffffff810f4cee>] ? __audit_syscall_exit+0x25e/0x290
[167233.083866] [<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
These messages are probably connected with the kernel panic that occurs afterwards.
Here are the kernel panic messages (too long to paste here):
https://gist.github.com/umask/04651d14729c16331c6e
(the problem is the same as in https://gist.github.com/umask/62b247f37107053ff791)
@behlendorf, could you give me some advice? Maybe I should not use xattr=sa?
I need to store backups on this server, and if I cannot get ZoL working I will have to use ext3/4 with rdiff-backup (which is very slow...) :(
I'm trying to locate the file/directory that rsync is copying when the kernel panic occurs...
The panic occurs on different files (random files in one directory?).
I have erased the disks and controller settings and created the zpool from scratch... again.
Now the initial rsync is running with xattr=on (the default).
(The zfs packages are still from the testing repo.)
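A quick way to confirm the dataset is back on the default setting (a sketch; the dataset name is assumed from the commands later in this thread):
/* verify the dataset now uses the default xattr=on */
# zfs get xattr data/101_srv_backup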
@umask Here are a few notes. The original panic posted in this issue had the following stack trace:
[ 8859.308567] [<ffffffffa03fb833>] zio_buf_alloc+0x23/0x30 [zfs]
[ 8859.309389] [<ffffffffa035dcb8>] arc_get_data_buf+0x498/0x4d0 [zfs]
[ 8859.310212] [<ffffffffa035e540>] arc_buf_alloc+0xf0/0x130 [zfs]
...
is most certainly caused by a corrupted dnode in which arbitrary data are used as a blkptr for a spill block.
Second, this type of corruption can definitely lead to filesystems which can't be destroyed via zfs destroy.
Finally, the 4254acb patch should fix all cases of corrupted dnodes I'm aware of, but it will not do anything for a filesystem which has already been corrupted. In fact, no fix committed to master (other than maybe 5f6d0b6f) will be of much assistance to an already-corrupted filesystem.
If you are able to create a filesystem with a corrupted dnode using a module containing 4254acb, I'd sure like to know about it. And, if so, please try to track down one of the corrupted files and/or directories and post the zdb -ddddd <pool>/<fs> <inode> output for it.
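For illustration, a hypothetical invocation using the pool/filesystem names from this thread (123456 stands in for the inode number of the affected file):
/* hypothetical: dump the dnode of object/inode 123456 */
# zdb -ddddd data/101_srv_backup 123456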
@dweeezil how can I determine the corrupted file(s) and/or directory(ies)?
(I suspect one directory with 35GB of many subdirectories and files, and I hope the problem occurs on it; otherwise I have to wait until ~7TB of data has been rsynced.)
If I get a kernel crash dump using kdump, will it be enough to identify the problem?
@umask The easiest way to identify the file with a bad dnode is to try to run a stat(2) on each file (typically by running a simple script; see the sketch below). You can save time by trying directories first since they're most likely to become corrupted. Once you find a corrupted file and/or directory, start by running zdb -dddddd <pool>/<fs> <inode> (where <inode> is the inode number of the corrupted file or directory).
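A minimal sketch of such a script, assuming the dataset is mounted at /mnt/data/101_srv_backup (the mountpoint implied by the zpool create command later in this thread):
/* stat directories first (most likely corrupted), then everything else; */
/* paths where stat(2) fails are candidates for a corrupted dnode */
# find /mnt/data/101_srv_backup -type d -exec stat {} + > /dev/null 2> /root/stat_errors.log
# find /mnt/data/101_srv_backup ! -type d -exec stat {} + > /dev/null 2>> /root/stat_errors.log
# cat /root/stat_errors.log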
I'd like to clarify how you're running into this problem: Are you creating a brand new ZFS filesystem with a module as of at least 4254acb (including all previous SA-related fixes) and then populating it with rsync and getting the corruption? Or is this an existing filesystem which was populated prior to all the SA fixes?
Here are the details of how my zpool was created and how I got the kernel panic:
/* zfs/spl packages installed from zfs-testing repo */
# cat /var/log/dmesg | grep -E 'SPL:|ZFS:'
[ 9.150142] SPL: Loaded module v0.6.3-52_g52479ec
[ 9.306503] ZFS: Loaded module v0.6.3-159_gc944be5, ZFS pool version 5000, ZFS filesystem version 5
[ 10.651813] SPL: using hostid 0x00000000
# zpool create data raidz2 scsi-3600605b0058c11a01c176f05140bece1 scsi-3600605b0058c11a01c176f061412559a scsi-3600605b0058c11a01c176f061419ff77 scsi-3600605b0058c11a01c176f071421f0b0 scsi-3600605b0058c11a01c176f07142bc148 scsi-3600605b0058c11a01c176f0814354c3e scsi-3600605b0058c11a01c176f09143fadf8 scsi-3600605b0058c11a01c176f09144a8ef1 -m /mnt/data -f
# zfs create data/101_srv_backup
# zfs set checksum=sha256 data/101_srv_backup
# zfs set compression=gzip-7 data/101_srv_backup
# zfs set dedup=sha256,verify data/101_srv_backup
# zfs set acltype=posixacl data/101_srv_backup
# zfs set xattr=sa data/101_srv_backup
# zfs set atime=off data/101_srv_backup
/* rsync here runs in openvz container */
# rsync -aHAX --numeric-ids --relative --timeout=1500 --delete --delete-excluded --exclude-from=/root/rsync_exclude.txt root@server:/mnt/data /srv/backup/server/ --progress --stats -v
/* kernel panic occurs after few days of rsyncing when 6-7TB of data transferred */
i.e. the ZFS filesystem was created by the latest available version of ZoL with all available SA-related fixes.
@dweeezil what about https://gist.github.com/umask/04651d14729c16331c6e and https://gist.github.com/umask/62b247f37107053ff791 ?
Currently I'm running rsync with xattr=on (the default) to check that no kernel panic happens. Unfortunately, I stopped my previous identical test in order to check whether transferring the suspected 35GB directory would produce a panic. But the panic didn't happen on this 35GB.
I have set up kdump. Will it help if a panic occurs?
@umask Neither of those sets of stack traces from the gists looks to be SA/spill related (but they're very long and I've not looked too thoroughly at them yet). What kind of preemption is your kernel using? Settings other than CONFIG_PREEMPT_VOLUNTARY=y have been known to cause problems.
These kernel .config parameters are the standard ones for the vzkernel (the OpenVZ kernel for RHEL6/CentOS 6) and the stock RHEL6/CentOS 6 vendor kernels:
# cat /boot/config-2.6.32-042stab094.7 | grep PREEM
# CONFIG_TREE_PREEMPT_RCU is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
Neither of those sets of stack traces from the gists looks to be SA/spill related (but they're very long and I've not looked too thoroughly at them yet)
Both gists are from the two latest kernel panics. So, I'm waiting for my current test results and will post the result here in this ticket.
So, in the end, rsync completed successfully:
...
2014/12/15 09:34:58 [535] Number of files: 14741651
2014/12/15 09:34:58 [535] Number of files transferred: 13965296
2014/12/15 09:34:58 [535] Total file size: 9697844213610 bytes
2014/12/15 09:34:58 [535] Total transferred file size: 9697844207468 bytes
2014/12/15 09:34:58 [535] Literal data: 9697849062964 bytes
2014/12/15 09:34:58 [535] Matched data: 0 bytes
2014/12/15 09:34:58 [535] File list size: 480287234
2014/12/15 09:34:58 [535] File list generation time: 0.008 seconds
2014/12/15 09:34:58 [535] File list transfer time: 0.000 seconds
2014/12/15 09:34:58 [535] Total bytes sent: 268448716
2014/12/15 09:34:58 [535] Total bytes received: 9700104109660
2014/12/15 09:34:58 [535] sent 268448716 bytes received 9700104109660 bytes 39090998.17 bytes/sec
2014/12/15 09:34:58 [535] total size is 9697844213610 speedup is 1.00
2014/12/15 09:34:58 [533] rsync warning: some files vanished before they could be transferred (code 24) at main.c(1505) [generator=3.0.6]
with xattr=on (the default).
Before this success I had tried xattr=sa and got kernel panics every time.
I have run rsync again on my dataset and no kernel panics occur anymore.
I am convinced that the problem is in xattr=sa.
Unfortunately, I have no way to reproduce the problem on this server because I need consistent backups.
I'm setting up a new one for tests.
@umask If there is still a lingering issue with SA xattrs and the manner in which they're typically used when acltype=posixacl, I'd sure love to see the zdb -ddddd <pool>/<fs> <inode> output from an affected file. If you can reproduce the problem, please try to isolate an affected file by traversing the filesystem with an ls -lR and then trying ls -ld on individual files and/or directories as you get close. Once you find one, get its inode number and run the zdb command.
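For example (the path is hypothetical; the dataset name is the one used in this thread):
/* suppose ls -ld just failed on /mnt/data/101_srv_backup/some_dir */
# ls -id /mnt/data/101_srv_backup/some_dir
/* the first field of the output is the inode number; pass it to zdb */
# zdb -ddddd data/101_srv_backup <inode>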
Closing, all known SA fixes have been merged.
I'm running a server with ZoL to store backups on it.
Short details about my config:
I run the stable version of ZoL as provided by these packages for CentOS 6:
No problems occurred while making the first backup using rsync. But after the initial backup, a kernel panic occurs every time I run rsync again.
I googled for the same problem https://github.com/zfsonlinux/zfs/issues/2701 and decided to update the zfs-related packages using the zfs-testing repo.
My currently installed packages are:
After rebooting with the new versions of the zfs modules, the problem happened again.
Here is the backtrace for the kernel panic with 0.6.3 from the stable zfs repo:
Here is the backtrace for the kernel panic with the zfs modules from the testing repo:
As you may note, the kernel panic occurs about 2 hours after server start. After server start I run rsync manually.
Here are the details about my config:
The zpool was created on these 8 virtual raid0 drives.
(I tried to run a scrub, but the speed was very slow and I stopped it; after the update from the zfs-testing repo I did not run zpool upgrade.)
Additional details: