Do you have zfs-auto-snapshot installed? How many snapshots do you have? On my systems I ended up either disabling part of it (the snapshots every 15 min) or uninstalling it entirely and rolling my own...
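One way to answer the "how many snapshots" question is to count the names printed by `zfs list -H -t snapshot -o name`. The sketch below is only an illustration: the function names are made up, the sample names are hypothetical, and the `zfs-auto-snap_<label>-...` naming pattern is an assumption about how zfs-auto-snapshot names its snapshots by default.

```python
import re
import subprocess
from collections import Counter

def list_snapshots():
    """Return all snapshot names on this host (needs the zfs CLI and privileges)."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()

def count_by_dataset(names):
    """Count snapshots per dataset; a snapshot name is 'dataset@snapname'."""
    return Counter(name.split("@", 1)[0] for name in names if "@" in name)

def count_by_label(names):
    """Count snapshots per zfs-auto-snapshot label (assumed naming pattern)."""
    pattern = re.compile(r"@zfs-auto-snap_([a-z]+)-")
    labels = Counter()
    for name in names:
        match = pattern.search(name)
        labels[match.group(1) if match else "other"] += 1
    return labels

# Hypothetical sample names, used here instead of calling the zfs CLI.
sample = [
    "tank/share@zfs-auto-snap_frequent-2016-07-09-1130",
    "tank/share@zfs-auto-snap_hourly-2016-07-09-1117",
    "tank-2/backup@manual-before-upgrade",
]
print(count_by_dataset(sample))
print(count_by_label(sample))
```

On a real host, pass `list_snapshots()` instead of `sample` to both counting functions.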
Hi.
I have a file server that kernel panics about once a week, during a ZFS scrub.
The hardware specifics probably aren't critical, so here are just the essentials:
Intel Xeon E3-1220 v3 @ 3.10GHz (4 cores)
16 GB DDR3
LSI SAS2116 PCI-Express Fusion-MPT + Intel expander
Intel 82599EB 10-Gigabit SFP+
48 HGST Ultrastar SAS drives
Debian 7, 64-bit
Linux 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt25-2~bpo70+1 (2016-04-12) x86_64
ZFS module v0.6.5.7-8-wheezy, ZFS pool version 5000, ZFS filesystem version 5
SPL module v0.6.5.7-2-wheezy
zpool status output:
`
pool: tank
state: ONLINE
scan: scrub repaired 0 in 25h31m with 0 errors on Mon Jul 11 02:16:39 2016
config:
errors: No known data errors

pool: tank-2
state: ONLINE
scan: scrub repaired 0 in 33h28m with 0 errors on Sun Jul 10 09:47:15 2016
config:
errors: No known data errors
`
The problem is that almost every time we scrub the pools, the system reboots.
Before it goes down, I see a few out-of-memory messages like these:
Jul 9 11:32:51 dime-1511 kernel: [600298.594652] zfs-auto-snapsh invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
Jul 9 11:32:51 dime-1511 kernel: [600298.594736] zfs-auto-snapsh cpuset=/ mems_allowed=0
Jul 9 11:32:51 dime-1511 kernel: [600298.594787] CPU: 3 PID: 1957 Comm: zfs-auto-snapsh Tainted: P O 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt25-2~bpo70+1
Jul 9 11:32:51 dime-1511 kernel: [600298.594879] Hardware name: Intel Corporation S1200RP_SE/S1200RP_SE, BIOS S1200RP.86B.03.01.0002.041520151123 04/15/2015
Jul 9 11:32:51 dime-1511 kernel: [600298.594969] 0000000000000286 0000000000000000 ffffffff81548b0d 0000000000000007
Jul 9 11:32:51 dime-1511 kernel: [600298.595057] ffff880439318050 0000000000000000 ffffffff81545fa7 0000000000000000
Jul 9 11:32:51 dime-1511 kernel: [600298.595145] 0000000000000000 0000000000000003 ffffffff810cd2a0 00000000ffffffff
Jul 9 11:32:51 dime-1511 kernel: [600298.595233] Call Trace:
Jul 9 11:32:51 dime-1511 kernel: [600298.595274] [] ? dump_stack+0x5e/0x7a
Jul 9 11:32:51 dime-1511 kernel: [600298.595322] [] ? dump_header+0x76/0x1ec
Jul 9 11:32:51 dime-1511 kernel: [600298.595370] [] ? rcu_batches_completed+0x10/0x10
Jul 9 11:32:51 dime-1511 kernel: [600298.595421] [] ? smp_call_function_single+0x5f/0xa0
Jul 9 11:32:51 dime-1511 kernel: [600298.595472] [] ? mutex_lock+0xe/0x2a
Jul 9 11:32:51 dime-1511 kernel: [600298.595520] [] ? put_online_cpus+0x27/0x90
Jul 9 11:32:51 dime-1511 kernel: [600298.595569] [] ? rcu_oom_notify+0xcc/0xe0
Jul 9 11:32:51 dime-1511 kernel: [600298.595619] [] ? oom_kill_process+0x28a/0x3e0
Jul 9 11:32:51 dime-1511 kernel: [600298.595668] [] ? find_lock_task_mm+0x4c/0xa0
Jul 9 11:32:51 dime-1511 kernel: [600298.595717] [] ? has_ns_capability_noaudit+0x15/0x20
Jul 9 11:32:51 dime-1511 kernel: [600298.595768] [] ? out_of_memory+0x404/0x550
Jul 9 11:32:51 dime-1511 kernel: [600298.595817] [] ? __alloc_pages_nodemask+0xa59/0xbb0
Jul 9 11:32:51 dime-1511 kernel: [600298.595868] [] ? copy_process+0x208/0x1c00
Jul 9 11:32:51 dime-1511 kernel: [600298.595917] [] ? __do_page_fault+0x29a/0x540
Jul 9 11:32:51 dime-1511 kernel: [600298.595967] [] ? get_empty_filp+0xc3/0x1c0
Jul 9 11:32:51 dime-1511 kernel: [600298.596015] [] ? do_fork+0x72/0x340
Jul 9 11:32:51 dime-1511 kernel: [600298.596062] [] ? stub_clone+0x69/0x90
Jul 9 11:32:51 dime-1511 kernel: [600298.596109] [] ? system_call_fast_compare_end+0x10/0x15
Jul 9 11:32:51 dime-1511 kernel: [600298.596160] Mem-Info:
Jul 9 11:32:51 dime-1511 kernel: [600298.596196] Node 0 DMA per-cpu:
Jul 9 11:32:51 dime-1511 kernel: [600298.596237] CPU 0: hi: 0, btch: 1 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596281] CPU 1: hi: 0, btch: 1 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596325] CPU 2: hi: 0, btch: 1 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596369] CPU 3: hi: 0, btch: 1 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596413] Node 0 DMA32 per-cpu:
Jul 9 11:32:51 dime-1511 kernel: [600298.596455] CPU 0: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596499] CPU 1: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596542] CPU 2: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596586] CPU 3: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596630] Node 0 Normal per-cpu:
Jul 9 11:32:51 dime-1511 kernel: [600298.596671] CPU 0: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596715] CPU 1: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596759] CPU 2: hi: 186, btch: 31 usd: 0
Jul 9 11:32:51 dime-1511 kernel: [600298.596803] CPU 3: hi: 186, btch: 31 usd: 25
Jul 9 11:32:51 dime-1511 kernel: [600298.596848] active_anon:610 inactive_anon:277 isolated_anon:0
Jul 9 11:32:51 dime-1511 kernel: [600298.596848] active_file:94 inactive_file:77 isolated_file:0
Jul 9 11:32:51 dime-1511 kernel: [600298.596848] unevictable:527 dirty:18 writeback:23 unstable:0
Jul 9 11:32:51 dime-1511 kernel: [600298.596848] free:661655 slab_reclaimable:5160 slab_unreclaimable:314982
Jul 9 11:32:51 dime-1511 kernel: [600298.596848] mapped:640 shmem:2 pagetables:1690 bounce:0
Jul 9 11:32:51 dime-1511 kernel: [600298.596848] free_cma:0
Jul 9 11:32:51 dime-1511 kernel: [600298.597105] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 9 11:32:51 dime-1511 kernel: [600298.597384] lowmem_reserve[]: 0 2662 16002 16002
Jul 9 11:32:51 dime-1511 kernel: [600298.597440] Node 0 DMA32 free:481468kB min:11232kB low:14040kB high:16848kB active_anon:752kB inactive_anon:292kB active_file:0kB inactive_file:152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2803364kB managed:2728676kB mlocked:0kB dirty:0kB writeback:4kB mapped:128kB shmem:4kB slab_reclaimable:2352kB slab_unreclaimable:210160kB kernel_stack:3792kB pagetables:432kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1961 all_unreclaimable? yes
Jul 9 11:32:51 dime-1511 kernel: [600298.597757] lowmem_reserve[]: 0 0 13339 13339
Jul 9 11:32:51 dime-1511 kernel: [600298.597813] Node 0 Normal free:2149260kB min:56280kB low:70348kB high:84420kB active_anon:1688kB inactive_anon:816kB active_file:376kB inactive_file:156kB unevictable:2108kB isolated(anon):0kB isolated(file):0kB present:13893632kB managed:13659712kB mlocked:2108kB dirty:72kB writeback:88kB mapped:2432kB shmem:4kB slab_reclaimable:18288kB slab_unreclaimable:1049768kB kernel_stack:12528kB pagetables:6328kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4623 all_unreclaimable? yes
Jul 9 11:32:51 dime-1511 kernel: [600298.598133] lowmem_reserve[]: 0 0 0 0
Jul 9 11:32:51 dime-1511 kernel: [600298.598188] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15892kB
Jul 9 11:32:51 dime-1511 kernel: [600298.598336] Node 0 DMA32: 54471*4kB (UEM) 32650*8kB (UEM) 83*16kB (UEMR) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 481468kB
Jul 9 11:32:51 dime-1511 kernel: [600298.598478] Node 0 Normal: 248417*4kB (UEM) 143397*8kB (UEM) 263*16kB (UEM) 6*32kB (UM) 3*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 2149532kB
Jul 9 11:32:51 dime-1511 kernel: [600298.606236] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 9 11:32:51 dime-1511 kernel: [600298.606318] 915 total pagecache pages
Jul 9 11:32:51 dime-1511 kernel: [600298.606359] 252 pages in swap cache
Jul 9 11:32:51 dime-1511 kernel: [600298.606399] Swap cache stats: add 64573, delete 64321, find 8870418/8899641
Jul 9 11:32:51 dime-1511 kernel: [600298.606450] Free swap = 2541488kB
Jul 9 11:32:51 dime-1511 kernel: [600298.606489] Total swap = 2579452kB
Jul 9 11:32:51 dime-1511 kernel: [600298.606529] 4178243 pages RAM
Jul 9 11:32:51 dime-1511 kernel: [600298.606567] 0 pages HighMem/MovableOnly
Jul 9 11:32:51 dime-1511 kernel: [600298.606608] 58480 pages reserved
Jul 9 11:32:51 dime-1511 kernel: [600298.606647] 0 pages hwpoisoned
Jul 9 11:32:51 dime-1511 kernel: [600298.606685] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jul 9 11:32:51 dime-1511 kernel: [600298.606768] [ 1213] 0 1213 5383 338 16 186 -1000 udevd
Jul 9 11:32:51 dime-1511 kernel: [600298.606865] [ 7588] 0 7588 5347 269 15 150 -1000 udevd
Jul 9 11:32:51 dime-1511 kernel: [600298.606945] [ 7589] 0 7589 5347 269 15 150 -1000 udevd
Jul 9 11:32:51 dime-1511 kernel: [600298.607025] [ 8116] 0 8116 4747 348 15 66 0 rpcbind
Jul 9 11:32:51 dime-1511 kernel: [600298.607106] [ 8134] 103 8134 5840 366 17 114 0 rpc.statd
Jul 9 11:32:51 dime-1511 kernel: [600298.607188] [ 8148] 0 8148 6328 0 17 57 0 rpc.idmapd
Jul 9 11:32:51 dime-1511 kernel: [600298.607269] [ 8378] 0 8378 30927 149 24 739 0 rsyslogd
Jul 9 11:32:51 dime-1511 kernel: [600298.607350] [ 8386] 0 8386 8222 316 18 73 0 zed
Jul 9 11:32:51 dime-1511 kernel: [600298.607430] [ 8477] 0 8477 17459 22 38 138 0 nmbd
Jul 9 11:32:51 dime-1511 kernel: [600298.607510] [ 8524] 0 8524 4172 0 11 40 0 atd
Jul 9 11:32:51 dime-1511 kernel: [600298.607590] [ 8551] 0 8551 3251 240 11 35 0 mdadm
Jul 9 11:32:51 dime-1511 kernel: [600298.607670] [ 8625] 0 8625 5393 299 16 73 0 cron
Jul 9 11:32:51 dime-1511 kernel: [600298.607750] [ 8648] 0 8648 5107 357 14 30 0 irqbalance
Jul 9 11:32:51 dime-1511 kernel: [600298.607832] [ 8652] 0 8652 84804 274 50 272 0 rrdcached
Jul 9 11:32:51 dime-1511 kernel: [600298.607913] [ 8659] 0 8659 9899 368 24 87 0 cnid_metad
Jul 9 11:32:51 dime-1511 kernel: [600298.607994] [ 8702] 0 8702 14288 186 31 147 0 afpd
Jul 9 11:32:51 dime-1511 kernel: [600298.608074] [ 8718] 107 8718 10803 204 26 137 0 ntpd
Jul 9 11:32:51 dime-1511 kernel: [600298.608154] [ 8747] 0 8747 1033 327 8 36 0 acpid
Jul 9 11:32:51 dime-1511 kernel: [600298.608235] [ 8769] 105 8769 7487 343 19 114 0 dbus-daemon
Jul 9 11:32:51 dime-1511 kernel: [600298.608317] [ 8777] 0 8777 21710 0 42 733 0 php5-fpm
Jul 9 11:32:51 dime-1511 kernel: [600298.608398] [ 8779] 1001 8779 21678 158 41 728 0 php5-fpm
Jul 9 11:32:51 dime-1511 kernel: [600298.608479] [ 8780] 1001 8780 21678 158 41 728 0 php5-fpm
Jul 9 11:32:51 dime-1511 kernel: [600298.608560] [ 8788] 0 8788 25162 86 51 242 0 smbd
Jul 9 11:32:51 dime-1511 kernel: [600298.608640] [ 8825] 0 8825 19607 82 38 284 0 nginx
Jul 9 11:32:51 dime-1511 kernel: [600298.608720] [ 8826] 33 8826 19712 98 39 358 0 nginx
Jul 9 11:32:51 dime-1511 kernel: [600298.608801] [ 8827] 33 8827 19789 308 39 409 0 nginx
Jul 9 11:32:51 dime-1511 kernel: [600298.608881] [ 8828] 33 8828 19789 340 39 427 0 nginx
Jul 9 11:32:51 dime-1511 kernel: [600298.608961] [ 8829] 33 8829 19789 335 39 419 0 nginx
Jul 9 11:32:51 dime-1511 kernel: [600298.609042] [ 8834] 0 8834 25162 225 48 256 0 smbd
Jul 9 11:32:51 dime-1511 kernel: [600298.609122] [ 8891] 108 8891 8643 306 22 128 0 avahi-daemon
Jul 9 11:32:51 dime-1511 kernel: [600298.609204] [ 8894] 108 8894 8513 0 21 64 0 avahi-daemon
Jul 9 11:32:51 dime-1511 kernel: [600298.609287] [ 8933] 0 8933 88963 199 67 154 0 collectd
Jul 9 11:32:51 dime-1511 kernel: [600298.609368] [ 8945] 0 8945 12489 368 27 152 -1000 sshd
Jul 9 11:32:51 dime-1511 kernel: [600298.609448] [ 8984] 0 8984 4964 322 14 356 0 smartd
Jul 9 11:32:51 dime-1511 kernel: [600298.609528] [ 9019] 0 9019 10262 106 21 237 0 monit
Jul 9 11:32:51 dime-1511 kernel: [600298.609610] [ 9053] 0 9053 1042 528 8 0 -1000 watchdog
Jul 9 11:32:51 dime-1511 kernel: [600298.609691] [ 9066] 0 9066 4355 351 14 40 0 getty
Jul 9 11:32:51 dime-1511 kernel: [600298.609771] [ 9067] 0 9067 4355 345 14 39 0 getty
Jul 9 11:32:51 dime-1511 kernel: [600298.609852] [ 9068] 0 9068 4355 353 13 41 0 getty
Jul 9 11:32:51 dime-1511 kernel: [600298.609932] [ 9069] 0 9069 4355 355 14 40 0 getty
Jul 9 11:32:51 dime-1511 kernel: [600298.610013] [ 9070] 0 9070 4355 357 14 39 0 getty
Jul 9 11:32:51 dime-1511 kernel: [600298.610093] [ 9071] 0 9071 4355 357 14 40 0 getty
Jul 9 11:32:51 dime-1511 kernel: [600298.610174] [ 9095] 0 9095 48277 0 30 385 0 console-kit-dae
Jul 9 11:32:51 dime-1511 kernel: [600298.610256] [ 9162] 0 9162 30733 0 29 187 0 polkitd
Jul 9 11:32:51 dime-1511 kernel: [600298.610337] [ 9185] 1004 9185 16198 328 34 232 0 cnid_dbd
Jul 9 11:32:51 dime-1511 kernel: [600298.610418] [ 9188] 1004 9188 16164 107 36 176 0 cnid_dbd
Jul 9 11:32:51 dime-1511 kernel: [600298.610499] [15566] 1004 15566 16200 219 32 232 0 cnid_dbd
Jul 9 11:32:51 dime-1511 kernel: [600298.610580] [15584] 1004 15584 16207 146 31 237 0 cnid_dbd
Jul 9 11:32:51 dime-1511 kernel: [600298.610661] [15629] 0 15629 9473 62 23 122 0 master
Jul 9 11:32:51 dime-1511 kernel: [600298.610741] [15631] 104 15631 10037 34 23 151 0 qmgr
Jul 9 11:32:51 dime-1511 kernel: [600298.610822] [ 1877] 104 1877 9990 270 23 134 0 pickup
Jul 9 11:32:51 dime-1511 kernel: [600298.610902] [ 1886] 104 1886 9993 293 24 120 0 trivial-rewrite
Jul 9 11:32:51 dime-1511 kernel: [600298.610985] [ 1904] 0 1904 21508 13 45 165 0 afpd
Jul 9 11:32:51 dime-1511 kernel: [600298.611065] [ 1917] 0 1917 21636 379 45 147 0 afpd
Jul 9 11:32:51 dime-1511 kernel: [600298.611145] [ 1926] 0 1926 21686 360 45 133 0 afpd
Jul 9 11:32:51 dime-1511 kernel: [600298.611226] [ 1941] 104 1941 11149 69 27 179 0 smtp
Jul 9 11:32:51 dime-1511 kernel: [600298.611306] [ 1950] 104 1950 9998 163 25 116 0 bounce
Jul 9 11:32:51 dime-1511 kernel: [600298.611386] [ 1951] 0 1951 11232 329 28 95 0 cron
Jul 9 11:32:51 dime-1511 kernel: [600298.611467] [ 1952] 0 1952 11232 328 28 95 0 cron
Jul 9 11:32:51 dime-1511 kernel: [600298.611547] [ 1953] 0 1953 1049 75 7 24 0 sh
Jul 9 11:32:51 dime-1511 kernel: [600298.611626] [ 1954] 0 1954 1049 79 8 23 0 sh
Jul 9 11:32:51 dime-1511 kernel: [600298.611705] [ 1956] 0 1956 1049 285 8 41 0 omv-mkgraph
Jul 9 11:32:51 dime-1511 kernel: [600298.611787] [ 1957] 0 1957 1049 296 7 43 0 zfs-auto-snapsh
Jul 9 11:32:51 dime-1511 kernel: [600298.611870] [ 1974] 0 1974 1049 302 8 27 0 openmediavault-
Jul 9 11:32:51 dime-1511 kernel: [600298.611953] [ 1980] 0 1980 22599 239 48 235 0 rrdtool
Jul 9 11:32:51 dime-1511 kernel: [600298.612034] Out of memory: Kill process 8779 (php5-fpm) score 0 or sacrifice child
Jul 9 11:32:51 dime-1511 kernel: [600298.612112] Killed process 8779 (php5-fpm) total-vm:86712kB, anon-rss:0kB, file-rss:632kB
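As an aside for reading the dump above: the global counters (`free:`, `slab_unreclaimable:` and so on) are in pages, and `order=2` means the failed request wanted 2**2 physically contiguous pages. The sketch below converts those figures to friendlier units. It assumes 4 KiB pages (the usual size on x86_64), the function name is made up for illustration, and it only summarizes numbers; it does not diagnose the cause.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the default on x86_64

def summarize_oom(log_text):
    """Extract the request size and a few global page counters from an OOM dump."""
    summary = {}
    order = re.search(r"\border=(\d+)", log_text)
    if order:
        # An order-n allocation asks for 2**n contiguous pages.
        summary["request_kib"] = PAGE_KIB * 2 ** int(order.group(1))
    for key in ("free", "slab_reclaimable", "slab_unreclaimable"):
        # The trailing \b skips the per-zone values such as "free:15892kB",
        # which are already in kB rather than pages.
        match = re.search(r"\b" + key + r":(\d+)\b", log_text)
        if match:
            summary[key + "_mib"] = int(match.group(1)) * PAGE_KIB / 1024
    return summary

# Two lines copied from the log above (timestamps trimmed).
sample = """\
zfs-auto-snapsh invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
free:661655 slab_reclaimable:5160 slab_unreclaimable:314982
Node 0 DMA free:15892kB min:64kB low:80kB high:96kB
"""

for name, value in summarize_oom(sample).items():
    print(f"{name}: {value:.1f}")
```

Feeding it the whole kernel log excerpt works the same way, since only the first match of each counter is used.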
And then the machine reboots. I have slightly modified /etc/modprobe.d/zfs.conf, which now contains the following values:
options zfs zfs_arc_min=8589934592 zfs_arc_max=12884901888 zfs_prefetch_disable=1 zfs_txg_timeout=5
Could that be the source of the problem? I'm changing zfs_arc_min to 1 GB now, but still, is this behavior normal?
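For reference, the byte values in that options line are easier to judge once converted to GiB: zfs_arc_min is 8 GiB and zfs_arc_max is 12 GiB. The sketch below does the conversion and compares zfs_arc_max with the RAM size reported in the OOM dump ("4178243 pages RAM"). The 4 KiB page size is an assumption, and the helper name is made up for illustration.

```python
GIB = 1024 ** 3
PAGE_BYTES = 4096  # assumption: 4 KiB pages, the default on x86_64

def parse_modprobe_options(line):
    """Parse 'options <module> key=value ...' into a {key: int} dict.

    Only handles integer values, which is all this options line uses.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError("not an 'options' line: %r" % line)
    return {key: int(value)
            for key, value in (field.split("=", 1) for field in fields[2:])}

line = ("options zfs zfs_arc_min=8589934592 zfs_arc_max=12884901888 "
        "zfs_prefetch_disable=1 zfs_txg_timeout=5")
params = parse_modprobe_options(line)

ram_bytes = 4178243 * PAGE_BYTES  # "4178243 pages RAM" from the OOM dump
arc_min_gib = params["zfs_arc_min"] / GIB
arc_max_gib = params["zfs_arc_max"] / GIB
arc_max_share = params["zfs_arc_max"] / ram_bytes

print(f"zfs_arc_min = {arc_min_gib:.1f} GiB")
print(f"zfs_arc_max = {arc_max_gib:.1f} GiB ({arc_max_share:.0%} of RAM)")
print(f"1 GiB expressed in bytes for zfs_arc_min: {1 * GIB}")
```

The last line prints the byte value to use if zfs_arc_min is lowered to 1 GiB.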
UPD: Sorry for the mess, I'm still figuring out how to format code properly.