openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Kernel panic after zfs 6.4 upgrade Centos 6.6 - 2 servers #3376

Closed solarjdp69 closed 9 years ago

solarjdp69 commented 9 years ago
We have two (2) Supermicro X9SCM systems with 3.1 GHz Xeon CPUs:

The first has 16 GB RAM and 10 SATA drives connected via HighPoint RocketRAID 2720SGL controllers in a Norco RPC-4224 24-drive case.

The second is in a generic server case with 32 GB RAM and 10 SATA drives connected to a combination of the 6 motherboard SATA ports and a 4-port PCI-E generic SATA SIG-chipset controller.

We have recently had both servers panic since the upgrade to ZFS 0.6.4, with these common messages displayed on the consoles:

zfs_sb_prune  arc_adapt thread

A hard reset is required to restart each server. No keyboard action, including the three-fingered salute, is effective. Both had been solid up until the ZFS 0.6.4 upgrade.

uname -a
Linux server-name 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

16 GB server:
dmesg | grep -E 'SPL:|ZFS:'
SPL: Loaded module v0.6.4-1
ZFS: Loaded module v0.6.4-1, ZFS pool version 5000, ZFS filesystem version 5
SPL: using hostid 0x00000000

32 GB server:

dmesg | grep -E 'SPL:|ZFS:'
SPL: Loaded module v0.6.4-1
ZFS: Loaded module v0.6.4-1, ZFS pool version 5000, ZFS filesystem version 5
SPL: using hostid 0x00000000

Below is the console message from the 32 GB server, which locked up solid:
==================================================================

 [<ffffffff8152e114>] ? oops_end+0xe4/0x100
 [<ffffffff8104c80b>] ? no_context+0xfb/0x260
 [<ffffffff8104ca95>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff8104cb63>] ? bad_area_nosemaphore+0x13/0x20
 [<ffffffff8104d2bf>] ? __do_page_fault+0x31f/0x480
 [<ffffffffa0237cb2>] ? arc_hdr_destroy+0x252/0x310 [zfs]
 [<ffffffffa023bfb9>] ? arc_evict_ghost+0x2f9/0x6a0 [zfs]
 [<ffffffff8153003e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152d3f5>] ? page_fault+0x25/0x30
 [<ffffffff811a837a>] ? shrink_dcache_parent+0x1a/0x170
 [<ffffffff8152aede>] ? mutex_lock+0x1e/0x50
 [<ffffffffa02ce890>] ? zfs_sb_prune+0x90/0xb0 [zfs]
 [<ffffffffa02ef6e0>] ? zpl_prune_sb+0x0/0x30 [zfs]
 [<ffffffffa02ef706>] ? zpl_prune_sb+0x26/0x30 [zfs]
 [<ffffffffa023d024>] ? arc_adapt_thread+0x254/0x4e0 [zfs]
 [<ffffffffa023cdd0>] ? arc_adapt_thread+0x0/0x4e0 [zfs]
 [<ffffffffa023cdd0>] ? arc_adapt_thread+0x0/0x4e0 [zfs]
 [<ffffffffa01e0778>] ? thread_generic_wrapper+0x68/0x80 [spl]
 [<ffffffffa01e0710>] ? thread_generic_wrapper+0x0/0x80 [spl]
 [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
---[ end trace 2f3ef61f35fa4386 ]---

Below is the console message from the 16 GB server, which locked up solid:
==================================================================
[string ] ? panic+0xa7/0x16f
[string ] ? oops_end+0xe4/0x100
[string ] ? no_context+0xfb/0x260
[string ] ? __bad_area_nosemaphore+0x125/0x1e0
[string ] ? bad_area_nosemaphore+0x13/0x20
[string ] ? __do_page_fault+0x31f/0x480
[string ] ? arc_hdr_destroy+0x252/0x310 [zfs]
[string ] ? arc_evict_ghost+0x2f9/0x6a0 [zfs]
[string ] ? do_page_fault+0x25/0x30
[string ] ? page_fault+0x25/0x30
[string ] ? shrink_dcache_parent+0x1a/0x170
[string ] ? mutex_lock+0x1e/0x50
[string ] ? zfs_sb_prune+0x90/0xb0 [zfs]
[string ] ? zpl_prune_sb+0x0/0x30 [zfs]
[string ] ? zpl_prune_sb+0x26/0x30 [zfs]
[string ] ? arc_adapt_thread+0x254/0x4e0 [zfs]
[string ] ? arc_adapt_thread+0x0/0x4e0 [zfs]
[string ] ? arc_adapt_thread+0x0/0x4e0 [zfs]
[string ] ? thread_generic_wrapper+0x68/0x80 [spl]
[string ] ? thread_generic_wrapper+0x0/0x80 [spl]
[string ] ? kthread+0x9e/0xc0
[string ] ? child_rip+0xa/0x20
[string ] ? kthread+0x0/0xc0
[string ] ? child_rip+0x0/0x20

Below are messages from the /var/log/messages file prior to ONE OF the lock-ups on the 16 GB server
==================================================================
Apr 19 21:45:55 hume auditd[2000]: Audit daemon rotating log files
Apr 19 23:19:40 hume kernel: INFO: task spl_system_task:656 blocked for more than 120 seconds.
Apr 19 23:19:40 hume kernel:      Tainted: P           ---------------    2.6.32-504.12.2.el6.x86_64 #1
Apr 19 23:19:40 hume kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 23:19:40 hume kernel: spl_system_ta D 0000000000000001     0   656      2 0x00000000
Apr 19 23:19:40 hume kernel: ffff88081e1174f0 0000000000000046 0000000000000000 ffff8807e96bc190
Apr 19 23:19:40 hume kernel: ffffffffa02f7b00 0000000000000000 ffff88081e117480 ffffffffa02dfa62
Apr 19 23:19:40 hume kernel: 0000000000020000 ffff8807e96bc190 ffff88081f1ba5f8 ffff88081e117fd8
Apr 19 23:19:40 hume kernel: Call Trace:
Apr 19 23:19:40 hume kernel: [<ffffffffa02dfa62>] ? zio_vdev_io_done+0x42/0x190 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffff8109ed4e>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 19 23:19:40 hume kernel: [<ffffffffa01f1f4d>] cv_wait_common+0x11d/0x130 [spl]
Apr 19 23:19:40 hume kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Apr 19 23:19:40 hume kernel: [<ffffffffa01f1fb5>] __cv_wait+0x15/0x20 [spl]
Apr 19 23:19:40 hume kernel: [<ffffffffa0254247>] traverse_prefetcher+0xe7/0x1b0 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa0254866>] traverse_visitbp+0x4a6/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa0255211>] traverse_dnode+0x71/0xd0 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa0254a37>] traverse_visitbp+0x677/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02547db>] traverse_visitbp+0x41b/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa0255211>] traverse_dnode+0x71/0xd0 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa0254ae0>] traverse_visitbp+0x720/0x800 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa02552fb>] traverse_prefetch_thread+0x8b/0x100 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa0254160>] ? traverse_prefetcher+0x0/0x1b0 [zfs]
Apr 19 23:19:40 hume kernel: [<ffffffffa01edee7>] taskq_thread+0x1e7/0x3f0 [spl]
Apr 19 23:19:40 hume kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
Apr 19 23:19:40 hume kernel: [<ffffffffa01edd00>] ? taskq_thread+0x0/0x3f0 [spl]
Apr 19 23:19:40 hume kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
Apr 19 23:19:40 hume kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 19 23:19:40 hume kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Apr 19 23:19:40 hume kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

There were no entries in /var/log/messages for the 32 GB server.

What we've tried on the 16 GB server:

1. Changed RAM memory.
2. Changed 2 hard drives that had some errors.
3. Changed power supply.

The panics still occur, and they appear to happen during heavy load from rsync.
NOTE: We run a zpool scrub weekly on Saturdays.

The crashes have occurred on Saturday/Sunday, but we've also had several midweek, and also a few in the Monday-Tuesday time frame.

Suggestions?  Possible causes?

CPUs:           1 - Dual core Intel Pentium G2020
Memory:         16GB
VM/Hypervisor:      No
ECC mem:        Yes
Distribution:       CentOS 6.6/Linux
Kernel version:     Linux 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
SPL/ZFS source:     RPM packages yum install for Centos/Redhat - from http://zfsonlinux.org/epel.html
SPL/ZFS version:    hume# dmesg | grep -E 'SPL:|ZFS:'
            SPL: Loaded module v0.6.4-1
            ZFS: Loaded module v0.6.4-1, ZFS pool version 5000, ZFS filesystem version 5
            SPL: using hostid 0x00000000

System services:    Backup file server - Postgres tar.gz files, rsyncs of selected host server directories, a small Postgres DB indexing drive contents.
Short description:  System locks up with panic messages and a partial trace on the console (none in logs) under high I/O use, which may be coincidental with ZFS scrubs on Saturdays.
            Nightly tar.gz file snapshots of remote Postgres databases are sent to this machine using rsync.

[root@hume jp]# zpool status
  pool: zfspool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 80h2m with 0 errors on Tue May  5 09:03:15 2015
config:

        NAME                                          STATE     READ WRITE CKSUM
        zfspool                                       ONLINE       0     0     0
          raidz2-0                                    ONLINE       0     0     0
            ata-WDC_WD30EZRX-00D8PB0_WD-WMC4N0J38079  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0751612  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00D8PB0_WD-WMC4N0J1KXPF  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2558659  ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PBH1RK7R   ONLINE       0     0     0
            ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T4343287  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0958588  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2602397  ONLINE       0     0     0

errors: No known data errors
[root@hume jp]#

System CPUs   = 2
Boot Command = ro root=UUID=d05e3113-4067-4c19-ab25-c618ab6ad57e nomodeset rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 crashkernel=130M@0M rd_MD_UUID=08b7eb4d:1a02cefe:6c51f952:ee49bec0  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
Kernel Version = Linux version 2.6.32-504.12.2.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Wed Mar 11 22:03:14 UTC 2015
Version Release:
CentOS release 6.6 (Final)

inxi -F

System:    Host: hume.schoolpathways.com Kernel: 2.6.32-504.12.2.el6.x86_64 x86_64 (64 bit gcc: 4.4.7)
           Console: tty 3 Distro: CentOS release 6.6 (Final)
Machine:   System: Supermicro product: X9SCL/X9SCM v: 0123456789 serial: 0123456789
           Mobo: Supermicro model: X9SCL/X9SCM v: 1.11A serial: ZM144S048094
           Bios: American Megatrends v: 2.10 date: 01/09/2014
           Chassis: type: 3 v: 0123456789 serial: 0123456789
CPU:       Dual core Intel Pentium G2020 (-MCP-) cache: 3072 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 11599
           clock speeds: max: 2899 MHz 1: 2899 MHz 2: 2899 MHz
Memory:    Array-1 capacity: 32 GB devices: 4 EC: Single-bit ECC
           Device-1: DIMM_1A size: No Module Installed type: N/A
           Device-2: DIMM_2A size: 8 GB speed: N/A type: DDR3 (Synchronous)
           bus width: 128 bits manufacturer: Kingston part: 9965525-116.A00LF serial: CA22E81B
           Device-3: DIMM_1B size: No Module Installed type: N/A
           Device-4: DIMM_2B size: 8 GB speed: N/A type: DDR3 (Synchronous)
           bus width: 128 bits manufacturer: Kingston part: 9965525-116.A00LF serial: CB0C9554
           Device-5: N/A size: 0 MB speed: N/A type: Flash
           bus width: 8 bits manufacturer: Winbond part: 25X/Q Series serial: N/A
Graphics:  Card: Matrox Systems MGA G200eW WPCM450 bus-ID: 06:03.0 chip-ID: 102b:0532
           Display Server: N/A driver: N/A tty size: 132x64 Advanced Data: N/A for root out of X
Network:   Card-1: Intel 82579LM Gigabit Network Connection
           driver: e1000e v: 2.3.2-k port: f020 bus-ID: 00:19.0 chip-ID: 8086:1502
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 0c:c4:7a:07:c1:1d
           Card-2: Intel 82574L Gigabit Network Connection
           driver: e1000e v: 2.3.2-k port: e000 bus-ID: 05:00.0 chip-ID: 8086:10d3
           IF: eth1 state: down mac: 0c:c4:7a:07:c1:1c
           WAN IP: 173.166.236.101 IF: eth0 ip: 192.168.3.9 ip-v6: fe80::ec4:7aff:fe07:c11d
           IF: eth1 ip: N/A ip-v6: N/A
Drives:    HDD Total Size: 40172.5GB (4.7% used)
           ID-1: /dev/sda model: WDC_WD1600YS size: 164.7GB serial: WD-WCAP03446213 temp: 36C
           ID-2: /dev/sdb model: WDC_WD30EZRX size: 3000.6GB serial: WD-WMC4N0J1KXPF temp: 40C
           ID-3: /dev/sdc model: HGST_HDN724040AL size: 4000.8GB serial: PK2334PBH1RK7R temp: 45C
           ID-4: /dev/sdd model: WDC_WD30EZRX size: 3000.6GB serial: WD-WMC4N0J38079 temp: 39C
           ID-5: /dev/sde model: WDC_WD30EZRX size: 3000.6GB serial: WD-WCAWZ2602397 temp: 41C
           ID-6: /dev/sdf model: WDC_WD30EZRX size: 3000.6GB serial: WD-WMC1T0751612 temp: 39C
           ID-7: /dev/sdg model: WDC_WD30EZRX size: 3000.6GB serial: WD-WMC1T0958588 temp: 39C
           ID-8: /dev/sdh model: WDC_WD30EZRX size: 3000.6GB serial: WD-WCAWZ1592095 temp: 40C
           ID-9: /dev/sdi model: WDC_WD30EZRX size: 3000.6GB serial: WD-WCAWZ0879041 temp: 40C
           ID-10: /dev/sdj model: WDC_WD30EZRX size: 3000.6GB serial: WD-WCAWZ2558659 temp: 40C
           ID-11: /dev/sdk model: WDC_WD30EZRX size: 3000.6GB serial: WD-WMC1T4343287 temp: 40C
           ID-12: /dev/sdl model: HGST_HTS725050A7 size: 500.1GB serial: TF0501WE02TV5Z temp: 34C
           ID-13: /dev/sdm model: WDC_WD5000LPVX size: 500.1GB serial: WD-WX81AA4LPCS3 temp: 35C
           ID-14: USB /dev/sdo model: My_Book_1230 size: 4000.8GB serial: 574343344534373753395352-0:0 temp: 0C
           ID-15: USB /dev/sdp model: My_Book_1230 size: 4000.8GB serial: 5743433445444A534D463839-0:0 temp: 0C
           Optical: No optical drives detected.
Partition: ID-1: / size: 22G used: 16G (77%) fs: ext4 dev: /dev/md1
           ID-2: /tmp size: 110G used: 374M (1%) fs: ext4 dev: /dev/md127
           ID-3: /var size: 328G used: 12G (4%) fs: ext4 dev: /dev/md126
           ID-4: /boot size: 477M used: 165M (37%) fs: ext4 dev: /dev/md0
           ID-5: swap-1 size: 164.69GB used: 0.00GB (0%) fs: swap dev: /dev/sda1
RAID:      Device-1: /dev/md121 - inactive components: online: none spare: sdn5
           Device-2: /dev/md122 - inactive components: online: none spare: sdn3
           Device-3: /dev/md123 - inactive components: online: none spare: sdn2
           Device-4: /dev/md124 - inactive components: online: none spare: sdn1
           Device-5: /dev/md125 - inactive components: online: none spare: sdn6
           Device-6: /dev/md126 - active raid: 1 components: online: 2/2 - sdm4 sdl4
           Device-7: /dev/md127 - active raid: 1 components: online: 2/2 - sdm3 sdl3
           Device-8: /dev/md0 - active raid: 1 components: online: 2/2 - sdm1 sdl1
           Device-9: /dev/md1 - active raid: 1 components: online: 2/2 - sdl2 sdm2
Sensors:   System Temperatures: cpu: 29.8C mobo: 27.8C
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 377 Uptime: 2 days Memory: 2746.2/15922.5MB Init: Upstart runlevel: 3
           Client: Shell (bash) inxi: 2.2.16 
alexanderhaensch commented 9 years ago

Very similar issue here. The server was fine before switching to the 0.6.4.1 tag. The system journal collected data in the last 10 minutes before the system went dark. I can share the whole file (~1 MB) if it is of interest. Here are the essentials: I found 11 backtraces, all pointing to arc_adapt; two of them are missing the line showing the PID and Comm fields.

The first stall happens at 07:15:08, then it happens about every minute, with some deviation around 07:21:43.

Here is the timeline:

May 06 07:15:08 io kernel: INFO: rcu_sched self-detected stall on CPU { 2} (t=2100 jiffies g=19038966 c=19038965 q=38498)
May 06 07:16:11 io kernel: INFO: rcu_sched self-detected stall on CPU { 2} (t=8403 jiffies g=19038966 c=19038965 q=424842)
May 06 07:17:08 io kernel: INFO: rcu_sched self-detected stall on CPU { 4} (t=2101 jiffies g=19038973 c=19038972 q=279325)
May 06 07:18:08 io kernel: INFO: rcu_sched self-detected stall on CPU { 2} (t=2100 jiffies g=19038977 c=19038976 q=379424)
May 06 07:19:08 io kernel: INFO: rcu_sched self-detected stall on CPU { 2} (t=2100 jiffies g=19039003 c=19039002 q=325361)
May 06 07:19:39 io kernel: INFO: rcu_sched self-detected stall on CPU { 4} (t=2101 jiffies g=19039004 c=19039003 q=514848)
May 06 07:20:10 io kernel: INFO: rcu_sched self-detected stall on CPU { 4} (t=2101 jiffies g=19039006 c=19039005 q=420034)
May 06 07:21:08 io kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=2101 jiffies g=19039009 c=19039008 q=298247)
May 06 07:21:43 io kernel: INFO: rcu_sched self-detected stall on CPU { 4} (t=2101 jiffies g=19039010 c=19039009 q=467105)
May 06 07:22:36 io kernel: INFO: rcu_sched self-detected stall on CPU { 14} (t=2101 jiffies g=19039013 c=19039012 q=248672)
May 06 07:23:09 io kernel: INFO: rcu_sched self-detected stall on CPU { 4} (t=2101 jiffies g=19039014 c=19039013 q=436309)
May 06 07:24:09 io kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=2100 jiffies g=19039022 c=19039021 q=272070)
May 06 07:25:03 io kernel: INFO: rcu_sched self-detected stall on CPU { 4} (t=2100 jiffies g=19039023 c=19039022 q=688786)

And here is the first backtrace on CPU2:

kernel: NMI backtrace for cpu 2 kernel: CPU: 2 PID: 2697 Comm: arc_adapt Tainted: P O 3.14.39-hardened #1 kernel: Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013 kernel: task: ffff88101eb4e6c0 ti: ffff88101eb4ec20 task.ti: ffff88101eb4ec20 kernel: RIP: 0010:[] [] delay_tsc+0x36/0x60 kernel: RSP: 0000:ffff88103fc43d78 EFLAGS: 00000046 kernel: RAX: 00000000c630e47a RBX: 00000000000003e9 RCX: 00000000c630e42f kernel: RDX: 000000000005936e RSI: 0000000000000002 RDI: 000000000003344f kernel: RBP: ffff88103fc43d78 R08: ffffffff8187c158 R09: ffffffff819a94f4 kernel: R10: 0000000000002880 R11: 000000000000287f R12: 0000000000001000 kernel: R13: 0000000000000400 R14: 0000000000000082 R15: 0000000000000002 kernel: FS: 0000000000000000(0000) GS:ffff88103fc40000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: ffffffffff600400 CR3: 0000000fad31d000 CR4: 00000000001607f0 kernel: Stack: kernel: ffff88103fc43d88 ffffffff8134f52a ffff88103fc43d98 ffffffff8134f558 kernel: ffff88103fc43db8 ffffffff8102de92 0000000000009032 ffffffff8187c158 kernel: ffff88103fc43e00 ffffffff8102f344 0000000600000008 0000000000000003 kernel: Call Trace: kernel: kernel: kernel: [] delay+0xa/0x10 kernel: [] const_udelay+0x28/0x30 kernel: [] native_safe_apic_wait_icr_idle+0x22/0x50 kernel: [] default_send_IPI_mask_sequence_phys+0xc4/0xd0 kernel: [] physflat_send_IPI_all+0x12/0x20 kernel: [] arch_trigger_all_cpu_backtrace+0x46/0x80 kernel: [] rcu_check_callbacks+0x36e/0x5d0 kernel: [] ? acct_account_cputime+0x17/0x20 kernel: [] ? account_system_time+0x89/0x180 kernel: [] update_process_times+0x42/0x70 kernel: [] tick_sched_handle.isra.17+0x31/0x40 kernel: [] tick_sched_timer+0x39/0x60 kernel: [] run_hrtimer.isra.32+0x3d/0xc0 kernel: [] hrtimer_interrupt+0xe5/0x220 kernel: [] local_apic_timer_interrupt+0x32/0x60 kernel: [] smp_apic_timer_interrupt+0x3a/0x50 kernel: [] apic_timer_interrupt+0x6e/0x80 kernel: kernel: kernel: [] ? dmu_zfetch+0x76/0x1b90 [zfs] kernel: [] dbuf_read+0x861/0x1250 [zfs] kernel: [] ? cleanup_module+0x16ad/0x35235 [zfs] kernel: [] dnode_hold_impl+0x1b9/0x5a0 [zfs] kernel: [] dnode_hold+0x14/0x20 [zfs] kernel: [] dmu_bonus_hold+0x24/0x2d0 [zfs] kernel: [] sa_buf_hold+0x9/0x10 [zfs] kernel: [] zfs_zget+0x166/0x2c0 [zfs] kernel: [] zfs_link_destroy+0x604/0x760 [zfs] kernel: [] ? __list_add+0x17/0x40 kernel: [] ? unfreeze_partials.isra.51+0x14a/0x1b0 kernel: [] ? lock_timer_base.isra.27+0x26/0x50 kernel: [] ? try_to_del_timer_sync+0x43/0x60 kernel: [] ? schedule_timeout+0x12f/0x1d0 kernel: [] ? kmalloc_node+0x29/0x1a0 kernel: [] ? spl_kmem_alloc+0xdf/0x1a0 [spl] kernel: [] zfs_rmnode+0x252/0x330 [zfs] kernel: [] zfs_zinactive+0x148/0x160 [zfs] kernel: [] zfs_inactive+0x5f/0x210 [zfs] kernel: [] zpl_vap_init+0x8ae/0xa90 [zfs] kernel: [] ? cleanup_module+0x7d2d/0x35235 [zfs] kernel: [] evict+0x99/0x190 kernel: [] dispose_list+0x36/0x50 kernel: [] prune_icache_sb+0x42/0x50 kernel: [] super_cache_scan+0xfd/0x170 kernel: [] zfs_sb_prune+0x9a/0xc0 [zfs] kernel: [] ? zpl_vap_init+0xa90/0xa90 [zfs] kernel: [] zpl_prune_sb+0x21/0xdd0 [zfs] kernel: [] arc_space_return+0x1d4e/0x3d30 [zfs] kernel: [] ? arc_space_return+0x1930/0x3d30 [zfs] kernel: [] __thread_exit+0x8c/0xa0 [spl] kernel: [] ? __thread_exit+0x20/0xa0 [spl] kernel: [] kthread+0xc4/0xe0 kernel: [] ? kthread_create_on_node+0x160/0x160 kernel: [] ret_from_fork+0x49/0x80 kernel: [] ? 
kthread_create_on_node+0x160/0x160 kernel: Code: 00 0f 1f 00 0f ae e8 0f 31 89 c1 eb 16 0f 1f 84 00 00 00 00 00 f3 90 65 8b 14 25 2c 90 00 00 39 d6 75 1b 0f 1f 00 0f ae e8 0f 31 <48> c1 e2 20 89 c0 48 09 c2 89 d0 29 ca 39 fa 72 d9 5d c3 29 c1

solarjdp69 commented 9 years ago

Alexander,

Thanks for this information. Good to know we are not alone in this issue. I have changed a setting on our zfs pool but not enough time has elapsed to see if it does any good.

zpool set failmode=continue zfspool (the default is wait). The thinking is that this might allow the slow I/O to catch up and complete the request before the requests pile up too high and cause the kernel panic. This is just a hunch on my part. Time will tell.

To view your settings:
# zpool list -o failmode
FAILMODE
continue
[root@hume jp]#
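
For anyone following along, a minimal sketch of the equivalent set/verify commands (failmode accepts wait, continue, or panic; pool name as above):

# set the property and confirm it
zpool set failmode=continue zfspool
zpool get failmode zfspool        # prints NAME  PROPERTY  VALUE  SOURCE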


alexanderhaensch commented 9 years ago

@solarjdp69 Thanks for the tip. Do you think this will solve the problem? What does that setting do exactly?

I added

kernel.panic = 3 (via sysctl)

to ensure the server reboots on a panic.
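
A small sketch of making that persistent; the hung_task_panic line is an extra assumption on my part (it turns hung-task warnings into panics so kernel.panic can actually trigger the reboot), not something anyone in this thread has confirmed:

# apply immediately
sysctl -w kernel.panic=3
# persist across reboots
cat >> /etc/sysctl.conf <<'EOF'
kernel.panic = 3
# optional, assumption: also panic (and therefore reboot) when a task hangs
kernel.hung_task_panic = 1
EOF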

All in all I am satisfied with the release because it is much faster than 0.6.3, so I do not want to go back; I hope the problem goes away soon.

Some research in the log files showed that an rsync job was running during the crash. The rsync is configured to run every 30 minutes and needs around 15 minutes to complete, so I think we are seeing a very similar problem here. The rsync had been running 'uninterrupted' for the last 10 minutes, meaning there had been a lot of successful reads during the problematic period.

behlendorf commented 9 years ago

@solarjdp69 @alexanderhaensch if you're comfortable applying a small patch to the code, you can prevent the issue by applying the following small patch, which disables the offending (non-critical) code. This is at least until we get to the bottom of what's going on here and address it in a point release.

diff --git a/module/zfs/zfs_vfsops.c b/module/zfs/zfs_vfsops.c
index 053f3f2..a9c33b2 100644
--- a/module/zfs/zfs_vfsops.c
+++ b/module/zfs/zfs_vfsops.c
@@ -1315,7 +1315,7 @@ zfs_domount(struct super_block *sb, void *data, int silent
        if (!zsb->z_issnap)
                zfsctl_create(zsb);

-       zsb->z_arc_prune = arc_add_prune_callback(zpl_prune_sb, sb);
+//     zsb->z_arc_prune = arc_add_prune_callback(zpl_prune_sb, sb);
 out:
        if (error) {
                dmu_objset_disown(zsb->z_os, zsb);
@@ -1354,7 +1354,7 @@ zfs_umount(struct super_block *sb)
        zfs_sb_t *zsb = sb->s_fs_info;
        objset_t *os;

-       arc_remove_prune_callback(zsb->z_arc_prune);
+//     arc_remove_prune_callback(zsb->z_arc_prune);
        VERIFY(zfs_sb_teardown(zsb, B_TRUE) == 0);
        os = zsb->z_os;
        bdi_destroy(sb->s_bdi);
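
If it helps, a rough sketch of applying this to a source tree before rebuilding; the patch file name is illustrative, and --ignore-whitespace guards against tab/space differences:

cd zfs-0.6.4.1/                                    # extracted source, path illustrative
patch -p1 --ignore-whitespace < zfs-vfsops-no-prune.patch
./configure && make && make install                # or rebuild your distro packages as usual
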
alexanderhaensch commented 9 years ago

I guess this should be applied on top of the 0.6.4.1 tag? Or does it also apply to current master?

I have uploaded all kernel logs, including excerpts for the different stalls (a - m): https://github.com/alexanderhaensch/zfs-logfiles/blob/master

behlendorf commented 9 years ago

It will apply cleanly to both at the moment.

alexanderhaensch commented 9 years ago

@behlendorf thank you for that speedy workaround.

olw2005 commented 9 years ago

We also experienced "hung task" traces (identical to those posted by @solarjdp69) on multiple machines soon after upgrading to 0.6.4 / 0.6.4.1. System configs: vanilla Centos 6.6, mid-size (12-disk) to large (48-disk) pools, 192GB ram, various amounts of L2ARC. The timing of the event log messages coincided with snapshot send / recv's. Notable difference, they did not cause system lockups nor was there any visible performance impact.

I made three config changes shortly thereafter and haven't seen any log file spewage since:

  1. Doubled the hung task timeout: echo 240 > /proc/sys/kernel/hung_task_timeout_secs
  2. Turned off prefetch after reviewing the new arc_summary.py output: zfs_prefetch_disable=1 in /etc/modprobe.d/zfs.conf (see the sketch after this list)
  3. Enabled the new pool features with zpool upgrade. (Side note, that upgrade generated a scary-high system load and io wait for about 15-20 minutes on our three larger ~60TB systems. Might be worth a comment or something in the release notes, etc. to perform that operation "offline" where possible?)
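
A sketch of how items 1 and 2 can be made explicit/persistent; the file contents are my assumptions based on the standard module-option and procfs mechanisms, not copied from these systems:

# /etc/modprobe.d/zfs.conf -- applied when the zfs module loads
options zfs zfs_prefetch_disable=1

# runtime equivalents, no reboot needed
echo 1   > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 240 > /proc/sys/kernel/hung_task_timeout_secs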

This afternoon I set the timeout back to the default 120s to see if by chance that was masking the timeouts. Will post add'l info if any new messages appear.

As ever, great work gents! =)

solarjdp69 commented 9 years ago

@behlendorf - We used the Red Hat/CentOS RPMs from zfsonlinux. How do we go about installing the patch? Do we download and apply the patch using the instructions for DKMS here: http://zfsonlinux.org/generic-rpm.html? Or is there some easier method with what's already installed? @alexanderhaensch - Regarding failmode set to continue versus the default of wait: I am hoping that the system will catch up or recover from its timeout state and continue without a kernel panic.
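
For what it's worth, with the zfs-dkms packages the module source usually lives under /usr/src, so one possible route looks roughly like this (version string and patch file name are illustrative, not confirmed for these RPMs):

cd /usr/src/zfs-0.6.4.1/
patch -p1 --ignore-whitespace < /root/zfs-vfsops-no-prune.patch
dkms remove  zfs/0.6.4.1 --all
dkms add     zfs/0.6.4.1
dkms build   zfs/0.6.4.1
dkms install zfs/0.6.4.1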

alexanderhaensch commented 9 years ago

Yesterday the server crashed again. Now I am applying the patch. :-/ Interestingly, this time I had kernel.panic = 3 active, and it did not reset the server. Moreover, the syslogs show the stalls happening for only 3 minutes. It seems that multiple arc_adapt threads were running during this relatively quiet period. The timing of this system stall again correlates with an rsync task that had been running over a long period. Does anyone have an idea how to force the system to reboot if something like this happens again?

alexanderhaensch commented 9 years ago

Some more info about the system:

Machine: Mobo: Supermicro model: X9DR3-F v: 0123456789 serial:
CPU(s): 2 Hexa core Intel Xeon E5-2620 v2s (-HT-MCP-SMP-) cache: 30720 KB
RAM: 128GB
HBA: LSI 9207-8e / 8i
ZIL/L2ARC: INTEL_SSDSC2BB24
ZPOOL: HGST_HUS724020AL

There is a sister system with nearly the same setup, using Toshiba SAS drives and no ZIL/L2ARC. That system runs different tasks and is not crashing. This gives me the idea that rsync/Samba or the LSI controller cards are triggering the crash. The LSI controllers are known for having many firmware bugs, especially when it comes to SATA drives. @solarjdp69 does not use LSI hardware, so it is more likely that rsync is triggering the bug.

solarjdp69 commented 9 years ago

Alexander,

Did you get the patch installed and compiled? I have not found a version of zfs_vfsops.c that the patch applies to cleanly. I get 2 failed hunks no matter what I try it on.

The failmode=continue setting let our systems run a LOT longer. Instead of 2-3 days, we went a whole 8 days before the system panicked. However, there were no bad-sector, SMART, or other disk messages during that time, and instead of repairing and/or resilvering, the filesystem just came up nicely with all zpools' status OK.

On Sun, May 10, 2015 at 6:26 AM, alexanderhaensch notifications@github.com wrote:

Maybe some info about the system: inxi -F:

System: Host: io Kernel: 3.14.39-hardened x86_64 (64 bit) Console: tty 21 Distro: Gentoo Base System release 2.2

Machine: Mobo: Supermicro model: X9DR3-F v: 0123456789 serial: VM13CS027414

Bios: American Megatrends v: 3.0a date: 07/31/2013

CPU(s): 2 Hexa core Intel Xeon E5-2620 v2s (-HT-MCP-SMP-) cache: 30720 KB clock speeds: max: 2101 MHz 1: 2101 MHz 2: 2101 MHz 3: 2101 MHz 4: 2101 MHz 5: 2101 MHz 6: 2101 MHz 7: 2101 MHz 8: 2101 MHz 9: 2101 MHz 10: 2101 MHz 11: 2101 MHz 12: 2101 MHz 13: 2101 MHz 14: 2101 MHz 15: 2101 MHz 16: 2101 MHz 17: 2101 MHz 18: 2101 MHz 19: 2101 MHz 20: 2101 MHz 21: 2101 MHz 22: 2101 MHz 23: 2101 MHz 24: 2101 MHz Graphics: Card: Matrox Systems MGA G200eW WPCM450 Display Server: N/A driver: N/A tty size: 241x50 Advanced Data: N/A for root out of X Network: Card-1: Intel I350 Gigabit Network Connection driver: igb IF: enp2s0f0 state: up speed: 1000 Mbps duplex: full mac: 00:25:90:c1:46:79 Card-2: Intel I350 Gigabit Network Connection driver: igb IF: enp2s0f1 state: up speed: 1000 Mbps duplex: full mac: 00:25:90:c1:46:79 Card-3: Intel 82575EB Gigabit Network Connection driver: igb IF: enp5s0f0 state: up speed: 1000 Mbps duplex: full mac: 00:25:90:c1:46:79 Card-4: Intel 82575EB Gigabit Network Connection driver: igb IF: enp5s0f1 state: up speed: 1000 Mbps duplex: full mac: 00:25:90:c1:46:79 Card-5: Intel Ethernet Controller 10-Gigabit X540-AT2 driver: ixgbe IF: enp130s0f0 state: up speed: 10000 Mbps duplex: full mac: 32:ae:0c:0b:79:06 Card-6: Intel Ethernet Controller 10-Gigabit X540-AT2 driver: ixgbe IF: enp130s0f1 state: down mac: 32:ae:0c:0b:79:06 Drives: HDD Total Size: 18023.7GB (0.2% used) ID-1: /dev/sdc model: INTEL_SSDSC2BB24 size: 240.1GB ID-2: /dev/sdf model: INTEL_SSDSC2BB24 size: 240.1GB ID-3: /dev/sde model: INTEL_SSDSC2BB24 size: 240.1GB ID-4: /dev/sdh model: INTEL_SSDSC2BB24 size: 30.0GB ID-5: /dev/sdb model: INTEL_SSDSC2BB24 size: 240.1GB ID-6: /dev/sdg model: INTEL_SSDSC2BB24 size: 30.0GB ID-7: /dev/sda model: WDC_WD5003ABYX size: 500.1GB ID-8: /dev/sdd model: WDC_WD5003ABYX size: 500.1GB ID-9: /dev/sdv model: HGST_HUS724020AL size: 2000.4GB ID-10: /dev/sds model: HGST_HUS724020AL size: 2000.4GB ID-11: /dev/sdo model: HGST_HUS724020AL size: 2000.4GB ID-12: /dev/sdp model: HGST_HUS724020AL size: 2000.4GB ID-13: /dev/sdx model: HGST_HUS724020AL size: 2000.4GB ID-14: /dev/sdr model: HGST_HUS724020AL size: 2000.4GB ID-15: /dev/sdl model: HGST_HUS724020AL size: 2000.4GB ID-16: /dev/sdi model: HGST_HUS724020AL size: 2000.4GB ID-17: /dev/sdq model: HGST_HUS724020AL size: 2000.4GB ID-18: /dev/sdt model: HGST_HUS724020AL size: 2000.4GB ID-19: /dev/sdm model: HGST_HUS724020AL size: 2000.4GB ID-20: /dev/sdk model: HGST_HUS724020AL size: 2000.4GB ID-21: /dev/sdn model: HGST_HUS724020AL size: 2000.4GB ID-22: /dev/sdj model: HGST_HUS724020AL size: 2000.4GB ID-23: /dev/sdu model: HGST_HUS724020AL size: 2000.4GB ID-24: /dev/sdw model: HGST_HUS724020AL size: 2000.4GB Partition: ID-1: / size: 454G used: 41G (10%) fs: ext4 dev: /dev/md1 ID-2: /boot size: 459M used: 171M (40%) fs: ext2 dev: /dev/md0 RAID: Device-1: /dev/md0 - active raid: 1 components: online: 2/2 - sda2 sdd2 Device-2: /dev/md1 - active raid: 1 components: online: 2/2 - sda3 sdd3 Sensors: System Temperatures: cpu: 34.0C mobo: N/A Fan Speeds (in rpm): cpu: N/A Info: Processes: 671 Uptime: 4:55 Memory: 23302.7/128913.2MB Init: systemd Client: Shell (bash) inxi: 2.2.19


alexanderhaensch commented 9 years ago

@solarjdp69: I had a problem with the patch too. The problem is that the source uses tab indentation and here everything is indented with spaces. You could try patch with --ignore-whitespace. In my case (Gentoo), I manually commented out the two lines and created a new patch, then copied it to /etc/portage/patches/sys-fs/zfs/ and /etc/portage/patches/sys-fs/zfs-kmod/. The Gentoo package manager now applies the patch automatically.

I have uploaded it here: https://github.com/alexanderhaensch/zfs-logfiles/blob/master/disable_kernel_panic2.patch

Hope this helps.

olw2005 commented 9 years ago

Looking through system logs today: after changing the timeout back to 120s (see above) we eventually got more "hung task" messages, on Tuesday May 12th and May 19th, which happens to be the weekday when we dump a "zfs send" out to tape (along with the normal cross-site incremental zfs send/recv between systems). On 12 May there were five instances on the server running a full "zfs send" to tape, as opposed to only one instance on a second server running a shorter-duration incremental "zfs send" to tape. From 19 May through today (21 May) there are 9 instances so far on a server that is still running a "full" zfs send (36 of 45 TB done); we may get one or two more entries in the system log before it finishes. There were zero instances on 19 May for the other system, which only ran a comparatively small incremental zfs send to tape. As before, both servers have continued to operate normally and there was no visible performance impact.

I can provide the stack traces if it's useful; they are pretty much identical to the ones above which contain "traverse_visitbp". Otherwise, I'm tempted to change my timeout back to 240s, which appears to be long enough to keep my log files from being spammed. =)

olw2005 commented 9 years ago

For posterity, a representative trace from this morning:

May 21 07:02:27 ctc-san2 kernel: INFO: task spl_system_task:2051 blocked for more than 120 seconds.
May 21 07:02:27 ctc-san2 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
May 21 07:02:27 ctc-san2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 07:02:27 ctc-san2 kernel: spl_system_ta D 0000000000000009 0 2051 2 0x00000000
May 21 07:02:27 ctc-san2 kernel: ffff883018b6b3b0 0000000000000046 0000000000000000 ffff8828866a2e50
May 21 07:02:27 ctc-san2 kernel: ffffffffa03b4ec0 0000000000000000 000662168963a035 ffffffffa039ca62
May 21 07:02:27 ctc-san2 kernel: 0000000000020000 000000016b1085b5 ffff883017f2dad8 ffff883018b6bfd8
May 21 07:02:27 ctc-san2 kernel: Call Trace:
May 21 07:02:27 ctc-san2 kernel: [] ? zio_vdev_io_done+0x42/0x190 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] cv_wait_common+0x11d/0x130 [spl]
May 21 07:02:27 ctc-san2 kernel: [] ? autoremove_wake_function+0x0/0x40
May 21 07:02:27 ctc-san2 kernel: [] __cv_wait+0x15/0x20 [spl]
May 21 07:02:27 ctc-san2 kernel: [] traverse_prefetcher+0xe7/0x1b0 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x4a6/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x677/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x41b/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_visitbp+0x720/0x800 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] ? traverse_prefetcher+0x0/0x1b0 [zfs]
May 21 07:02:27 ctc-san2 kernel: [] taskq_thread+0x1e7/0x3f0 [spl]
May 21 07:02:27 ctc-san2 kernel: [] ? default_wake_function+0x0/0x20
May 21 07:02:27 ctc-san2 kernel: [] ? taskq_thread+0x0/0x3f0 [spl]
May 21 07:02:27 ctc-san2 kernel: [] kthread+0x9e/0xc0
May 21 07:02:27 ctc-san2 kernel: [] child_rip+0xa/0x20
May 21 07:02:27 ctc-san2 kernel: [] ? kthread+0x0/0xc0
May 21 07:02:27 ctc-san2 kernel: [] ? child_rip+0x0/0x20

solarjdp69 commented 9 years ago

Thanks Owl! I'm trying the 240 setting to see if it helps. I am also testing a newer GitHub zfs version on a few servers to see if it alleviates the issues. It has the patches that @behlendorf suggested would fix it.


alexanderhaensch commented 9 years ago

In my case the patch workaround from @behlendorf works and there have been no stalls so far. Fingers crossed.

solarjdp69 commented 9 years ago

Applied the GitHub version and it still crashes, but there is more data in the logs relative to the crash.

From /var/log/messages - entries end at the hard lockup. Additional entries (not supplied) are from the hard-reset restart.

It appears to be referring to a disk / controller issue. The SAS cables and RocketRAID 2720SGL SAS controllers are all only a few months old. The power supply has been swapped previously.
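
A quick, hedged way to sanity-check the disk/controller side (the sdk device name is taken from the log below; smartmontools assumed installed):

smartctl -a /dev/sdk          # SMART attributes and the drive's error log
smartctl -x /dev/sdk          # extended report, including SATA phy event counters
zpool status -v zfspool       # per-vdev read/write/checksum error counters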

May 31 08:10:25 hume auditd[2066]: Audit daemon rotating log files May 31 09:20:18 hume kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 000000F2, slot [0]. May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=11 pid=28188 May 31 09:20:18 hume zed: eid=11 class=checksum pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=11 pid=28188 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=11 pid=28190 May 31 09:20:18 hume zed[1303]: Finished "checksum-notify.sh" eid=11 pid=28190 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "checksum-spare.sh" eid=11 pid=28210 May 31 09:20:18 hume zed[1303]: Finished "checksum-spare.sh" eid=11 pid=28210 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=12 pid=28211 May 31 09:20:18 hume zed: eid=12 class=checksum pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=12 pid=28211 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=12 pid=28213 May 31 09:20:18 hume zed[1303]: Finished "checksum-notify.sh" eid=12 pid=28213 exit=3 May 31 09:20:18 hume zed[1303]: Invoking "checksum-spare.sh" eid=12 pid=28226 May 31 09:20:18 hume zed[1303]: Finished "checksum-spare.sh" eid=12 pid=28226 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=13 pid=28227 May 31 09:20:18 hume zed: eid=13 class=checksum pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=13 pid=28227 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=13 pid=28229 May 31 09:20:18 hume zed[1303]: Finished "checksum-notify.sh" eid=13 pid=28229 exit=3 May 31 09:20:18 hume zed[1303]: Invoking "checksum-spare.sh" eid=13 pid=28242 May 31 09:20:18 hume zed[1303]: Finished "checksum-spare.sh" eid=13 pid=28242 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=14 pid=28243 May 31 09:20:18 hume zed: eid=14 class=checksum pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=14 pid=28243 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=14 pid=28245 May 31 09:20:18 hume zed[1303]: Finished "checksum-notify.sh" eid=14 pid=28245 exit=3 May 31 09:20:18 hume zed[1303]: Invoking "checksum-spare.sh" eid=14 pid=28258 May 31 09:20:18 hume zed[1303]: Finished "checksum-spare.sh" eid=14 pid=28258 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=15 pid=28259 May 31 09:20:18 hume zed: eid=15 class=checksum pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=15 pid=28259 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=15 pid=28261 May 31 09:20:18 hume zed[1303]: Finished "checksum-notify.sh" eid=15 pid=28261 exit=3 May 31 09:20:18 hume zed[1303]: Invoking "checksum-spare.sh" eid=15 pid=28274 May 31 09:20:18 hume zed[1303]: Finished "checksum-spare.sh" eid=15 pid=28274 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=16 pid=28275 May 31 09:20:18 hume zed: eid=16 class=checksum pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=16 pid=28275 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=16 pid=28277 May 31 09:20:18 hume zed[1303]: Finished "checksum-notify.sh" eid=16 pid=28277 exit=3 May 31 09:20:18 hume zed[1303]: Invoking "checksum-spare.sh" eid=16 pid=28290 May 31 09:20:18 hume zed[1303]: Finished "checksum-spare.sh" eid=16 pid=28290 exit=2 May 31 09:20:18 hume zed[1303]: Invoking "all-syslog.sh" eid=17 pid=28291 May 31 09:20:18 hume zed: eid=17 class=checksum 
pool=zfspool May 31 09:20:18 hume zed[1303]: Finished "all-syslog.sh" eid=17 pid=28291 exit=0 May 31 09:20:18 hume zed[1303]: Invoking "checksum-notify.sh" eid=17 pid=28293 May 31 09:20:48 hume kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1 May 31 09:20:48 hume kernel: sas: trying to find task 0xffff880412b22a80 May 31 09:20:48 hume kernel: sas: sas_scsi_find_task: aborting task 0xffff880412b22a80 May 31 09:20:48 hume kernel: sas: sas_scsi_find_task: task 0xffff880412b22a80 is aborted May 31 09:20:48 hume kernel: sas: sas_eh_handle_sas_errors: task 0xffff880412b22a80 is aborted May 31 09:20:48 hume kernel: sas: ata11: end_device-2:3: cmd error handler May 31 09:20:48 hume kernel: sas: ata8: end_device-2:0: dev error handler May 31 09:20:48 hume kernel: sas: ata9: end_device-2:1: dev error handler May 31 09:20:48 hume kernel: sas: ata10: end_device-2:2: dev error handler May 31 09:20:48 hume kernel: sas: ata11: end_device-2:3: dev error handler May 31 09:20:48 hume kernel: ata11.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x6 frozen May 31 09:20:48 hume kernel: ata11.00: failed command: READ FPDMA QUEUED May 31 09:20:48 hume kernel: ata11.00: cmd 60/08:00:a8:91:c7/00:00:18:01:00/40 tag 14 ncq 4096 in May 31 09:20:48 hume kernel: res 40/00:5c:60:45:c2/00:00:33:00:00/40 Emask 0x4 (timeout) May 31 09:20:48 hume kernel: ata11.00: status: { DRDY } May 31 09:20:48 hume kernel: ata11: hard resetting link May 31 09:20:48 hume kernel: sas: sas_form_port: phy3 belongs to port3 already(1)! May 31 09:20:50 hume kernel: drivers/scsi/mvsas/mv_sas.c 1528:mvs_I_T_nexus_reset for device[1]:rc= 0 May 31 09:20:51 hume kernel: ata11.00: configured for UDMA/133 May 31 09:20:51 hume kernel: sd 2:0:3:0: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE May 31 09:20:51 hume kernel: sd 2:0:3:0: [sdk] Sense Key : Aborted Command [current] [descriptor] May 31 09:20:51 hume kernel: Descriptor sense data with sense descriptors (in hex): May 31 09:20:51 hume kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 May 31 09:20:51 hume kernel: 00 00 00 5f May 31 09:20:51 hume kernel: sd 2:0:3:0: [sdk] Add. Sense: No additional sense information May 31 09:20:51 hume kernel: sd 2:0:3:0: [sdk] CDB: Read(16): 88 00 00 00 00 01 18 c7 91 a8 00 00 00 08 00 00 May 31 09:20:51 hume kernel: end_request: I/O error, dev sdk, sector 4710699432 May 31 09:20:51 hume kernel: ata11: EH complete May 31 09:20:51 hume kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1 May 31 09:20:51 hume zed[1303]: Invoking "all-syslog.sh" eid=18 pid=28785 May 31 09:20:51 hume zed: eid=18 class=delay pool=zfspool May 31 09:20:51 hume zed[1303]: Finished "all-syslog.sh" eid=18 pid=28785 exit=0 May 31 09:20:51 hume zed[1303]: Invoking "all-syslog.sh" eid=19 pid=28787 May 31 09:20:51 hume zed: eid=19 class=io pool=zfspool May 31 09:20:51 hume zed[1303]: Finished "all-syslog.sh" eid=19 pid=28787 exit=0 May 31 09:20:51 hume zed[1303]: Invoking "io-notify.sh" eid=19 pid=28789 May 31 09:20:51 hume zed[1303]: Finished "io-notify.sh" eid=19 pid=28789 exit=2 May 31 09:20:51 hume zed[1303]: Invoking "io-spare.sh" eid=19 pid=28809 May 31 09:20:51 hume zed[1303]: Finished "io-spare.sh" eid=19 pid=28809 exit=2

I've seen some references to a resume-from-RAM issue and also references to the drive type, i.e. Western Digital.

Any ideas are welcome.


solarjdp69 commented 9 years ago

UPDATED INFORMATION. I've received an abrt notice with some coredump info and other system information. I have a tar.gz file of the directory which contains a coredump, etc., from /var/spool/abrt/ccpp-2015-05-25-11:34:10-31612. Contact me if you would like this file.

---------- Forwarded message ---------- From: ABRT Daemon DoNotReply@bigboy-joliot.schoolpathways.com To: root@localhost.schoolpathways.com Cc: Date: Tue, 02 Jun 2015 18:10:28 -0700 Subject: [abrt] a crash has been detected again abrt_version: 2.0.8 cgroup: cmdline: sh -c '/sbin/zfs send -v -i zfshegelspfs/spfs.333@snapshot.20140529 zfshegelspfs/spfs.333@snapshot.20140530 | /root/bin/std/relentlesspipe.php send joliotbbspfs333 chunksize=100000000' executable: /bin/bash hostname: bigboy-joliot.schoolpathways.com kernel: 2.6.32-504.16.2.el6.x86_64 last_occurrence: 1433293798 pid: 31612 pwd: /root time: Mon 25 May 2015 11:34:10 AM PDT uid: 0 username: root var_log_messages: May 25 11:34:11 bigboy-joliot abrt[19857]: Saved core dump of pid 31612 (/bin/bash) to /var/spool/abrt/ccpp-2015-05-25-11:34:10-31612 (413696 bytes)

dso_list: :/lib64/ld-2.12.so glibc-2.12-1.149.el6_6.7.x86_64 (CentOS) 1431676818 :/bin/bash bash-4.1.2-29.el6.x86_64 (CentOS) 1431673743 :/usr/lib/locale/locale-archive glibc-common-2.12-1.149.el6_6.7.x86_64 (CentOS) 1431676826 :/usr/lib64/gconv/gconv-modules.cache glibc-2.12-1.149.el6_6.7.x86_64 (CentOS) 1431676818 :/lib64/libtinfo.so.5.7 ncurses-libs-5.7-3.20090208.el6.x86_64 (CentOS) 1431673743 :/lib64/libc-2.12.so glibc-2.12-1.149.el6_6.7.x86_64 (CentOS) 1431676818 :/lib64/libdl-2.12.so glibc-2.12-1.149.el6_6.7.x86_64 (CentOS) 1431676818

environ: :SHELL=/bin/bash :'SSH_CLIENT=192.168.3.171 42152 59595' :QTDIR=/usr/lib64/qt-3.3 :QTINC=/usr/lib64/qt-3.3/include :USER=root :MAIL=/var/mail/root :PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin :PWD=/root :LANG=en_US.UTF-8 :MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles :LOADEDMODULES= :SHLVL=1 :HOME=/root :LANGUAGE= :LOGNAME=root :QTLIB=/usr/lib64/qt-3.3/lib :CVS_RSH=ssh :'SSH_CONNECTION=192.168.3.171 42152 192.168.3.172 59595' :MODULESHOME=/usr/share/Modules :'LESSOPEN=||/usr/bin/lesspipe.sh %s' :G_BROKEN_FILENAMES=1 :'BASH_FUNCmodule()=() { eval /usr/bin/modulecmd bash $*\n}' :=/root/bin/get.snaps.php

limits: :Limit Soft Limit Hard Limit Units :Max cpu time unlimited unlimited seconds :Max file size unlimited unlimited bytes :Max data size unlimited unlimited bytes :Max stack size 10485760 unlimited bytes :Max core file size 0 unlimited bytes :Max resident set unlimited unlimited bytes :Max processes 256466 256466 processes :Max open files 1024 4096 files :Max locked memory 65536 65536 bytes :Max address space unlimited unlimited bytes :Max file locks unlimited unlimited locks :Max pending signals 256466 256466 signals :Max msgqueue size 819200 819200 bytes :Max nice priority 0 0 :Max realtime priority 0 0 :Max realtime timeout unlimited unlimited us

maps: :00400000-004d4000 r-xp 00000000 fd:00 2228269 /bin/bash :006d3000-006dd000 rw-p 000d3000 fd:00 2228269 /bin/bash :006dd000-006e2000 rw-p 00000000 00:00 0 :008dc000-008e5000 rw-p 000dc000 fd:00 2228269 /bin/bash :00f24000-00f45000 rw-p 00000000 00:00 0 [heap] :30a7400000-30a7420000 r-xp 00000000 fd:00 1179654 /lib64/ld-2.12.so :30a761f000-30a7620000 r--p 0001f000 fd:00 1179654 /lib64/ld-2.12.so :30a7620000-30a7621000 rw-p 00020000 fd:00 1179654 /lib64/ld-2.12.so :30a7621000-30a7622000 rw-p 00000000 00:00 0 :30a7800000-30a7802000 r-xp 00000000 fd:00 1179672 /lib64/libdl-2.12.so :30a7802000-30a7a02000 ---p 00002000 fd:00 1179672 /lib64/libdl-2.12.so :30a7a02000-30a7a03000 r--p 00002000 fd:00 1179672 /lib64/libdl-2.12.so :30a7a03000-30a7a04000 rw-p 00003000 fd:00 1179672 /lib64/libdl-2.12.so :30a7c00000-30a7d8a000 r-xp 00000000 fd:00 1179655 /lib64/libc-2.12.so :30a7d8a000-30a7f8a000 ---p 0018a000 fd:00 1179655 /lib64/libc-2.12.so :30a7f8a000-30a7f8e000 r--p 0018a000 fd:00 1179655 /lib64/libc-2.12.so :30a7f8e000-30a7f8f000 rw-p 0018e000 fd:00 1179655 /lib64/libc-2.12.so :30a7f8f000-30a7f94000 rw-p 00000000 00:00 0 :30aa400000-30aa41d000 r-xp 00000000 fd:00 1179901 /lib64/libtinfo.so.5.7 :30aa41d000-30aa61d000 ---p 0001d000 fd:00 1179901 /lib64/libtinfo.so.5.7 :30aa61d000-30aa621000 rw-p 0001d000 fd:00 1179901 /lib64/libtinfo.so.5.7 :7fda8c752000-7fda925e3000 r--p 00000000 fd:00 1311979 /usr/lib/locale/locale-archive :7fda925e3000-7fda925e6000 rw-p 00000000 00:00 0 :7fda925eb000-7fda925f2000 r--s 00000000 fd:00 1443098 /usr/lib64/gconv/gconv-modules.cache :7fda925f2000-7fda925f3000 rw-p 00000000 00:00 0 :7ffff8cc5000-7ffff8cda000 rw-p 00000000 00:00 0 [stack] :7ffff8de2000-7ffff8de3000 r-xp 00000000 00:00 0 [vdso] :ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

open_fds: :0:/dev/null :pos: 0 :flags: 0100000 :1:pipe:[77355297] :pos: 0 :flags: 01 :2:/tmp/snaps.joliotbbspfs333.err :pos: 15327054 :flags: 0100001 :3:/tmp/get.spfs.snaps.txt :pos: 13664 :flags: 0102001

alexanderhaensch commented 9 years ago

I can report that another system has the same problem, to a somewhat lesser extent. I noticed it because of a slow scrub that has now been running for 10 days. The culprit again seems to be [arc_adapt], running in state S< and consuming by far the most CPU time of all processes on the system. On that system I do not see any stalls in dmesg or any crashing programs. The kernel is the same, with the difference that this system does not export any iSCSI or TCM block devices. Moreover, this system has no L2ARC drives. I don't know if that matters.

Is it possible to debug the process while running?
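
One way to look at it while it runs, sketched here on the assumption that the SPL kstat interface under /proc/spl is mounted and that perf is installed, would be something like:

pid=$(pgrep arc_adapt)               # PID of the [arc_adapt] kernel thread
cat /proc/$pid/stack                 # its current kernel stack
cat /proc/spl/kstat/zfs/arcstats     # ARC counters, including metadata usage and limits
perf top -g                          # sample which kernel symbols are burning the CPU time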

solarjdp69 commented 9 years ago

We reverted to the latest 0.6.3 release to get some stability back. Both servers were panicking and then freezing hard; a cold start was required. Other issues were that drive status would sometimes go bad, with disks going offline and SMART status becoming unavailable. fdisk -l on an affected disk, instead of showing the correct size (e.g. 4 TB), would show 600,000,000 petabytes. Restarting the server would let the disks be seen correctly again, and resilvering would start and complete OK. These disks are newer HGST 4 TB 7200 rpm drives and some older WD 3 TB 5400 rpm drives.

Since reverting, all symptoms have gone away.

rpm -qa |grep zfs

zfs-dkms-0.6.3-1.3.el6.noarch
zfs-release-1-4.el6.noarch
libzfs2-0.6.3-1.2.el6.x86_64
zfs-0.6.3-1.2.el6.x86_64
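
For reference, one way to keep a routine yum update from pulling 0.6.4 back in would be to lock the reverted packages. A rough sketch, assuming the yum-plugin-versionlock plugin is available from the base repo (the spl and spl-dkms packages, not shown by the grep above, would normally be pinned as well):

yum install yum-plugin-versionlock
yum versionlock zfs zfs-dkms libzfs2   # pins the currently installed 0.6.3 versions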

alexanderhaensch commented 9 years ago

I updated to 0.6.4.2 on the secondary system, where the effect had been a slowdown, and I think the situation is improved. A scrub is running in parallel with send/receive and rsync, and everything works fine. Let's wait a few days to see whether the slowdown comes back.

alexanderhaensch commented 9 years ago

Now a different thing is happening on the system. Unfortunately debug flags are disabled, so I can only send perf top output.

The situation looks very similar to #3697.
perf top

PerfTop: 57212 irqs/sec kernel:99.6% exact: 0.0% [4000Hz cycles], (all, 24 CPUs)

69.01%  [kernel]          [k] _raw_spin_lock_irqsave         
 3.54%  [kernel]          [k] _raw_spin_trylock              
 3.10%  [kernel]          [k] up_read                        
 3.03%  [kernel]          [k] down_read                      
 2.34%  [kernel]          [k] __mutex_lock_slowpath          
 1.53%  [kernel]          [k] __mutex_unlock_slowpath        
 0.92%  [kernel]          [k] _raw_spin_unlock_irqrestore    
 0.90%  [zfs]             [k] 0x0000000000114c38             
 0.86%  [zfs]             [k] 0x00000000001ba470             
 0.77%  [kernel]          [k] _raw_spin_lock                 
 0.48%  [zfs]             [k] 0x000000000011771b             
 0.44%  [spl]             [k] 0x0000000000003d4f             
 0.39%  [zfs]             [k] 0x0000000000114c30             
 0.39%  [kernel]          [k] acpi_processor_ffh_cstate_enter
 0.38%  [zfs]             [k] 0x0000000000115f8f             
 0.38%  [kernel]          [k] memset                         
 0.35%  [zfs]             [k] 0x000000000012fa2a             
 0.34%  [zfs]             [k] 0x000000000012f81b             
 0.33%  [zfs]             [k] 0x00000000001ba44e             
 0.32%  [kernel]          [k] _raw_spin_lock_irq             
 0.31%  [zfs]             [k] 0x000000000012d93c             
 0.30%  [zfs]             [k] 0x0000000000114bc4             
 0.27%  [zfs]             [k] 0x000000000012d937             
 0.25%  [zfs]             [k] 0x000000000012da48             
 0.24%  [kernel]          [k] mutex_lock                     
 0.21%  [spl]             [k] 0x0000000000003e4c             
 0.20%  [kernel]          [k] __switch_to                    
 0.17%  [kernel]          [k] rwsem_down_read_failed         
 0.16%  [zfs]             [k] 0x000000000012e351             
 0.16%  [kernel]          [k] __kmalloc_node                 
 0.15%  [kernel]          [k] __schedule                     
 0.15%  libz.so.1.2.8     [.] 0x00000000000031e0             
 0.14%  [zfs]             [k] 0x000000000012da2e             
 0.13%  [zfs]             [k] 0x00000000001ba45d             
 0.13%  [kernel]          [k] kfree                          
 0.12%  [kernel]          [k] lapic_next_deadline            
 0.12%  [zfs]             [k] 0x000000000012da08             
 0.11%  [zfs]             [k] 0x0000000000116d10             
 0.11%  [zfs]             [k] 0x000000000014c700   

alexanderhaensch commented 9 years ago

In the end, access to the ext4 partition was no longer possible.

ki82 commented 9 years ago

Are there any plans to make a new release with the patch described by @behlendorf in this issue? If the patch is better than the current state, then I think it could be released now and removed later on.

I would like to avoid applying patches to our single server in production, where we experience similar problems, described in detail in #3697.

behlendorf commented 9 years ago

@ki82 the 0.6.4.2 tag contains a proper fix, so this specific issue shouldn't occur there. The master code contains that fix and several additional improvements.

Issue #3697 looks subtly different from this one, although the workaround provided in the comment above could help for similar reasons. In the master code you'll be able to set the zfs_arc_meta_prune=0 module option to achieve the same effect without patching the code.
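
For anyone wanting to try that, a ZFS module option is normally set either at runtime through sysfs or persistently through modprobe.d. A minimal sketch, assuming a build that actually exposes zfs_arc_meta_prune:

echo 0 > /sys/module/zfs/parameters/zfs_arc_meta_prune                # takes effect immediately
echo "options zfs zfs_arc_meta_prune=0" >> /etc/modprobe.d/zfs.conf   # applied on the next module load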

@alexanderhaensch how do you feel about closing this specific issue out and moving the discussion to #3697? It sounds like 0.6.4.2 did resolve your original issue.

alexanderhaensch commented 9 years ago

@behlendorf I think it is okay to close this issue. Since 0.6.4.2 the effect of the underlying issue is different, and it no longer logs anything to dmesg.

behlendorf commented 9 years ago

@alexanderhaensch OK, thanks.