Closed: johnkeates closed this issue 3 years ago
@johnkeates Are you by any chance using zfs_deadman_failmode=continue?
Nope, not using that AFAIK, unless it’s default. It’s a pretty simple and plain setup with all defaults; running three two-disk mirror pools, dedup off, compression lz4, root on zfs.
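For anyone wanting to verify rather than guess: the current failmode can be read from the module's parameter in sysfs (the default is "wait"). A minimal check, assuming the zfs kernel module is loaded:

```shell
# Print the active deadman failmode; falls back to a note if the zfs
# module is not loaded on this machine.
cat /sys/module/zfs/parameters/zfs_deadman_failmode 2>/dev/null \
  || echo "zfs module not loaded"
```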
@johnkeates Might you have the event log (leading up to the hang) available? If you're running zed, it should log at least one line per event via syslog. Better still would be the output of zpool events and/or zpool events -v.
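Since the in-kernel event ring does not survive a reboot, it is worth snapshotting it as soon as the hang is noticed. A sketch (output file names are just examples):

```shell
# Save both forms of the event history before the next reboot; guard
# against running this on a box without the zfs userland tools.
if command -v zpool >/dev/null; then
  zpool events    > zpool-events.txt      # short one-line-per-event form
  zpool events -v > zpool-events-v.txt    # verbose nvlist dump
else
  echo "zpool not found; run this on the affected host"
fi
```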
Sep 21 2018 13:29:32.013175881 sysevent.fs.zfs.history_event
Sep 21 2018 13:29:32.309175867 sysevent.fs.zfs.config_sync
Sep 21 2018 13:29:32.309175867 sysevent.fs.zfs.pool_import
Sep 21 2018 13:29:32.313175866 sysevent.fs.zfs.history_event
Sep 21 2018 13:29:32.577175853 sysevent.fs.zfs.config_sync
Sep 21 2018 13:29:44.113175286 sysevent.fs.zfs.history_event
Sep 21 2018 13:29:44.213175281 sysevent.fs.zfs.config_sync
Sep 21 2018 13:29:44.213175281 sysevent.fs.zfs.pool_import
Sep 21 2018 13:29:44.217175281 sysevent.fs.zfs.history_event
Sep 21 2018 13:29:44.265175279 sysevent.fs.zfs.config_sync
Sep 21 2018 13:29:45.749175206 sysevent.fs.zfs.history_event
Sep 21 2018 13:29:46.109175188 sysevent.fs.zfs.config_sync
Sep 21 2018 13:29:46.109175188 sysevent.fs.zfs.pool_import
Sep 21 2018 13:29:46.113175188 sysevent.fs.zfs.history_event
Sep 21 2018 13:29:46.417175173 sysevent.fs.zfs.config_sync
Sep 21 2018 13:29:44.113175286 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "fastpool"
pool_guid = 0xf543bba1bec227e
pool_state = 0x0
pool_context = 0x0
history_hostname = "xen-1-prod"
history_internal_str = "pool version 5000; software version 5000/5; uts xen-1-prod 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 (2018-08-27) x86_64"
history_internal_name = "open"
history_txg = 0x1245a32f
history_time = 0x5ba4d628
time = 0x5ba4d628 0x6beeaf6
eid = 0x6
Sep 21 2018 13:29:44.213175281 sysevent.fs.zfs.config_sync
version = 0x0
class = "sysevent.fs.zfs.config_sync"
pool = "fastpool"
pool_guid = 0xf543bba1bec227e
pool_state = 0x0
pool_context = 0x0
time = 0x5ba4d628 0xcb4cbf1
eid = 0x7
Sep 21 2018 13:29:44.213175281 sysevent.fs.zfs.pool_import
version = 0x0
class = "sysevent.fs.zfs.pool_import"
pool = "fastpool"
pool_guid = 0xf543bba1bec227e
pool_state = 0x0
pool_context = 0x0
time = 0x5ba4d628 0xcb4cbf1
eid = 0x8
Sep 21 2018 13:29:44.217175281 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "fastpool"
pool_guid = 0xf543bba1bec227e
pool_state = 0x0
pool_context = 0x0
history_hostname = "xen-1-prod"
history_internal_str = "pool version 5000; software version 5000/5; uts xen-1-prod 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 (2018-08-27) x86_64"
history_internal_name = "import"
history_txg = 0x1245a331
history_time = 0x5ba4d628
time = 0x5ba4d628 0xcf1d4f1
eid = 0x9
Sep 21 2018 13:29:44.265175279 sysevent.fs.zfs.config_sync
version = 0x0
class = "sysevent.fs.zfs.config_sync"
pool = "fastpool"
pool_guid = 0xf543bba1bec227e
pool_state = 0x0
pool_context = 0x0
time = 0x5ba4d628 0xfce40ef
eid = 0xa
I'll check to see if I can get earlier messages; the ones above are from after the fact... argh
I can't seem to get logs from before the system reboot; are they stored in a different location? kern.log has only this about ZFS and the pool:
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 32.716006] ZFS: Loaded module v0.7.9-3~bpo9+1, ZFS pool version 5000, ZFS filesystem version 5
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 32.970142] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000039fe6c3d56b-part1 (sdh): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 32.970738] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000039fe6cafb24-part1 (sdc): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 33.374010] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000039fe6cafb24-part1 (sdc): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 33.377180] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000039fe6c3d56b-part1 (sdh): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 45.483119] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x55cd2e414da94592-part1 (sdb): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 45.483137] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x55cd2e414da7dc87-part1 (sda): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 45.563873] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000cca37cef28f1-part1 (sdg): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 45.563922] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000cca37ce9cec7-part1 (sdd): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 46.216186] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x55cd2e414da7dc87-part1 (sda): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 46.218252] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x55cd2e414da94592-part1 (sdb): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 46.636142] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000cca37cef28f1-part1 (sdg): 256
./kern.log:Sep 21 13:29:49 xen-1-prod kernel: [ 46.636161] ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/wwn-0x5000cca37ce9cec7-part1 (sdd): 256
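As an aside, the repeated scheduler warnings above can be condensed per device to see which disks are affected and how often. A quick pass over a saved kern.log (the path is an example):

```shell
# Count the "Unable to set noop scheduler" warnings per device node.
grep 'Unable to set "noop" scheduler' kern.log \
  | sed 's/.*(\(sd[a-z]*\)).*/\1/' \
  | sort | uniq -c | sort -rn
```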
@johnkeates zed helpfully doesn't persist across reboots, because why would you ever want to keep logs.
Grepping for zed in syslog/messages/... might turn up at least the short form of the events.
Yeah, that's what I did, and basically all the messages I got were the same as the ones logged in kern.log.
I wonder why there is no Zed log persistence, seems like logging 101...
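One workaround for the lack of persistence: since zed tags its syslog messages as "zed" by default, you can route them to a dedicated file that survives reboots. A sketch, assuming rsyslog on a Debian-style system (the drop-in file name is an example):

```shell
# Send zed's messages to their own persistent log file via an rsyslog
# property-based filter, then reload rsyslog to pick it up.
echo ':programname, isequal, "zed" /var/log/zed.log' > /etc/rsyslog.d/50-zed.conf
systemctl restart rsyslog
```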
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
Performance drops then hangs
Describe how to reproduce the problem
Happens once every few weeks
Include any warning/errors/backtraces from the system logs