openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

PANIC at dnode.c:715:dnode_reallocate() / Crash - interrupt took too long #8540

Closed: Harvie closed this issue 5 years ago

Harvie commented 5 years ago

Hi! I am not sure how to debug this. Any ideas?

filename: /lib/modules/4.15.18-12-pve/zfs/zfs.ko version: 0.7.13-pve1~bpo1

[console screenshot attached]

behlendorf commented 5 years ago

@Harvie are you able to consistently reproduce this issue or is it intermittent? When it next occurs you're going to want to log in to the system and run dmesg to dump the console logs. If you can attach them to this issue we can take a look and may be able to determine why the zfs recv is hung.

A similar hang was reported in #8174 which ended up being caused by an issue with jumbo frames. You may want to double check everything is in order with your network.
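
For anyone wanting to rule that out quickly: a rough way to sanity-check the jumbo-frame path (the interface name and peer host below are placeholders) is to compare the MTU on both ends and send a large non-fragmenting ping:

# ip link show eth0 | grep mtu
# ping -M do -s 8972 -c 3 receiver.example.org    # 8972-byte payload + 28 bytes of headers = 9000-byte MTU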

Harvie commented 5 years ago

It's intermittent.

When it next occurs you're going to want to log in to the system and run dmesg to dump the console logs.

I cannot log in to the system, since it's completely frozen. It does not respond to the keyboard and does not ping.

You may want to double check everything is in order with your network.

Are you telling me that when packets get corrupted (or probably just lost) on their way through the network, this can lead to a kernel lockup/panic by design? What? Why? Can't it just time out gracefully or something?

stuartabramshumphries commented 5 years ago

Having exactly the same issue. Just upgraded from Ubuntu 16.04 to 18.04 and the problem started. The host is stable until we do a zfs recv, then the box actually gets a BSOD-style crash; one time we got the same message as above. Very reproducible: the host won't stay up longer than 10 minutes if we run zfs receive. Everything works OK on the 16.04 nodes we have, and we're not using jumbo frames either. We ran a long memtest on the server (OK), a zpool scrub (fine), and a smartctl long test (OK). There are no core dumps or anything in the messages, but we do see memory use shooting up prior to the crash. Any ideas gratefully received.

Harvie commented 5 years ago

I've upgraded to ZFS 0.8 from git master and so far no crash. But I can't really say yet; before, it had also been running with 0.7.13 for a few hours without crashing...

Harvie commented 5 years ago

I've also upgraded the production server, which is under heavier I/O load, to 0.8.0-rc3_126_g5dbf8b4ed, and I get a crash every hour or two... similar to what I was getting with 0.7.13...

I've found an enormous number of reports across the internet of ZFS locking up under I/O load over the past 10 years. It seems to affect a wide spectrum of ZFS versions, including the old 0.6.x, stable 0.7.x and the 0.8.x RCs. Some of the posts suggested that I should use a kernel with CONFIG_PREEMPT_VOLUNTARY=y, but I've checked and I already do.

It's strange that this happens suddenly, without any performance degradation. Everything runs smoothly and then it suddenly goes into a kernel lockup and panic because of ZFS. I have 70 GB of RAM, use a 32 GB zfs_arc_max, and have 24 CPU threads.
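
For context, zfs_arc_max is a zfs module parameter; one common way to set it (sketched here with the 32 GiB value mentioned above) is at runtime via /sys or persistently via modprobe configuration:

# echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max
# echo "options zfs zfs_arc_max=34359738368" > /etc/modprobe.d/zfs.conf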

The more I/O load there is, the more likely it is to crash (during peak hours the server crashes more frequently, during the night only once or twice). Unfortunately the error messages are different each time.

I've been pulling my hair out over this one for months...

Harvie commented 5 years ago

Another crash:

[21087.917164] INFO: rcu_sched self-detected stall on CPU
[21087.917167] INFO: rcu_sched self-detected stall on CPU
[21087.917177]  11-....: (14999 ticks this GP) idle=72a/140000000000001/0 softirq=1937756/1937756 fqs=7429 
[21087.917182]  16-....: (14999 ticks this GP) idle=662/140000000000001/0 softirq=1793101/1793101 fqs=7429 
[21087.917183]  
[21087.917185]   (t=15000 jiffies g=1338404 c=1338403 q=333215)
[21087.917191]  (t=15000 jiffies g=1338404 c=1338403 q=333215)
[21087.917194] Sending NMI from CPU 16 to CPUs 2:
[21087.917201] NMI backtrace for cpu 2
[21087.917204] CPU: 2 PID: 808997 Comm: fail2ban-server Tainted: P          IO     4.15.18-11-pve #1
[21087.917205] Hardware name: HP ProLiant DL160 G6  , BIOS O33 07/01/2013
[21087.917211] RIP: 0010:smp_call_function_many+0x1f9/0x260
[21087.917211] RSP: 0000:ffffa1a10b117b18 EFLAGS: 00000202
[21087.917213] RAX: 0000000000000009 RBX: ffff8bbd1baa3900 RCX: ffff8bc61b2e89c0
[21087.917214] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8bbd1b444828
[21087.917215] RBP: ffffa1a10b117b50 R08: fffffffffffffe00 R09: 0000000000fffffb
[21087.917215] R10: ffffedf71b4c5440 R11: ffff8bbb3d049ff4 R12: ffffffffb767b810
[21087.917216] R13: ffffa1a10b117b88 R14: 0000000000000001 R15: 0000000000000018
[21087.917218] FS:  00007f31d80d7700(0000) GS:ffff8bbd1ba80000(0000) knlGS:0000000000000000
[21087.917218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21087.917219] CR2: 00007f31d6d52d34 CR3: 0000000f1d5a6006 CR4: 00000000000206e0
[21087.917220] Call Trace:
[21087.917225]  ? x86_configure_nx+0x50/0x50
[21087.917227]  on_each_cpu+0x2d/0x60
[21087.917229]  flush_tlb_kernel_range+0x79/0x80
[21087.917231]  unmap_kernel_range+0x26/0x30
[21087.917235]  zs_unmap_object+0x9d/0xd0
[21087.917240]  zram_bvec_rw.constprop.32+0x4f0/0x930 [zram]
[21087.917243]  zram_rw_page+0x9e/0xf3 [zram]
[21087.917245]  bdev_read_page+0x8f/0xc0
[21087.917247]  swap_readpage+0x54/0x1e0
[21087.917249]  do_swap_page+0x599/0x960
[21087.917250]  ? do_wp_page+0x10b/0x4f0
[21087.917251]  __handle_mm_fault+0x847/0x11c0
[21087.917253]  handle_mm_fault+0xce/0x1b0
[21087.917256]  __do_page_fault+0x25e/0x500
[21087.917258]  ? syscall_trace_enter+0xca/0x2e0
[21087.917260]  do_page_fault+0x2e/0xe0
[21087.917263]  ? page_fault+0x2f/0x50
[21087.917264]  page_fault+0x45/0x50
[21087.917266] RIP: 0033:0x55af839a005d
[21087.917266] RSP: 002b:00007ffef2cb2d40 EFLAGS: 00010202
[21087.917268] RAX: 000055af83ebfd50 RBX: 00007f31d6d1e2f0 RCX: 00007f31d434c0d0
[21087.917268] RDX: 0000000000000000 RSI: 00007f31d6d256c8 RDI: 00007f31d6d52d20
[21087.917269] RBP: 00007f31d434c0b8 R08: 0000000000000001 R09: 0000000000000000
[21087.917270] R10: 000055af83ea3ca0 R11: 000055af83ea3c40 R12: 000055af84a29600
[21087.917271] R13: 0000000000000001 R14: 0000000000000000 R15: 00007f31d6d1e2f0
[21087.917272] Code: 89 c7 e8 fb 6e 88 00 3b 05 49 f1 53 01 0f 83 8f fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 c0 86 7a b8 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c7 0f b6 4d cc 4c 89 ea 4c 89 e6 44 
[21087.918197] Sending NMI from CPU 16 to CPUs 9:
[21087.918203] NMI backtrace for cpu 9
[21087.918207] CPU: 9 PID: 807556 Comm: opendkim Tainted: P          IO     4.15.18-11-pve #1
[21087.918208] Hardware name: HP ProLiant DL160 G6  , BIOS O33 07/01/2013
[21087.918215] RIP: 0010:zram_slot_free_notify+0x54/0x70 [zram]
[21087.918216] RSP: 0000:ffffa1a109503c58 EFLAGS: 00000006
[21087.918218] RAX: ffffa1a13efc1cb0 RBX: 0000000006758cb0 RCX: ffffa1a13efc1cb8
[21087.918219] RDX: 0000000002000438 RSI: 00000000006758cb RDI: ffff8bbcf393b740
[21087.918220] RBP: ffffa1a109503c68 R08: ffff8bbc7128a480 R09: 0000000000000000
[21087.918221] R10: 0000000000000040 R11: ffff8bbc7128a4a8 R12: ffff8bbcf43a5a00
[21087.918222] R13: ffffffffc04aea70 R14: 00000000006758cb R15: ffff8bbcea040000
[21087.918224] FS:  00007fc510581700(0000) GS:ffff8bc61b2c0000(0000) knlGS:0000000000000000
[21087.918226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21087.918227] CR2: 00007fc51783f205 CR3: 000000113aed4006 CR4: 00000000000206e0
[21087.918228] Call Trace:
[21087.918235]  swap_range_free+0xa1/0xf0
[21087.918238]  swapcache_free_entries+0xf7/0x1f0
[21087.918243]  free_swap_slot+0xd6/0xf0
[21087.918245]  swap_free+0x36/0x40
[21087.918247]  do_swap_page+0x71e/0x960
[21087.918250]  __handle_mm_fault+0x847/0x11c0
[21087.918253]  ? do_futex+0x2db/0xb70
[21087.918256]  handle_mm_fault+0xce/0x1b0
[21087.918259]  __do_page_fault+0x25e/0x500
[21087.918261]  ? SyS_futex+0x83/0x180
[21087.918263]  do_page_fault+0x2e/0xe0
[21087.918266]  ? page_fault+0x2f/0x50
[21087.918268]  page_fault+0x45/0x50
[21087.918270] RIP: 0033:0x7fc5176343a3
[21087.918271] RSP: 002b:00007fc510580aa0 EFLAGS: 00010246
[21087.918272] RAX: 00007fc51783eef0 RBX: 00007fc51783eef0 RCX: 0000000000000073
[21087.918273] RDX: 0000000000000000 RSI: 00007fc51783ee88 RDI: 00007fc514b7d268
[21087.918275] RBP: 0000000000000004 R08: 00007fc510580bf0 R09: 00007fc517850428
[21087.918276] R10: 00007fc51783b890 R11: 0000000000000206 R12: 0000000000000023
[21087.918277] R13: 0000000000000000 R14: 00007fc517835c10 R15: 0000000000000000
[21087.918278] Code: ba 68 08 19 72 1f 4c 89 e7 e8 f9 f4 ff ff 49 03 1c 24 f0 80 63 0b fd f0 49 ff 84 24 80 00 00 00 5b 41 5c 5d c3 48 8d 48 08 f3 90 <48> 8b 50 08 f7 c2 00 00 00 02 75 f2 f0 0f ba 29 19 73 c8 eb e9 
[21087.919198] Sending NMI from CPU 16 to CPUs 11:
[21087.919205] NMI backtrace for cpu 11
[21087.919208] CPU: 11 PID: 807555 Comm: opendkim Tainted: P          IO     4.15.18-11-pve #1
[21087.919209] Hardware name: HP ProLiant DL160 G6  , BIOS O33 07/01/2013
[21087.919214] RIP: 0010:native_queued_spin_lock_slowpath+0x177/0x1a0
[21087.919215] RSP: 0000:ffff8bc61b343df8 EFLAGS: 00000002
[21087.919217] RAX: 0000000000200101 RBX: 0000000000000083 RCX: 0000000000000001
[21087.919218] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffffffb8aa9400
[21087.919219] RBP: ffff8bc61b343df8 R08: 0000000000000101 R09: 00000000000030b8
[21087.919220] R10: 0140000000000001 R11: 00000000000030b8 R12: ffffffffb8c5fee0
[21087.919222] R13: ffffffffb8aa9240 R14: 0000000000000000 R15: ffffffffb8c5fee0
[21087.919223] FS:  00007fc510d82700(0000) GS:ffff8bc61b340000(0000) knlGS:0000000000000000
[21087.919225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21087.919226] CR2: 00007fc517835c10 CR3: 000000113aed4004 CR4: 00000000000206e0
[21087.919227] Call Trace:
[21087.919228]  <IRQ>
[21087.919231]  _raw_spin_lock_irqsave+0x37/0x40
[21087.919235]  rcu_dump_cpu_stacks+0x50/0xd6
[21087.919237]  rcu_check_callbacks+0x7c4/0x920
[21087.919239]  ? tick_sched_do_timer+0x50/0x50
[21087.919243]  update_process_times+0x2f/0x60
[21087.919245]  tick_sched_handle+0x2a/0x60
[21087.919247]  tick_sched_timer+0x39/0x80
[21087.919248]  __hrtimer_run_queues+0xe7/0x220
[21087.919250]  hrtimer_interrupt+0xa3/0x1e0
[21087.919253]  smp_apic_timer_interrupt+0x6f/0x130
[21087.919255]  apic_timer_interrupt+0x84/0x90
[21087.919255]  </IRQ>
[21087.919258] RIP: 0010:smp_call_function_many+0x1f9/0x260
[21087.919259] RSP: 0000:ffffa1a10844fb18 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
[21087.919260] RAX: 0000000000000009 RBX: ffff8bc61b363900 RCX: ffff8bc61b2e8d60
[21087.919261] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8bc61ac01cc0
[21087.919262] RBP: ffffa1a10844fb50 R08: fffffffffffffe00 R09: 0000000000fff7ff
[21087.919264] R10: ffffedf71b276780 R11: ffff8bc5af8dbff4 R12: ffffffffb767b810
[21087.919265] R13: ffffa1a10844fb88 R14: 0000000000000001 R15: 0000000000000018
[21087.919267]  ? x86_configure_nx+0x50/0x50
[21087.919270]  ? smp_call_function_many+0x1d5/0x260
[21087.919272]  ? x86_configure_nx+0x50/0x50
[21087.919274]  on_each_cpu+0x2d/0x60
[21087.919276]  flush_tlb_kernel_range+0x79/0x80
[21087.919278]  unmap_kernel_range+0x26/0x30
[21087.919282]  zs_unmap_object+0x9d/0xd0
[21087.919286]  zram_bvec_rw.constprop.32+0x4f0/0x930 [zram]
[21087.919290]  zram_rw_page+0x9e/0xf3 [zram]
[21087.919292]  bdev_read_page+0x8f/0xc0
[21087.919294]  swap_readpage+0x54/0x1e0
[21087.919296]  do_swap_page+0x599/0x960
[21087.919298]  __handle_mm_fault+0x847/0x11c0
[21087.919300]  ? do_futex+0x934/0xb70
[21087.919303]  handle_mm_fault+0xce/0x1b0
[21087.919305]  __do_page_fault+0x25e/0x500
[21087.919307]  ? SyS_futex+0x83/0x180
[21087.919309]  do_page_fault+0x2e/0xe0
[21087.919311]  ? page_fault+0x2f/0x50
[21087.919313]  page_fault+0x45/0x50
[21087.919314] RIP: 0033:0x7fc517634380
[21087.919315] RSP: 002b:00007fc510d81aa0 EFLAGS: 00010246
[21087.919317] RAX: 00007fc510d81b18 RBX: 00007fc51783bbe8 RCX: 00007fc514b7a218
[21087.919318] RDX: 00007fc510d81be0 RSI: 0000000031125a5c RDI: 00007fc514b7c0b9
[21087.919319] RBP: 0000000000000000 R08: 00007fc510d81bf0 R09: 00007fc517850428
[21087.919320] R10: 00007fc51783b890 R11: 0000000000000283 R12: 0000000000000023
[21087.919321] R13: 0000000000000000 R14: 00007fc517835c10 R15: 0000000000000000
[21087.919323] Code: 31 c0 41 39 c0 74 e6 4d 85 c9 c6 07 01 74 30 41 c7 41 08 01 00 00 00 e9 52 ff ff ff 83 fa 01 0f 84 b0 fe ff ff 8b 07 84 c0 74 08 <f3> 90 8b 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 
[21087.920200] NMI backtrace for cpu 16
[21087.920206] CPU: 16 PID: 809401 Comm: apache2 Tainted: P          IO     4.15.18-11-pve #1
[21087.920207] Hardware name: HP ProLiant DL160 G6  , BIOS O33 07/01/2013
[21087.920208] Call Trace:
[21087.920209]  <IRQ>
[21087.920213]  dump_stack+0x63/0x8b
[21087.920217]  nmi_cpu_backtrace+0xca/0xd0
[21087.920222]  ? lapic_can_unplug_cpu+0xb0/0xb0
[21087.920224]  nmi_trigger_cpumask_backtrace+0xe6/0x130
[21087.920227]  arch_trigger_cpumask_backtrace+0x19/0x20
[21087.920230]  rcu_dump_cpu_stacks+0xa3/0xd6
[21087.920233]  rcu_check_callbacks+0x7c4/0x920
[21087.920236]  ? tick_sched_do_timer+0x50/0x50
[21087.920240]  update_process_times+0x2f/0x60
[21087.920242]  tick_sched_handle+0x2a/0x60
[21087.920245]  tick_sched_timer+0x39/0x80
[21087.920247]  __hrtimer_run_queues+0xe7/0x220
[21087.920249]  hrtimer_interrupt+0xa3/0x1e0
[21087.920252]  smp_apic_timer_interrupt+0x6f/0x130
[21087.920254]  apic_timer_interrupt+0x84/0x90
[21087.920256]  </IRQ>
[21087.920258] RIP: 0010:smp_call_function_many+0x1f9/0x260
[21087.920260] RSP: 0000:ffffa1a10b3bba58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
[21087.920262] RAX: 0000000000000009 RBX: ffff8bbd1bca3900 RCX: ffff8bc61b2e8e00
[21087.920263] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8bbd1b444c28
[21087.920265] RBP: ffffa1a10b3bba90 R08: fffffffffffffe00 R09: 0000000000feffff
[21087.920266] R10: ffffedf742c3af80 R11: ffff8bbb84d74ff4 R12: ffffffffb767b810
[21087.920267] R13: ffffa1a10b3bbac8 R14: 0000000000000001 R15: 0000000000000018
[21087.920270]  ? x86_configure_nx+0x50/0x50
[21087.920274]  ? smp_call_function_many+0x1d5/0x260
[21087.920276]  ? x86_configure_nx+0x50/0x50
[21087.920278]  on_each_cpu+0x2d/0x60
[21087.920281]  flush_tlb_kernel_range+0x79/0x80
[21087.920283]  unmap_kernel_range+0x26/0x30
[21087.920286]  zs_unmap_object+0x9d/0xd0
[21087.920290]  zram_bvec_rw.constprop.32+0x4f0/0x930 [zram]
[21087.920294]  zram_rw_page+0x9e/0xf3 [zram]
[21087.920296]  bdev_read_page+0x8f/0xc0
[21087.920298]  swap_readpage+0x54/0x1e0
[21087.920301]  swapin_readahead+0x198/0x2b0
[21087.920304]  do_swap_page+0x4f9/0x960
[21087.920306]  ? do_swap_page+0x4f9/0x960
[21087.920308]  ? do_wp_page+0x10b/0x4f0
[21087.920310]  __handle_mm_fault+0x847/0x11c0
[21087.920313]  handle_mm_fault+0xce/0x1b0
[21087.920316]  __do_page_fault+0x25e/0x500
[21087.920319]  do_page_fault+0x2e/0xe0
[21087.920322]  ? page_fault+0x2f/0x50
[21087.920324]  page_fault+0x45/0x50
[21087.920326] RIP: 0033:0x7f1560821bca
[21087.920327] RSP: 002b:00007ffc0e476230 EFLAGS: 00010206
[21087.920329] RAX: 00007f15610f1030 RBX: 00007f15610f50a0 RCX: 0000000000000000
[21087.920330] RDX: 00007f15610bd028 RSI: 0000000000000000 RDI: 0000000000000001
[21087.920332] RBP: 00007f1560f62028 R08: 00007f15610f50a8 R09: 0000000000000000
[21087.920333] R10: 00007f1561063000 R11: 0000000000000000 R12: 0000564ae84540b3
[21087.920335] R13: 00007f15610ef028 R14: 0000000000000000 R15: 00000000fffffffe
[21087.920342] Sending NMI from CPU 11 to CPUs 16:
[21087.920347] NMI backtrace for cpu 16
[21087.920349] CPU: 16 PID: 809401 Comm: apache2 Tainted: P          IO     4.15.18-11-pve #1
[21087.920350] Hardware name: HP ProLiant DL160 G6  , BIOS O33 07/01/2013
[21087.920352] RIP: 0010:cpumask_next_wrap+0x47/0x70
[21087.920353] RSP: 0000:ffff8bbd1bc83b88 EFLAGS: 00000016
[21087.920355] RAX: 000000000000000e RBX: 0000000000000001 RCX: 000000000000000e
[21087.920356] RDX: 000000000000000e RSI: 0000000000000000 RDI: ffff8bbd17812b38
[21087.920358] RBP: ffff8bbd1bc83bb0 R08: ffffffffffffc000 R09: 000000000003f03f
[21087.920359] R10: ffff8bbd1bc83b90 R11: 0000000000000092 R12: 0000000000000004
[21087.920360] R13: ffff8bbd17812b38 R14: 00000000ffffffff R15: 000000000000000d
[21087.920361] FS:  00007f15610e4040(0000) GS:ffff8bbd1bc80000(0000) knlGS:0000000000000000
[21087.920363] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21087.920364] CR2: 00007f15610bd040 CR3: 0000000740792002 CR4: 00000000000206e0
[21087.920364] Call Trace:
[21087.920365]  <IRQ>
[21087.920369]  select_idle_sibling+0x35c/0x420
[21087.920372]  select_task_rq_fair+0x8c5/0xb20
[21087.920374]  ? __module_text_address+0x12/0x60
[21087.920378]  ? kernel_text_address+0xdc/0xf0
[21087.920380]  ? page_fault+0x45/0x50
[21087.920382]  ? __kernel_text_address+0x12/0x40
[21087.920384]  ? page_fault+0x45/0x50
[21087.920388]  try_to_wake_up+0x10c/0x4c0
[21087.920391]  default_wake_function+0x12/0x20
[21087.920394]  autoremove_wake_function+0x12/0x40
[21087.920396]  __wake_up_common+0x8d/0x140
[21087.920398]  __wake_up_common_lock+0x80/0xc0
[21087.920401]  __wake_up+0x13/0x20
[21087.920403]  wake_up_klogd_work_func+0x40/0x60
[21087.920406]  irq_work_run_list+0x55/0x80
[21087.920408]  ? tick_sched_do_timer+0x50/0x50
[21087.920409]  irq_work_tick+0x40/0x50
[21087.920412]  update_process_times+0x42/0x60
[21087.920414]  tick_sched_handle+0x2a/0x60
[21087.920416]  tick_sched_timer+0x39/0x80
[21087.920418]  __hrtimer_run_queues+0xe7/0x220
[21087.920420]  hrtimer_interrupt+0xa3/0x1e0
[21087.920422]  smp_apic_timer_interrupt+0x6f/0x130
[21087.920424]  apic_timer_interrupt+0x84/0x90
[21087.920425]  </IRQ>
[21087.920428] RIP: 0010:smp_call_function_many+0x1f9/0x260
[21087.920429] RSP: 0000:ffffa1a10b3bba58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
[21087.920431] RAX: 0000000000000009 RBX: ffff8bbd1bca3900 RCX: ffff8bc61b2e8e00
[21087.920432] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8bbd1b444c28
[21087.920433] RBP: ffffa1a10b3bba90 R08: fffffffffffffe00 R09: 0000000000feffff
[21087.920434] R10: ffffedf742c3af80 R11: ffff8bbb84d74ff4 R12: ffffffffb767b810
[21087.920435] R13: ffffa1a10b3bbac8 R14: 0000000000000001 R15: 0000000000000018
[21087.920438]  ? x86_configure_nx+0x50/0x50
[21087.920441]  ? smp_call_function_many+0x1d5/0x260
[21087.920442]  ? x86_configure_nx+0x50/0x50
[21087.920445]  on_each_cpu+0x2d/0x60
[21087.920447]  flush_tlb_kernel_range+0x79/0x80
[21087.920449]  unmap_kernel_range+0x26/0x30
[21087.920452]  zs_unmap_object+0x9d/0xd0
[21087.920455]  zram_bvec_rw.constprop.32+0x4f0/0x930 [zram]
[21087.920459]  zram_rw_page+0x9e/0xf3 [zram]
[21087.920461]  bdev_read_page+0x8f/0xc0
[21087.920464]  swap_readpage+0x54/0x1e0
[21087.920466]  swapin_readahead+0x198/0x2b0
[21087.920469]  do_swap_page+0x4f9/0x960
[21087.920470]  ? do_swap_page+0x4f9/0x960
[21087.920472]  ? do_wp_page+0x10b/0x4f0
[21087.920474]  __handle_mm_fault+0x847/0x11c0
[21087.920477]  handle_mm_fault+0xce/0x1b0
[21087.920479]  __do_page_fault+0x25e/0x500
[21087.920482]  do_page_fault+0x2e/0xe0
[21087.920484]  ? page_fault+0x2f/0x50
[21087.920486]  page_fault+0x45/0x50
[21087.920488] RIP: 0033:0x7f1560821bca
[21087.920489] RSP: 002b:00007ffc0e476230 EFLAGS: 00010206
[21087.920491] RAX: 00007f15610f1030 RBX: 00007f15610f50a0 RCX: 0000000000000000
[21087.920492] RDX: 00007f15610bd028 RSI: 0000000000000000 RDI: 0000000000000001
[21087.920493] RBP: 00007f1560f62028 R08: 00007f15610f50a8 R09: 0000000000000000
[21087.920494] R10: 00007f1561063000 R11: 0000000000000000 R12: 0000564ae84540b3
[21087.920495] R13: 00007f15610ef028 R14: 0000000000000000 R15: 00000000fffffffe
[21087.920497] Code: cb 00 53 41 89 d4 89 cb eb 12 8b 35 78 81 cb 00 45 89 f7 bb 01 00 00 00 39 c6 77 26 41 8d 57 01 4c 89 ef 48 63 d2 e8 29 4c af ff <45> 39 e7 89 c1 0f 9c c2 84 da 74 d3 41 39 c4 7f ce 8b 0d 46 81 
behlendorf commented 5 years ago

@Harvie thanks for posting the additional console logs. Were you able to log in to the machine after hitting the problem this time with the 0.8 version of ZFS? If so, are these the complete logs starting from the initial failure? I ask because nothing in the most recent logs you posted is related to ZFS. They do indicate that there may be a problem with the kernel-provided zram driver.

Harvie commented 5 years ago

OMG. You are right. I forgot to disable zram some time ago. Thanks for pointing this out. I was convinced that "zram" was part of the ZFS infrastructure (because of the "z" letter :-) and didn't realize it wasn't. However, I've also experienced crashes on machines not running zram. But this might be a big part of the problem. I will disable zram and do some testing...

Harvie commented 5 years ago

Crashed again (0.8.0-rc3_126_g5dbf8b4ed)

[console screenshot attached]

I don't think this is caused by a jumbo frame problem, since I am using zfs send/recv over plain old SSH, and I've never had SSH corrupt my data.

What really drives me nuts is the fact that the machine does not reboot after freezing. I don't know why. I have it set to panic on hung task and reboot on panic, but it still does not reboot:

kernel.panic = 5
kernel.hung_task_panic = 1
kernel.hung_task_timeout_secs = 300
kernel.watchdog_thresh = 60
kernel.softlockup_panic = 1
kernel.hardlockup_panic = 1
#kernel.panic_on_warn = 0
#kernel.panic_on_oops = 0
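
The same settings can also be applied at runtime with sysctl while testing, for example:

# sysctl -w kernel.hung_task_panic=1 kernel.hung_task_timeout_secs=300 kernel.panic=5
# sysctl --system    # re-apply all persistent sysctl configuration files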

I've just returned from the datacenter, where I found it had crashed again...

Harvie commented 5 years ago

Because you didn't paste logs from any panic, it is just a blocked task. Not much we can do without any logs...

Well, the photo is all I have, because the machine stops responding completely. No keyboard, no networking... I would happily provide you with some logs, but I am not very experienced with retrieving logs from crashed systems, so for now the photos are all I can get. Is there some easy way to get the logs?

What will happen if I set kernel.hung_task_timeout_secs = 0? I guess the hung zfs recv tasks will accumulate until something goes wrong. I will also probably lose the ability to auto-reboot in case some non-ZFS task gets stuck...
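
One way to capture console output from a box that locks up hard (assuming the NIC and its driver stay alive) is netconsole, which streams kernel messages over UDP to another host. A rough sketch, where the addresses, interface and MAC are placeholders for your own network:

# on the crashing host
# modprobe netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/aa:bb:cc:dd:ee:ff
# on the receiving host (192.168.1.20); listen syntax varies between netcat variants
# nc -u -l -p 6666 | tee crash-console.log

A serial console or kdump would be alternatives if the network dies along with the machine.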

behlendorf commented 5 years ago

@Harvie thanks for the updated stack trace; this one definitely shows a panic in the zfs send/recv code which needs to be investigated. You mentioned this was with 0.8.0-rc3_126_g5dbf8b4ed, which is good to know. Also, when building, did you by chance enable debugging by passing --enable-debug to configure? If not, that would be the next step, since it enables additional correctness checks which may detect the core issue.
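
For anyone following along, a debug build from a git checkout looks roughly like this (a sketch; it assumes the build dependencies from the project documentation are already installed):

$ ./autogen.sh
$ ./configure --enable-debug
$ make -s -j$(nproc)
# make install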

You can force the system to panic, rather than hang this one thread, by setting the spl.ko module option spl_panic_halt=1. The default behavior is to suspend just the one thread, since this normally allows you to log in and inspect the system, though that seems not to be the case on your system.
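
For reference, the option can be flipped at runtime or made persistent across module loads, roughly:

# echo 1 > /sys/module/spl/parameters/spl_panic_halt
# echo "options spl spl_panic_halt=1" > /etc/modprobe.d/spl.conf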

Harvie commented 5 years ago

Had two more crashes: [console screenshots attached]

I've found that it does not even allow me to use SysRq when it freezes. However, I managed to grab dmesg before it froze completely:

[    8.623049] ZFS: Loaded module v0.8.0-rc3_126_g5dbf8b4ed, ZFS pool version 5000, ZFS filesystem version 5
[    8.647634] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    9.151052] random: crng init done
[    9.154558] random: 7 urandom warning(s) missed due to ratelimiting
[   26.683471]  zd16: p1 p2
[   33.544883] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   33.544888] 8021q: adding VLAN 0 to HW filter on device eth0
[   33.937170] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   34.265830] Bluetooth: Core ver 2.22
[   34.265861] NET: Registered protocol family 31
[   34.265861] Bluetooth: HCI device and connection manager initialized
[   34.265869] Bluetooth: HCI socket layer initialized
[   34.265873] Bluetooth: L2CAP socket layer initialized
[   34.265884] Bluetooth: SCO socket layer initialized
[   34.269490] Netfilter messages via NETLINK v0.30.
[   36.396990] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   36.422043] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   54.654126] ip_set: protocol 6
[  688.535774] perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 1076.082743] perf: interrupt took too long (3150 > 3141), lowering kernel.perf_event_max_sample_rate to 63500
[ 1096.157111] VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
[ 1096.172219] PANIC at dnode.c:715:dnode_reallocate()
[ 1096.180809] Showing stack for process 803
[ 1096.188362] CPU: 3 PID: 803 Comm: receive_writer Tainted: P           O     4.15.18-12-pve #1
[ 1096.188362] Hardware name:  /S3000AH, BIOS S3000.86B.02.00.0044.071120071047 07/11/2007
[ 1096.188364] Call Trace:
[ 1096.188377]  dump_stack+0x63/0x8b
[ 1096.188402]  spl_dumpstack+0x29/0x30 [spl]
[ 1096.188407]  spl_panic+0xc8/0x110 [spl]
[ 1096.188410]  ? mutex_lock+0x12/0x40
[ 1096.188412]  ? _cond_resched+0x1a/0x50
[ 1096.188621]  ? zfs_space_delta_cb+0xeb/0x230 [zfs]
[ 1096.188624]  ? __kmalloc_node+0x206/0x2b0
[ 1096.188626]  ? _cond_resched+0x1a/0x50
[ 1096.188627]  ? mutex_lock+0x12/0x40
[ 1096.188665]  ? dmu_objset_userquota_get_ids+0x26a/0x420 [zfs]
[ 1096.188667]  ? _cond_resched+0x1a/0x50
[ 1096.188668]  ? _cond_resched+0x1a/0x50
[ 1096.188669]  ? mutex_lock+0x12/0x40
[ 1096.188710]  ? dnode_set_blksz+0xb7/0x310 [zfs]
[ 1096.188714]  ? spl_kmem_free+0x33/0x40 [spl]
[ 1096.188755]  dnode_reallocate+0x24c/0x270 [zfs]
[ 1096.188793]  dmu_object_reclaim_dnsize+0xca/0xf0 [zfs]
[ 1096.188831]  receive_object+0x450/0x9f0 [zfs]
[ 1096.188836]  ? __wake_up_common_lock+0x8e/0xc0
[ 1096.188838]  ? __wake_up+0x13/0x20
[ 1096.188842]  ? __cv_signal+0x2d/0x40 [spl]
[ 1096.188881]  receive_writer_thread+0x420/0xaa0 [zfs]
[ 1096.188884]  ? set_curr_task_fair+0x2b/0x60
[ 1096.188889]  ? spl_kmem_free+0x33/0x40 [spl]
[ 1096.188928]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[ 1096.188934]  thread_generic_wrapper+0x74/0x90 [spl]
[ 1096.188973]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[ 1096.188978]  ? thread_generic_wrapper+0x74/0x90 [spl]
[ 1096.188982]  kthread+0x105/0x140
[ 1096.188988]  ? __thread_exit+0x20/0x20 [spl]
[ 1096.188989]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1096.188992]  ret_from_fork+0x35/0x40
[ 1278.557483] perf: interrupt took too long (3945 > 3937), lowering kernel.perf_event_max_sample_rate to 50500
Harvie commented 5 years ago

did you by chance enable debugging by passing --enable-debug to configure?

Unfortunately not, but these two lines seem to be quite relevant:

[ 1096.157111] VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
[ 1096.172219] PANIC at dnode.c:715:dnode_reallocate()

ps aux shows the zfs recv process in the Ds state (uninterruptible sleep, usually I/O).

I think that in all such cases the zfs module should abort the operation and zfs recv should return a non-zero exit code.

Harvie commented 5 years ago

I have a bit more insight into what happens. I've set kernel.hung_task_timeout_secs = 0 so it wouldn't reboot after a task hangs. But when I later tried to reboot the machine manually, it froze completely instead of rebooting. So there's some secondary bug causing the kernel to freeze completely if I try to reboot after this happens. Unfortunately this affects even the automatic panic reboot provided by the combination of kernel.hung_task_panic, kernel.hung_task_timeout_secs and kernel.panic, which means the machine will not recover without manual intervention.

Harvie commented 5 years ago

When it fails to reboot itself, it repeatedly writes the following into the dmesg instead of rebooting:

...
[42587.314026] systemd-journald[282]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[42707.315498] systemd-journald[282]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[42767.316223] systemd-journald[282]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[42887.317665] systemd-journald[282]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[42947.318420] systemd-journald[282]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[43067.319910] systemd-journald[282]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected

spl_panic_halt=1 seems to at least cause the machine to reboot on crash. That seems to be a good idea on production servers...

When I enable the panic halt and keep dmesg -w running over SSH, it at least manages to give me this before the server reboots:

[ 4778.548762] VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
[ 4778.565400] PANIC at dnode.c:715:dnode_reallocate()
[ 4778.574797] Kernel panic - not syncing: VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
Harvie commented 5 years ago

I've noticed this problem only happens when replicating one of my 46 datasets, in the middle of the replication. It's always the same dataset. Unfortunately I can't share it, since it contains a customer's data. I've tried running a scrub in order to fix it, but it didn't help...

I guess it might be possible to fix the issue by recreating the dataset on the sender side and rsyncing the data to the new one. But that way we won't be able to debug the receive issue.

Harvie commented 5 years ago

When I think about it... the dataset which is causing the crash differs from the others in that I recently increased the quota on it (after upgrading to 0.8.0), while on the others I haven't changed anything since the upgrade...

tcaputi commented 5 years ago

Are you able to get any output after the crash? If so, you can run these commands before doing the replication:

echo 4294966270 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
echo 4294966270 > /sys/module/zfs/parameters/zfs_flags

and then after the panic run:

cat /proc/spl/kstat/zfs/dbgmsg
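
Put together, a rough capture sequence would be (the output path is just an example):

# echo 4294966270 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
# echo 4294966270 > /sys/module/zfs/parameters/zfs_flags
# ... reproduce the zfs recv panic ...
# cat /proc/spl/kstat/zfs/dbgmsg > /root/dbgmsg.txt
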
wolfgangwuerfl commented 5 years ago

I have exactly the same problem under Gentoo with the latest ZoL git version. The assert in dnode_set_blksz (dnode.c line 715) triggers a panic. I get this when doing a recursive zfs send of the pool, ALWAYS on the same dataset. A scrub doesn't show any errors. As the code says, the operation only succeeds if the dnode doesn't have blocks allocated or dirty blocks beyond the first block. But the error handling is not very graceful.

Harvie commented 5 years ago
echo 0 > /sys/module/spl/parameters/spl_panic_halt
echo 4294966270 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
echo 4294966270 > /sys/module/zfs/parameters/zfs_flags

These are just some of the errors from the debug log. However, they were not captured while receiving the dataset that crashes.

1554246004   zap_micro.c:985:zap_lookup_impl(): error 2
1554246004   dsl_prop.c:147:dsl_prop_get_dd(): error 2
1554246004   zap_micro.c:985:zap_lookup_impl(): error 2
1554246004   zap_micro.c:985:zap_lookup_impl(): error 2
1554246004   zap_micro.c:985:zap_lookup_impl(): error 2
1554246004   dsl_prop.c:147:dsl_prop_get_dd(): error 2
1554246004   spa_history.c:314:spa_history_log_sync(): txg 297878 receive tank/backup/pve1/%recv (id 266085) 
1554246004   zap.c:740:fzap_checksize(): error 22
1554246004   zfeature.c:239:feature_get_refcount(): error 95
1554246004   zfeature.c:239:feature_get_refcount(): error 95
1554246068   zap_micro.c:988:zap_lookup_impl(): error 75
1554246159   dmu.c:453:dmu_spill_hold_existing(): error 2
1554246159   sa.c:370:sa_attr_op(): error 2
1554246159   dbuf.c:2702:dbuf_findbp(): error 2
1554246159   dnode.c:1423:dnode_hold_impl(): error 2
1554246159   dbuf.c:3159:dbuf_hold_impl_arg(): error 2

I've picked lines from the log at random, so they don't all repeat the same errors. I can't do a proper debug run against the crashing dataset right now...

tcaputi commented 5 years ago

@Harvie can you repeat the test and share the entire file after the crash? Most errors reported via this mechanism are normal and expected, by the way.

Harvie commented 5 years ago

@tcaputi These are the lines that showed up moments before and during the crash. It crashes on tank/backup/pve1/vps/subvol-212-disk-1 (full log here: dbgmsg.txt). Also note there was a scrub running at the time (but the crash is reproducible whether it runs or not)...

1554285854   spa_history.c:314:spa_history_log_sync(): txg 315926 destroy tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332567) 
1554285854   bptree.c:232:bptree_iterate(): bptree index 0: traversing from min_txg=315916 bookmark 0/0/0/0
1554285854   dsl_scan.c:3281:dsl_process_async_destroys(): freed 12 blocks in 1ms from free_bpobj/bptree txg 315926; err=0
1554285855   dsl_scan.c:3612:dsl_scan_sync(): scan issued 740 blocks (1 segs) in 1041ms (avg_block_size = 126899, avg_seg_size = 95583744)
1554285857   dsl_scan.c:3612:dsl_scan_sync(): scan issued 754 blocks (1 segs) in 1044ms (avg_block_size = 123324, avg_seg_size = 93637120)
1554285858   spa_history.c:314:spa_history_log_sync(): txg 315928 receive tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332435) 
1554285859   dsl_scan.c:3612:dsl_scan_sync(): scan issued 761 blocks (1 segs) in 1046ms (avg_block_size = 129905, avg_seg_size = 102903296)
1554285859   metaslab.c:2642:metaslab_condense(): condensing: txg 315928, msp[83] 000000007099e18c, vdev id 0, spa tank, smp size 20312, segments 758, forcing condense=FALSE
1554285861   dsl_scan.c:3612:dsl_scan_sync(): scan issued 6105 blocks (1 segs) in 1034ms (avg_block_size = 14670, avg_seg_size = 90400256)
1554285863   dsl_scan.c:3612:dsl_scan_sync(): scan issued 757 blocks (1 segs) in 1040ms (avg_block_size = 120745, avg_seg_size = 91611136)
1554285864   dsl_scan.c:3612:dsl_scan_sync(): scan issued 767 blocks (1 segs) in 1044ms (avg_block_size = 130029, avg_seg_size = 103000064)
1554285866   dsl_scan.c:3612:dsl_scan_sync(): scan issued 4145 blocks (1 segs) in 1040ms (avg_block_size = 24107, avg_seg_size = 100842496)
1554285867   dsl_scan.c:3612:dsl_scan_sync(): scan issued 810 blocks (1 segs) in 1044ms (avg_block_size = 123954, avg_seg_size = 101988864)
1554285869   dsl_scan.c:3612:dsl_scan_sync(): scan issued 777 blocks (1 segs) in 1044ms (avg_block_size = 122507, avg_seg_size = 97101312)
1554285869   metaslab.c:2642:metaslab_condense(): condensing: txg 315934, msp[84] 000000006602e64c, vdev id 0, spa tank, smp size 24960, segments 968, forcing condense=FALSE
1554285870   dsl_scan.c:3612:dsl_scan_sync(): scan issued 776 blocks (1 segs) in 1042ms (avg_block_size = 124644, avg_seg_size = 98706944)
1554285871   spa_history.c:314:spa_history_log_sync(): txg 315936 finish receiving tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332435) snap=2019-04-03-114500
1554285871   spa_history.c:314:spa_history_log_sync(): txg 315936 clone swap tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332435) parent=subvol-210-disk-1
1554285871   spa_history.c:314:spa_history_log_sync(): txg 315936 snapshot tank/backup/pve1/vps/subvol-210-disk-1@2019-04-03-114500 (id 332571) 
1554285871   spa_history.c:314:spa_history_log_sync(): txg 315936 destroy tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332435) 
1554285871   bptree.c:232:bptree_iterate(): bptree index 0: traversing from min_txg=315926 bookmark 0/0/0/0
1554285871   dsl_scan.c:3281:dsl_process_async_destroys(): freed 12 blocks in 1ms from free_bpobj/bptree txg 315936; err=0
1554285872   dsl_scan.c:3612:dsl_scan_sync(): scan issued 1695 blocks (1 segs) in 1038ms (avg_block_size = 58576, avg_seg_size = 101062656)
1554285874   dsl_scan.c:3612:dsl_scan_sync(): scan issued 797 blocks (1 segs) in 1034ms (avg_block_size = 121509, avg_seg_size = 99873792)
1554285874   spa_history.c:314:spa_history_log_sync(): txg 315938 receive tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332579) 
1554285875   dsl_scan.c:3612:dsl_scan_sync(): scan issued 760 blocks (1 segs) in 1035ms (avg_block_size = 124114, avg_seg_size = 98268672)
1554285877   dsl_scan.c:3612:dsl_scan_sync(): scan issued 553 blocks (1 segs) in 1036ms (avg_block_size = 127394, avg_seg_size = 71434240)
1554285879   dsl_scan.c:3612:dsl_scan_sync(): scan issued 804 blocks (1 segs) in 1042ms (avg_block_size = 123357, avg_seg_size = 101583360)
1554285880   dsl_scan.c:3612:dsl_scan_sync(): scan issued 1628 blocks (1 segs) in 1036ms (avg_block_size = 58890, avg_seg_size = 98772992)
1554285881   dsl_scan.c:3612:dsl_scan_sync(): scan issued 741 blocks (1 segs) in 1039ms (avg_block_size = 125372, avg_seg_size = 95204864)
1554285883   dsl_scan.c:3612:dsl_scan_sync(): scan issued 787 blocks (1 segs) in 1043ms (avg_block_size = 126392, avg_seg_size = 100123136)
1554285885   dsl_scan.c:3612:dsl_scan_sync(): scan issued 1296 blocks (1 segs) in 1039ms (avg_block_size = 73784, avg_seg_size = 98725888)
1554285886   dsl_scan.c:3612:dsl_scan_sync(): scan issued 779 blocks (1 segs) in 1042ms (avg_block_size = 124187, avg_seg_size = 98285568)
1554285887   spa_history.c:314:spa_history_log_sync(): txg 315946 finish receiving tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332579) snap=2019-04-03-120000
1554285887   spa_history.c:314:spa_history_log_sync(): txg 315946 clone swap tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332579) parent=subvol-210-disk-1
1554285887   spa_history.c:314:spa_history_log_sync(): txg 315946 snapshot tank/backup/pve1/vps/subvol-210-disk-1@2019-04-03-120000 (id 331640) 
1554285887   spa_history.c:314:spa_history_log_sync(): txg 315946 destroy tank/backup/pve1/vps/subvol-210-disk-1/%recv (id 332579) 
1554285887   bptree.c:232:bptree_iterate(): bptree index 0: traversing from min_txg=315936 bookmark 0/0/0/0
1554285887   dsl_scan.c:3281:dsl_process_async_destroys(): freed 12 blocks in 1ms from free_bpobj/bptree txg 315946; err=0
1554285888   dsl_scan.c:3612:dsl_scan_sync(): scan issued 722 blocks (1 segs) in 1040ms (avg_block_size = 127121, avg_seg_size = 92277760)
1554285894   dsl_scan.c:3612:dsl_scan_sync(): scan issued 2033 blocks (1 segs) in 4501ms (avg_block_size = 118550, avg_seg_size = 242679808)
1554285894   spa_history.c:306:spa_history_log_sync(): command: zfs recv -F tank/backup/pve1/vps/subvol-210-disk-1
1554285894   spa_history.c:314:spa_history_log_sync(): txg 315948 receive tank/backup/pve1/vps/subvol-212-disk-1/%recv (id 331646) 
1554285895   dsl_scan.c:3612:dsl_scan_sync(): scan issued 1756 blocks (1 segs) in 1041ms (avg_block_size = 41981, avg_seg_size = 76745216)
1554285901   dsl_scan.c:3612:dsl_scan_sync(): scan issued 1830 blocks (1 segs) in 5133ms (avg_block_size = 126237, avg_seg_size = 233222656)
1554285905   dsl_scan.c:3612:dsl_scan_sync(): scan issued 662 blocks (1 segs) in 2456ms (avg_block_size = 124505, avg_seg_size = 86342656)
1554285905   metaslab.c:2642:metaslab_condense(): condensing: txg 315950, msp[83] 000000007099e18c, vdev id 0, spa tank, smp size 20432, segments 755, forcing condense=FALSE
1554285911   dsl_scan.c:3612:dsl_scan_sync(): scan issued 298 blocks (1 segs) in 1134ms (avg_block_size = 111342, avg_seg_size = 36106752)
1554285920   dsl_scan.c:3612:dsl_scan_sync(): scan issued 254 blocks (1 segs) in 1114ms (avg_block_size = 122918, avg_seg_size = 32504832)
1554285926   dsl_scan.c:3612:dsl_scan_sync(): scan issued 1646 blocks (1 segs) in 2379ms (avg_block_size = 122091, avg_seg_size = 201345536)

This is the dmesg:

[ 3899.363064] VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
[ 3899.381400] PANIC at dnode.c:715:dnode_reallocate()
[ 3899.397196] Showing stack for process 19347
[ 3899.413695] CPU: 3 PID: 19347 Comm: receive_writer Tainted: P           O     4.15.18-12-pve #1
[ 3899.434953] Hardware name:  /S3000AH, BIOS S3000.86B.02.00.0044.071120071047 07/11/2007
[ 3899.454548] Call Trace:
[ 3899.468022]  dump_stack+0x63/0x8b
[ 3899.482296]  spl_dumpstack+0x29/0x30 [spl]
[ 3899.497294]  spl_panic+0xc8/0x110 [spl]
[ 3899.511958]  ? mutex_lock+0x12/0x40
[ 3899.526180]  ? _cond_resched+0x1a/0x50
[ 3899.540715]  ? zfs_space_delta_cb+0xeb/0x230 [zfs]
[ 3899.556166]  ? __kmalloc_node+0x206/0x2b0
[ 3899.570767]  ? _cond_resched+0x1a/0x50
[ 3899.585029]  ? mutex_lock+0x12/0x40
[ 3899.599023]  ? dmu_objset_userquota_get_ids+0x26a/0x420 [zfs]
[ 3899.615262]  ? _cond_resched+0x1a/0x50
[ 3899.629446]  ? _cond_resched+0x1a/0x50
[ 3899.643495]  ? mutex_lock+0x12/0x40
[ 3899.657254]  ? dnode_set_blksz+0xb7/0x310 [zfs]
[ 3899.671998]  ? spl_kmem_free+0x33/0x40 [spl]
[ 3899.686465]  dnode_reallocate+0x24c/0x270 [zfs]
[ 3899.701138]  dmu_object_reclaim_dnsize+0xca/0xf0 [zfs]
[ 3899.716418]  receive_object+0x450/0x9f0 [zfs]
[ 3899.730826]  ? __wake_up_common_lock+0x8e/0xc0
[ 3899.745274]  ? __wake_up+0x13/0x20
[ 3899.758576]  ? __cv_signal+0x2d/0x40 [spl]
[ 3899.772573]  receive_writer_thread+0x420/0xaa0 [zfs]
[ 3899.787419]  ? set_curr_task_fair+0x2b/0x60
[ 3899.801453]  ? set_user_nice.part.68+0x103/0x200
[ 3899.815944]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[ 3899.830513]  thread_generic_wrapper+0x74/0x90 [spl]
[ 3899.845151]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[ 3899.859635]  ? thread_generic_wrapper+0x74/0x90 [spl]
[ 3899.874374]  kthread+0x105/0x140
[ 3899.887211]  ? __thread_exit+0x20/0x20 [spl]
[ 3899.901059]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 3899.915654]  ret_from_fork+0x35/0x40
edillmann commented 5 years ago

I got the same with the latest git:

[48816.810205] VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
[48816.810212] PANIC at dnode.c:715:dnode_reallocate()
[48816.810214] Showing stack for process 100084
[48816.810218] CPU: 2 PID: 100084 Comm: receive_writer Tainted: P           O     4.15.18-12-pve #1
[48816.810221] Hardware name: GIGABYTE GB-BXi5-5200/MQLP5AP-00, BIOS F5 11/23/2015
[48816.810224] Call Trace:
[48816.810231]  dump_stack+0x63/0x8b
[48816.810239]  spl_dumpstack+0x29/0x30 [spl]
[48816.810244]  spl_panic+0xc8/0x110 [spl]
[48816.810248]  ? mutex_lock+0x12/0x40
[48816.810251]  ? _cond_resched+0x1a/0x50
[48816.810309]  ? zfs_space_delta_cb+0xeb/0x230 [zfs]
[48816.810316]  ? __kmalloc_node+0x1f4/0x2b0
[48816.810321]  ? _cond_resched+0x1a/0x50
[48816.810326]  ? mutex_lock+0x12/0x40
[48816.810371]  ? dmu_objset_userquota_get_ids+0x26a/0x420 [zfs]
[48816.810375]  ? _cond_resched+0x1a/0x50
[48816.810377]  ? _cond_resched+0x1a/0x50
[48816.810380]  ? mutex_lock+0x12/0x40
[48816.810426]  ? dnode_set_blksz+0xb7/0x310 [zfs]
[48816.810433]  ? spl_kmem_free+0x33/0x40 [spl]
[48816.810468]  dnode_reallocate+0x24c/0x270 [zfs]
[48816.810501]  dmu_object_reclaim_dnsize+0xca/0xf0 [zfs]
[48816.810533]  receive_object+0x450/0x9f0 [zfs]
[48816.810538]  ? __wake_up_common_lock+0x8e/0xc0
[48816.810541]  ? __wake_up+0x13/0x20
[48816.810547]  ? __cv_signal+0x2d/0x40 [spl]
[48816.810578]  receive_writer_thread+0x420/0xaa0 [zfs]
[48816.810582]  ? set_curr_task_fair+0x2b/0x60
[48816.810587]  ? set_user_nice.part.68+0x103/0x200
[48816.810618]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[48816.810625]  thread_generic_wrapper+0x74/0x90 [spl]
[48816.810655]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[48816.810662]  ? thread_generic_wrapper+0x74/0x90 [spl]
[48816.810666]  kthread+0x105/0x140
[48816.810671]  ? __thread_exit+0x20/0x20 [spl]
[48816.810675]  ? kthread_create_worker_on_cpu+0x70/0x70
[48816.810679]  ret_from_fork+0x35/0x40

tcaputi commented 5 years ago

@Harvie unfortunately, it looks like the log you provided was not taken after running echo 4294966270 > /sys/module/zfs/parameters/zfs_flags. That command enables printing error messages to dbgmsg (like the ones from the snippet you originally posted), and the output you gave doesn't have any of those statements. Could you try it one more time?

Harvie commented 5 years ago

it looks like the log you provided was not taken after running echo 4294966270 > /sys/module/zfs/parameters/zfs_flags

I am pretty sure I did that. BTW, zfs_dbgmsg_enable can only be set to 0 or 1:

# cat /sys/module/zfs/parameters/zfs_flags
4294966270
# cat /sys/module/zfs/parameters/zfs_dbgmsg_enable
1

I can try one more time...

Harvie commented 5 years ago
[ 1328.892247] VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95)
[ 1328.910565] PANIC at dnode.c:715:dnode_reallocate()
[ 1328.926314] Showing stack for process 13233
[ 1328.941377] CPU: 3 PID: 13233 Comm: receive_writer Tainted: P           O     4.15.18-12-pve #1
[ 1328.961142] Hardware name:  /S3000AH, BIOS S3000.86B.02.00.0044.071120071047 07/11/2007
[ 1328.980333] Call Trace:
[ 1328.993919]  dump_stack+0x63/0x8b
[ 1329.008327]  spl_dumpstack+0x29/0x30 [spl]
[ 1329.023464]  spl_panic+0xc8/0x110 [spl]
[ 1329.038269]  ? spl_kmem_free+0x33/0x40 [spl]
[ 1329.053549]  ? __dprintf+0xf6/0x160 [zfs]
[ 1329.068405]  ? _cond_resched+0x1a/0x50
[ 1329.082925]  ? mutex_lock+0x12/0x40
[ 1329.097152]  ? __set_error+0x2c/0x30 [zfs]
[ 1329.111926]  ? dnode_set_blksz+0xb7/0x310 [zfs]
[ 1329.127076]  ? spl_kmem_free+0x33/0x40 [spl]
[ 1329.141928]  dnode_reallocate+0x24c/0x270 [zfs]
[ 1329.156992]  dmu_object_reclaim_dnsize+0xca/0xf0 [zfs]
[ 1329.172653]  receive_object+0x450/0x9f0 [zfs]
[ 1329.187444]  ? __wake_up_common_lock+0x8e/0xc0
[ 1329.202269]  ? __wake_up+0x13/0x20
[ 1329.215953]  ? __cv_signal+0x2d/0x40 [spl]
[ 1329.230335]  receive_writer_thread+0x420/0xaa0 [zfs]
[ 1329.245549]  ? set_curr_task_fair+0x2b/0x60
[ 1329.259945]  ? set_user_nice.part.68+0x103/0x200
[ 1329.274793]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[ 1329.289744]  thread_generic_wrapper+0x74/0x90 [spl]
[ 1329.304771]  ? receive_free.isra.9+0xc0/0xc0 [zfs]
[ 1329.319636]  ? thread_generic_wrapper+0x74/0x90 [spl]
[ 1329.334759]  kthread+0x105/0x140
[ 1329.347993]  ? __thread_exit+0x20/0x20 [spl]
[ 1329.362252]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1329.377258]  ret_from_fork+0x35/0x40

There were lots of repeated lines, so I've used uniq -c to compress the output. It removes duplicates (only those immediately following each other with an identical timestamp) and prepends a count of how many times each line was repeated:

# cat /sys/module/zfs/parameters/zfs_flags
4294966270
# cat /sys/module/zfs/parameters/zfs_dbgmsg_enable
1
# cat /proc/spl/kstat/zfs/dbgmsg | uniq -c
      1 timestamp    message 
    108 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
      1 1554375109   dnode.c:1423:dnode_hold_impl(): error 2
      1 1554375109   dbuf.c:3159:dbuf_hold_impl_arg(): error 2
      1 1554375109   dnode.c:2310:dnode_next_offset_level(): error 3
     21 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
      1 1554375109   dnode.c:2343:dnode_next_offset_level(): error 3
      6 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
    313 1554375109   dnode.c:1423:dnode_hold_impl(): error 2
      4 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
      9 1554375109   dnode.c:1423:dnode_hold_impl(): error 2
     12 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
     11 1554375109   dnode.c:1423:dnode_hold_impl(): error 2
     11 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
      7 1554375109   dnode.c:1423:dnode_hold_impl(): error 2
      7 1554375109   dbuf.c:2702:dbuf_findbp(): error 2
      3 1554375109   dnode.c:1423:dnode_hold_impl(): error 2
      2 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
      2 1554375110   dbuf.c:3159:dbuf_hold_impl_arg(): error 2
      5 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     12 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     28 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     28 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
     14 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
     14 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
      2 1554375110   dnode.c:1423:dnode_hold_impl(): error 2
      4 1554375110   dbuf.c:2702:dbuf_findbp(): error 2
  29201 1554375112   zfeature.c:239:feature_get_refcount(): error 95
     11 1554375113   zfeature.c:239:feature_get_refcount(): error 95
      1 1554375113   zap.c:740:fzap_checksize(): error 22
    368 1554375113   zfeature.c:239:feature_get_refcount(): error 95
    165 1554375113   dnode.c:1423:dnode_hold_impl(): error 2
     17 1554375113   zfeature.c:239:feature_get_refcount(): error 95
    353 1554375113   dnode.c:1423:dnode_hold_impl(): error 2
    434 1554375114   zfeature.c:239:feature_get_refcount(): error 95
     14 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     14 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     34 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     16 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     20 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
      7 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
      7 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     39 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     21 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     25 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     28 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     28 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     22 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     22 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     32 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     32 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
     18 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     18 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
      8 1554375114   dnode.c:1423:dnode_hold_impl(): error 2
     28 1554375114   dbuf.c:2702:dbuf_findbp(): error 2
      1 1554375115   dnode.c:1762:dnode_set_blksz(): error 95
  15526 1554375117   zfeature.c:239:feature_get_refcount(): error 95
      1 1554375117   zap.c:740:fzap_checksize(): error 22
    287 1554375117   zfeature.c:239:feature_get_refcount(): error 95
codyps commented 5 years ago

I'm seeing the same VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) failed (0 == 95) failure (my duplicate ticket at #8574).

As a note: I did not experience this issue at all until I had a power loss on my system. Since that time, I've had a zfs recv cause this failure on each boot up (2 so far).

tcaputi commented 5 years ago

@jmesmon That's probably just because you didn't have the new module version inserted (it was only installed). Rebooting would have inserted the installed module.

wolfgangwuerfl commented 5 years ago

I deleted some snapshots, and now the panic from the assert at line 715 of dnode.c when sending and receiving the pool is gone. To me it looks like some faulty snapshots trigger the panic, but scrub hasn't detected any errors.

Harvie commented 5 years ago

I think this is a combination of two issues:

1.) ZFS 0.8.0 sometimes creates invalid snapshots.
2.) Both ZFS 0.8.0 and 0.7.13 throw a panic when receiving such an invalid snapshot once it gets rejected by an assertion. In my opinion this is even worse than 1.), because it unrecoverably crashes the machine when it could have just rejected the snapshot and returned an error code to userspace. I am afraid this is bad by design and will affect more than just the VERIFY3(0 == dnode_set_blksz(dn, blocksize, 0, tx)) assertion. Chances are that any failed assertion in recv will cause a kernel panic rather than a userspace error, which is not a good way to handle this.

behlendorf commented 5 years ago

@wolfgangwuerfl @Harvie I understand why it may seem that there's something wrong with the snapshots, but that's almost certainly not the case. It's likely that those snapshots happen to generate a send stream which requires an object to be reallocated in place. This is fine, but certain conditions are then required to hold; the VERIFY is detecting that something isn't as it should be and that there is a flaw somewhere in the logic. This should be impossible under all circumstances, so it's handled with a VERIFY instead of returning an error, which would hide the issue.

I've reverted the patch from master which introduced the issue until we can determine the fundamental root cause. I'll be working on this analysis. If you're in a position to do so, it would be great if you could pull the latest version (d93d4b1acdf53a25ad21e20ddfca3b0d58a06cdf or newer) and verify the issue is resolved.
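
As a side note, one way to confirm which build is actually loaded after rebuilding and rebooting (the exact version string will of course differ per system):

# cat /sys/module/zfs/version
# dmesg | grep "ZFS: Loaded module"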

Harvie commented 5 years ago

@behlendorf So should I apply the fix on the sending side or the receiving side? I am a bit confused about where the problem lies :) However, I am compiling it on the receiving side, which is the crashing machine. On the other hand, this was crashing with ZFS 0.7.13, which is probably older than the reverted commit...

Harvie commented 5 years ago

It does not build:

make[4]: *** No rule to make target '../../../include/spl/sys/dkioc_free_util.h', needed by 'all-am'. Stop.

jwittlincohen commented 5 years ago

It does not build:

make[4]: *** No rule to make target '../../../include/spl/sys/dkioc_free_util.h', needed by 'all-am'. Stop.

I was able to build the latest master without issue on Debian Stretch. Have you checked the Custom Package Instructions? See the Git Master Branch section at the bottom of the page to get started and ensure you've installed the build dependencies for your distribution. If you have issues, you can always ask on the #zfsonlinux IRC channel or on the mailing list.
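
For reference, the flow for building from a git checkout is roughly the following (a sketch; it assumes the distribution's build dependencies are installed, and dkms can be used instead of make install for module management):

$ git checkout d93d4b1acdf53a25ad21e20ddfca3b0d58a06cdf
$ sh autogen.sh
$ ./configure
$ make -s -j$(nproc)
# make install && ldconfig && depmod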

Harvie commented 5 years ago

Yes, perhaps I forgot to run ./autogen.sh... I just have the git tree checked out in /usr/src/zfs-git, and then I use ./autogen.sh and dkms autoinstall zfs/git... That's how I've built it previously...

BTW I've just rebooted into 0.7.13 and so far it seems to be syncing... Unfortunately that might leave me unable to reproduce the bug, since it will get synced and I don't know which of the automatic snapshots it was... But it's strange... I had it crashing with 0.7.13 in the beginning. I wonder if it will sync...

Update: the problematic dataset successfully synced to the 0.7.13 recv... Is there anyone else who can try to reproduce the problem on the latest git?

wolfgangwuerfl commented 5 years ago

Looks OK now:

spa_history.c:319:spa_history_log_sync(): txg 1455054 import pool version 5000; software version zfs-0.8.0-rc3-141-gd93d4b1a; uts gentoo 4.20.13-gentoo-BeoCore #3 SMP PREEMPT Sat Apr 6 06:57:34 CEST 2019 x86_64

SEND/RECEIVE works flawlessly!

Doing a scrub shows a MUCH cleaner log when running cat /proc/spl/kstat/zfs/dbgmsg, like this:

1554528759 metaslab.c:2652:metaslab_condense(): condensing: txg 120481, msp[33] ffff888425a02000, vdev id 0, spa fasttank, smp size 46616, segments 2902, forcing condense=FALSE
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528759 zap.c:740:fzap_checksize(): error 22
1554528760 zap.c:740:fzap_checksize(): error 22
1554528763 zap_micro.c:985:zap_lookup_impl(): error 2
1554528763 zap_micro.c:985:zap_lookup_impl(): error 2
1554528763 zap_micro.c:985:zap_lookup_impl(): error 2
1554528763 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528763 sa.c:370:sa_attr_op(): error 2
1554528763 dbuf.c:2702:dbuf_findbp(): error 2
1554528763 dbuf.c:2702:dbuf_findbp(): error 2
1554528763 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528763 sa.c:370:sa_attr_op(): error 2
1554528763 zap_micro.c:985:zap_lookup_impl(): error 2
1554528763 zap_micro.c:985:zap_lookup_impl(): error 2
1554528764 zap.c:740:fzap_checksize(): error 22
1554528765 zap.c:740:fzap_checksize(): error 22
1554528765 zap_micro.c:985:zap_lookup_impl(): error 2
1554528766 zap_micro.c:985:zap_lookup_impl(): error 2
1554528767 zap_micro.c:985:zap_lookup_impl(): error 2
1554528768 zap_micro.c:985:zap_lookup_impl(): error 2
1554528769 zap.c:740:fzap_checksize(): error 22
1554528770 zap.c:740:fzap_checksize(): error 22
1554528771 zap_micro.c:985:zap_lookup_impl(): error 2
1554528771 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528771 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528771 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528771 sa.c:370:sa_attr_op(): error 2
1554528771 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528771 sa.c:370:sa_attr_op(): error 2
1554528771 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528771 sa.c:370:sa_attr_op(): error 2
1554528771 dmu.c:453:dmu_spill_hold_existing(): error 2
1554528771 sa.c:370:sa_attr_op(): error 2
1554528771 zfs_dir.c:1134:zfs_get_xattrdir(): error 2

Yesterday, with the non-reverted version, I got a log flooded with error 85 messages.

tcaputi commented 5 years ago

@Harvie @wolfgangwuerfl we're still trying to get this issue resolved. Can I ask what kind of data is stored in this dataset? What kinds of changes are made between snapshots? The reason it works for you right now is because we reverted the offending patch while we figure out what this issue is.

Harvie commented 5 years ago

Can I ask what kind of data is stored in this dataset? What kinds of changes are made between snapshots?

I use ZFS mostly for Proxmox LXC VPSes. I had been increasing the quota on that dataset before the problem emerged. I think there was some problem with increasing the quota via the Proxmox web UI: it was increased by a different (larger) amount than I entered, so I had to fix that manually using zfs set. I am not sure whether that was a Proxmox or a ZFS problem, but I guess Proxmox uses zfs set anyway... Also, the client was copying some files to that dataset because he was merging two LXC containers into one. I can't really tell what the problem was.
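
For what it's worth, checking and adjusting a quota manually is plain zfs get/set (the dataset name and size below are placeholders):

# zfs get quota,refquota pool/dataset
# zfs set quota=100G pool/dataset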

stuartabramshumphries commented 5 years ago

After applying this patch, the machine-crashing error is fixed; however, now I'm getting filesystem corruption:

ls: cannot access 'hostname/dirname': Input/output error

total 0
d????????? ? ? ? ? ? dir1    <=== not sure exactly when this occurred
d????????? ? ? ? ? ? dir2    <=== this appeared after applying the patch

I've started a scrub on the pool to see what happens.

I have 3 other hosts all working OK with no corruption (they're all running the same script that does a zfs recv of the same filesystems); it's just this one that I recently upgraded to Ubuntu 18.04.
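
While the scrub runs, zpool status -v is a quick way to see whether ZFS has recorded any files with unrecoverable errors (the pool name is a placeholder):

# zpool status -v poolname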

tcaputi commented 5 years ago

@stuartabramshumphries What patch are you referring to? I doubt a scrub will fix this particular issue. We are still working on a fix here.

cwedgwood commented 5 years ago

@tcaputi I think that probably stems from the fact that traditional filesystems report problems that fsck will then repair.

@stuartabramshumphries if there is detectable/correctable corruption that a scrub would 'fix', it should also be fixed on demand when you access the data (at least in terms of corrupt records), if possible.

behlendorf commented 5 years ago

This specific crash was caused by 5dbf8b4ed, which has been reverted by d93d4b1acdf53a25ad21e20ddfca3b0d58a06cdf. Issue #7378, which 5dbf8b4ed was intended to fix, is specific to encrypted raw receives and is being worked on in PR #8607.

I'm closing this issue since the root cause is understood and now resolved in master.

Harvie commented 5 years ago

@behlendorf What about the corruption that @stuartabramshumphries reported? It looks scary to me...

behlendorf commented 5 years ago

@stuartabramshumphries could you be specific about exactly which versions you were running and what additional patches were applied.

This issue existed in the master branch from March 27th (5dbf8b4) to April 15th (d93d4b1). Pools using a version of ZFS from master in that range together with send/recv may have been affected. The offending change was never included in any tagged ZoL release, or in any downstream releases that I'm aware of.

As part of fixing the underlying root cause we've included regression tests which would have detected this issue if they'd been in place. The new tests provide considerably more test coverage which should help prevent future bugs or regressions.

stuartabramshumphries commented 5 years ago

@behlendorf Initially the system was Ubuntu 16.04 running zfs 0.6.5.6-0ubuntu27, so just the standard setup. We have multiple machines that have been running this for a few years, all OK.

We then upgraded one server to Ubuntu 18.04, which also upgraded ZFS to 0.7.5; that's when we got the problems with zfs recv causing system crashes.

I then upgraded to the standard 0.8.0; this still had the hanging issue.

Next I upgraded to your patch, branch d93d4b1ac. This is when I noticed the first directory's corruption; the second directory's corruption occurred later. However, the box has stopped crashing and the other directories are OK.

This dir has been sent/received elsewhere and is OK on the older Ubuntu 16.04 boxes. Has this fix been backported to 0.7.5?

behlendorf commented 5 years ago

@stuartabramshumphries I see, thank you for the details. We'll make sure to get any relevant send/recv fixes backported to 0.7.14 for the distributions to pick up via their normal process.

stuartabramshumphries commented 5 years ago

@behlendorf Another interesting point: the combination of Ubuntu 18.04 and the patched 0.8.0 ZFS no longer crashes/hangs the machine; however, it is corrupting the filesystem. We have this running on numerous 16.04 servers with zfs 0.6.5.6 and no issues at all: numerous snapshots, and we can rsync and read. However, when we transmit these ZFS snapshots to the 18.04 server it corrupts the filesystem, the same one pretty much every time. I've tested this on brand new Ubuntu 18.04 boxes, so it is reproducible... Something odd with this one filesystem that 18.04 doesn't like?