openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

mutex_destroy(&dn->dn_mtx) caused panic in dnode_dest #7814

Closed loyou closed 4 years ago

loyou commented 6 years ago

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 16.04.3
Linux Kernel 4.4.0-116-generic
Architecture x86_64
ZFS Version 0.7.6
SPL Version 0.7.6

Describe the problem you're observing

The server got a kernel panic and crashed.

Describe how to reproduce the problem

Not sure.

Include any warning/errors/backtraces from the system logs

[2322566.200611] VERIFY3(((*({ __attribute__((unused)) typeof((&dn->dn_mtx)->m_owner) __var = ( typeof((&dn->dn_mtx)->m_owner)) 0; (volatile typeof((&dn->dn_mtx)->m_owner) *)&((&dn->dn_mtx)->m_owner); }))) == ((void *)0)) failed (ffff882ed28b7000 ==           (null))
[2322566.208643] PANIC at dnode.c:178:dnode_dest()
[2322566.210950] Kernel panic - not syncing: VERIFY3(((*({ __attribute__((unused)) typeof((&dn->dn_mtx)->m_owner) __var = ( typeof((&dn->dn_mtx)->m_owner)) 0; (volatile typeof((&dn->dn_mtx)->m_owner) *)&((&dn->dn_mtx)->m_owner); }))) == ((void *)0)) failed (ffff882ed28b7000 ==           (null))

[2322566.224994] CPU: 11 PID: 940 Comm: dbu_evict Tainted: P           OE   4.4.0-116-generic #140~14.04.1
[2322566.231483] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.0.1 02/12/2016
[2322566.238246]  0000000000000000 ffff882ea166fa30 ffffffff813e416c ffffffffc0302e5e
[2322566.245831]  00000000000000b2 ffff882ea166faa8 ffffffff81186b2c ffff882e00000010
[2322566.254264]  ffff882ea166fab8 ffff882ea166fa58 ffffffff81187333 ffff882ea166fad8
[2322566.263393] Call Trace:
[2322566.267965]  [<ffffffff813e416c>] dump_stack+0x63/0x87
[2322566.272532]  [<ffffffff81186b2c>] panic+0xc8/0x221
[2322566.277187]  [<ffffffff81187333>] ? printk+0x50/0x52
[2322566.281952]  [<ffffffffc02fc75d>] spl_panic+0xfd/0x100 [spl]
[2322566.286789]  [<ffffffffc02f6dda>] ? spl_kmem_free+0x2a/0x40 [spl]
[2322566.291784]  [<ffffffffc03fdf72>] dnode_dest+0xd2/0x140 [zfs]
[2322566.296840]  [<ffffffffc02f88cd>] spl_kmem_cache_free+0x2d/0x1c0 [spl]
[2322566.302010]  [<ffffffffc03fe400>] dnode_destroy+0x1d0/0x220 [zfs]
[2322566.307270]  [<ffffffffc03fefdb>] dnode_special_close+0x4b/0x70 [zfs]
[2322566.312389]  [<ffffffffc03f22fa>] dmu_objset_evict_done+0x1a/0x250 [zfs]
[2322566.317503]  [<ffffffffc03fe41c>] dnode_destroy+0x1ec/0x220 [zfs]
[2322566.322465]  [<ffffffffc03fe4ea>] dnode_buf_evict_async+0x9a/0xe0 [zfs]
[2322566.327284]  [<ffffffffc02fa594>] taskq_thread+0x2b4/0x4b0 [spl]
[2322566.332116]  [<ffffffff810ab0c0>] ? wake_up_q+0x80/0x80
[2322566.336861]  [<ffffffffc03fe450>] ? dnode_destroy+0x220/0x220 [zfs]
[2322566.341482]  [<ffffffffc02fa2e0>] ? task_done+0xb0/0xb0 [spl]
[2322566.345984]  [<ffffffff8109f138>] kthread+0xd8/0xf0
[2322566.350363]  [<ffffffff8109f060>] ? kthread_park+0x60/0x60
[2322566.354762]  [<ffffffff81819f35>] ret_from_fork+0x55/0x80
[2322566.359072]  [<ffffffff8109f060>] ? kthread_park+0x60/0x60

The panic was caused by dn->dn_mtx's owner not being NULL when mutex_destroy() ran:

171  static void
172  dnode_dest(void *arg, void *unused)
173  {
174     int i;
175     dnode_t *dn = arg;
176  
177     rw_destroy(&dn->dn_struct_rwlock);
178     mutex_destroy(&dn->dn_mtx);
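
For readability, the VERIFY3 condition in the panic message is just the macro-expanded form of the owner check that the SPL's mutex_destroy() performs; paraphrased (the exact SPL macro may differ slightly), line 178 effectively does:

/*
 * Paraphrase of the check behind the panic: mutex_destroy() verifies
 * that nobody owns the lock before tearing it down.  mutex_owner() is
 * an ACCESS_ONCE() read of the kmutex's m_owner field, which is the
 * expression that appears macro-expanded in the VERIFY3 message above.
 */
VERIFY3P(mutex_owner(&dn->dn_mtx), ==, NULL);

So the "(ffff882ed28b7000 == (null))" part of the message means m_owner still pointed at task ffff882ed28b7000 while the dnode was being destroyed.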

Checking its holder:

crash> task_struct.pid ffff882ed28b7000
  pid = 941

crash> bt 941
PID: 941    TASK: ffff882ed28b7000  CPU: 7   COMMAND: "dbuf_evict"
 #0 [ffff882ea1673cd8] __schedule at ffffffff81815c63
 #1 [ffff882ea1673d20] schedule at ffffffff81816215
 #2 [ffff882ea1673d38] schedule_hrtimeout_range_clock at ffffffff81818f0c
 #3 [ffff882ea1673dc8] schedule_hrtimeout_range at ffffffff81818fa3
 #4 [ffff882ea1673dd8] cv_timedwait_hires_common at ffffffffc02fe703 [spl]
 #5 [ffff882ea1673e58] cv_timedwait_sig_hires at ffffffffc02fe804 [spl]
 #6 [ffff882ea1673e68] dbuf_evict_thread at ffffffffc03e205a [zfs]
 #7 [ffff882ea1673e98] thread_generic_wrapper at ffffffffc02f93f3 [spl]
 #8 [ffff882ea1673ec8] kthread at ffffffff8109f138
 #9 [ffff882ea1673f50] ret_from_fork at ffffffff81819f35

static void
dbuf_evict_thread(void)
{
        callb_cpr_t cpr;

        CALLB_CPR_INIT(&cpr, &dbuf_evict_lock, callb_generic_cpr, FTAG);

        mutex_enter(&dbuf_evict_lock);
        while (!dbuf_evict_thread_exit) {
                while (!dbuf_cache_above_lowater() && !dbuf_evict_thread_exit) {
                        CALLB_CPR_SAFE_BEGIN(&cpr);
                        (void) cv_timedwait_sig_hires(&dbuf_evict_cv,                           <--- task 941 waiting here
loyou commented 6 years ago

After checking dn=ffff882a3d59a470 in dnode_dest, it seems dn->dn_mtx was being held (mutex_enter) in dbuf_evict_thread at the time of the panic, and has since been released (mutex_exit).

crash> dnode_t.dn_mtx ffff882a3d59a470 -xo
typedef struct dnode {
  [ffff882a3d59a568] kmutex_t dn_mtx;
} dnode_t;
crash> kmutex_t.m_owner ffff882a3d59a568 
  m_owner = 0x0
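
To make the suspected interleaving concrete, here is a minimal userspace analogy (this is not the SPL or ZFS code, and the owner bookkeeping is simplified): one thread takes a lock and records itself as owner, while a second thread asserts that the owner field is clear before destroying the lock, the same check mutex_destroy() performs. If the destroying thread runs while the holder is still inside its critical section, the assert fires, just like the VERIFY in dnode_dest().

#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* A mutex that tracks its owner, loosely mimicking kmutex_t. */
struct owned_mutex {
    pthread_mutex_t lock;
    pthread_t owner;            /* 0 means "not held" */
};

static struct owned_mutex m = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Stand-in for dbuf_evict_thread: hold the lock for a while. */
static void *
holder(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&m.lock);
    m.owner = pthread_self();   /* record the owner */
    usleep(100 * 1000);         /* critical section */
    m.owner = 0;                /* clear the owner */
    pthread_mutex_unlock(&m.lock);
    return (NULL);
}

int
main(void)
{
    pthread_t t;

    (void) pthread_create(&t, NULL, holder, NULL);
    usleep(10 * 1000);          /* let the holder grab the lock */

    /*
     * Stand-in for dnode_dest(): destroying the lock requires that
     * nobody owns it.  The holder is still in its critical section,
     * so this assert fires -- the userspace analogue of the VERIFY3
     * panic at dnode.c:178.
     */
    assert(m.owner == 0);
    pthread_mutex_destroy(&m.lock);

    (void) pthread_join(t, NULL);
    return (0);
}

(Build with cc -pthread; the assert aborts, mirroring the panic.)
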
loyou commented 6 years ago

Our servers have crashed 5 times with the above call stack, plus one newer crash (without kdump) with the following call stack. I think they are the same issue.

Sep 6 18:38:56 10.16.110.66 [5349213.947198] VERIFY3(((*({ __attribute__((unused)) typeof((&dn->dn_mtx)->m_owner) __var = ( typeof((&dn->dn_mtx)->m_owner)) 0; (volatile typeof((&dn->dn_mtx)->m_owner) *)&((&dn->dn_mtx)->m_owner); }))) == ((void *)0)) failed (ffff883e7cbe3800 == (null))
Sep 6 18:38:56 10.16.110.66 [5349213.960792] PANIC at dnode.c:178:dnode_dest()
Sep 6 18:38:56 10.16.110.66 [5349213.964431] Kernel panic - not syncing: VERIFY3(((*({ __attribute__((unused)) typeof((&dn->dn_mtx)->m_owner) __var = ( typeof((&dn->dn_mtx)->m_owner)) 0; (volatile typeof((&dn->dn_mtx)->m_owner) *)&((&dn->dn_mtx)->m_owner); }))) == ((void *)0)) failed (ffff883e7cbe3800 == (null))
[5349213.964431] 
Sep 6 18:38:56 10.16.110.66 [5349213.983351] CPU: 36 PID: 2367 Comm: txg_sync Tainted: P OE 4.4.0-116-generic #140~14.04.1
Sep 6 18:38:57 10.16.110.66 [5349213.991291] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS 4.1.14 01/09/2018
Sep 6 18:38:57 10.16.110.66 [5349213.999843] 0000000000000000 ffff883e42287818 ffffffff813e416c ffffffffc0464e5e
Sep 6 18:38:57 10.16.110.66 [5349214.009000] 00000000000000b2 ffff883e42287890 ffffffff81186b2c ffff883e00000010
Sep 6 18:38:57 10.16.110.66 [5349214.018506] ffff883e422878a0 ffff883e42287840 ffffffff81187333 ffff883e422878c0
Sep 6 18:38:57 10.16.110.66 [5349214.028437] Call Trace:
Sep 6 18:38:57 10.16.110.66 [5349214.033347] [<ffffffff813e416c>] dump_stack+0x63/0x87
Sep 6 18:38:57 10.16.110.66 [5349214.038428] [<ffffffff81186b2c>] panic+0xc8/0x221
Sep 6 18:38:57 10.16.110.66 [5349214.043398] [<ffffffff81187333>] ? printk+0x50/0x52
Sep 6 18:38:57 10.16.110.66 [5349214.048284] [<ffffffffc045e75d>] spl_panic+0xfd/0x100 [spl]
Sep 6 18:38:57 10.16.110.66 [5349214.053257] [<ffffffffc0458dda>] ? spl_kmem_free+0x2a/0x40 [spl]
Sep 6 18:38:57 10.16.110.66 [5349214.058379] [<ffffffffc0f6bf72>] dnode_dest+0xd2/0x140 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.063533] [<ffffffffc045a8cd>] spl_kmem_cache_free+0x2d/0x1c0 [spl]
Sep 6 18:38:57 10.16.110.66 [5349214.068707] [<ffffffffc0f6c400>] dnode_destroy+0x1d0/0x220 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.073940] [<ffffffffc0f6cfdb>] dnode_special_close+0x4b/0x70 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.079073] [<ffffffffc0f602fa>] dmu_objset_evict_done+0x1a/0x250 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.084120] [<ffffffffc0f60602>] dmu_objset_evict+0xd2/0xe0 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.088997] [<ffffffffc0f75ce1>] dsl_dataset_clone_swap_sync_impl+0x151/0x880 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.098510] [<ffffffffc0f52666>] ? dmu_buf_rele+0x36/0x40 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.103383] [<ffffffffc0f7e991>] ? dsl_dir_rele+0x31/0x40 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.108086] [<ffffffffc0f74248>] ? dsl_dataset_hold+0x118/0x240 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.112679] [<ffffffffc045a9d8>] ? spl_kmem_cache_free+0x138/0x1c0 [spl]
Sep 6 18:38:57 10.16.110.66 [5349214.117230] [<ffffffffc0f63dcf>] dmu_recv_end_sync+0xbf/0x490 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.121801] [<ffffffffc0f964f0>] ? rrw_enter_write+0x90/0xa0 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.126332] [<ffffffffc0f8b562>] dsl_sync_task_sync+0x112/0x120 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.130830] [<ffffffffc0f82f6d>] dsl_pool_sync+0x2cd/0x400 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.135204] [<ffffffffc0fa0978>] spa_sync+0x3e8/0xd40 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.139442] [<ffffffff810ab0d2>] ? default_wake_function+0x12/0x20
Sep 6 18:38:57 10.16.110.66 [5349214.143656] [<ffffffffc0fb321a>] txg_sync_thread+0x2ca/0x490 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.147770] [<ffffffffc0fb2f50>] ? txg_delay+0x150/0x150 [zfs]
Sep 6 18:38:57 10.16.110.66 [5349214.151744] [<ffffffffc045b380>] ? __thread_exit+0x20/0x20 [spl]
Sep 6 18:38:57 10.16.110.66 [5349214.155619] [<ffffffffc045b3f3>] thread_generic_wrapper+0x73/0x80 [spl]
Sep 6 18:38:57 10.16.110.66 [5349214.159427] [<ffffffff8109f138>] kthread+0xd8/0xf0
Sep 6 18:38:57 10.16.110.66 [5349214.163113] [<ffffffff8109f060>] ? kthread_park+0x60/0x60
Sep 6 18:38:57 10.16.110.66 [5349214.166840] [<ffffffff81819f35>] ret_from_fork+0x55/0x80
Sep 6 18:38:57 10.16.110.66 [5349214.170529] [<ffffffff8109f060>] ? kthread_park+0x60/0x60
Sep 6 18:38:57 10.16.110.66 [5349214.174247] Kernel Offset: disabled
Sep 6 18:38:57 10.16.110.66 [5349214.251733] ---[ end Kernel panic - not syncing: VERIFY3(((*({ __attribute__((unused)) typeof((&dn->dn_mtx)->m_owner) __var = ( typeof((&dn->dn_mtx)->m_owner)) 0; (volatile typeof((&dn->dn_mtx)->m_owner) *)&((&dn->dn_mtx)->m_owner); }))) == ((void *)0)) failed (ffff883e7cbe3800 == (null))
[5349214.251733]
PaulZ-98 commented 4 years ago

We are seeing this at Datto using 0.7.8. Is anybody seeing it in 0.8.x?

PaulZ-98 commented 4 years ago

Hello @loyou, we also face this issue with 0.7.8.x. How did you get around the problem - upgrade to zfs 0.8.x?

loyou commented 4 years ago

No upgrade until now; I made this workaround and it works fine.

static void 
dnode_dest(void *arg, void *unused)
{
    int i;
    dnode_t *dn = arg;
+   struct task_struct *t;

    rw_destroy(&dn->dn_struct_rwlock);
+   while ((t = mutex_owner(&dn->dn_mtx)) != NULL) {
+       printk("%s mutex held by %s:%d, waiting for released\n",
+           __func__, t->comm, t->pid);
+       schedule_timeout_interruptible(msecs_to_jiffies(1000));
+   }
    ...
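
To be clear, this doesn't fix the underlying race; it only polls mutex_owner() once per second, logging the holder's comm and pid, and lets dnode_dest() fall through to mutex_destroy() only after dn->dn_mtx has actually been released, so the VERIFY can no longer fire. schedule_timeout_interruptible(msecs_to_jiffies(1000)) makes the wait a sleep rather than a busy spin.
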
PaulZ-98 commented 4 years ago

Thank you for this advice @loyou. Did the printk output confirm that it was dbuf_evict_thread that was holding the dn_mtx when dnode_special_close --> dnode_destroy --> dnode_dest occurred on the same dnode?

loyou commented 4 years ago

Yes, I think so. The workaround patch makes sure nobody is holding dn->dn_mtx before the destroy.