openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.55k stars 1.74k forks source link

PANIC at zfs_znode.c:1106:zfs_zinactive() #3304

Closed chrisrd closed 9 years ago

chrisrd commented 9 years ago

@dweeezil Since your patch for #3225 was commited as @40d06e3, the buildbots have been very occasionally (13 times out of 426 buildbot runs since the commit!) getting PANICs like:

VERIFY3(((*({ __attribute__((unused)) typeof((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))->m_owner) __var = ( typeof((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))->
m_owner)) 0; (volatile typeof((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))
PANIC at zfs_znode.c:1106:zfs_zinactive()

In order of appearance:

It's coming from this part of the patch:

--- a/module/zfs/zfs_znode.c
+++ b/module/zfs/zfs_znode.c
@@ -1097,23 +1097,13 @@ zfs_zinactive(znode_t *zp)
 {
        zfs_sb_t *zsb = ZTOZSB(zp);
        uint64_t z_id = zp->z_id;
-       boolean_t drop_mutex = 0;

        ASSERT(zp->z_sa_hdl);

        /*
         * Don't allow a zfs_zget() while were trying to release this znode.
-        *
-        * Linux allows direct memory reclaim which means that any KM_SLEEP
-        * allocation may trigger inode eviction.  This can lead to a deadlock
-        * through the ->shrink_icache_memory()->evict()->zfs_inactive()->
-        * zfs_zinactive() call path.  To avoid this deadlock the process
-        * must not reacquire the mutex when it is already holding it.
         */
-       if (!ZFS_OBJ_HOLD_OWNED(zsb, z_id)) {
-               ZFS_OBJ_HOLD_ENTER(zsb, z_id);
-               drop_mutex = 1;
-       }
+       ZFS_OBJ_HOLD_ENTER(zsb, z_id);

        mutex_enter(&zp->z_lock);

What was the thinking there?

dweeezil commented 9 years ago

I think it's effectively showing us that some reclaim paths haven't been properly locked down. See #3293 for a patch which should hopefully lock down the remaining paths.

chrisrd commented 9 years ago

Oh, now I see what's going on over there! OK, in that case, I've just checked the 13 PANICs above and they all have zpl_xattr_set() in the call path, which means they're covered by @behlendorf's commit in #3293.

Issue closed.