FWIW I do these most days, sending from OSX to both OSX and FreeBSD servers via ssh. The receiving datasets are all readonly though.
e.g. zfs send -vR tub/shared/projects@20140331 | ssh wintermute sudo zfs recv -Fuv zroot/zfs/shared/projects
and similar.
The panic actually happens because the code that detects the old "dnode: ARC release bug triggered: %p (%lld)-- sorry" condition forcefully decrements the hung dnode's counter until it reaches 0 and then continues, causing a panic elsewhere.
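That workaround is roughly of this shape (a reconstruction from the message and the behaviour described above, not the actual source; it sits somewhere in the dnode eviction path):

/*
 * Reconstruction of the old workaround (not the actual source):
 * instead of waiting for the holder, force the refcount to zero so
 * eviction can proceed.  The real holder later releases a dnode that
 * no longer exists, and we panic somewhere else entirely.
 */
while (refcount_count(&dn->dn_holds) > 0) {
	printf("dnode: ARC release bug triggered: %p (%lld)-- sorry\n",
	    dn, (u_longlong_t)dn->dn_object);
	(void) refcount_remove(&dn->dn_holds, NULL);
}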
Looking at why we are deadlocked though:
The dnode itself:
(lldb) p *db
(dmu_buf_impl_t) $2 = {
db = (db_object = 0, db_offset = 0, db_size = 16384, db_data = 0xffffff808e90f098)
db_objset = 0xffffff80915de848
db_dnode_handle = 0xffffff80915de868
db_parent = 0xffffff8090da05f0
db_hash_next = 0x0000000000000000
db_blkid = 0
db_blkptr = 0xffffff808e97b320
db_level = '\0'
db_mtx = {
m_owner = 0xffffff8015424b20
m_lock = 0xffffff8021ad9170
m_padding = ([0] = '\0', [1] = '\0', [2] = '\0', [3] = '\0', [4] = '\0', [5] = '\0', [6] = '\0', [7] = '\0', [8] = '\0', [9] = '\0', [10] = '\0', [11] = '\0', [12] = '\0', [13] = '\0')
}
db_state = DB_CACHED
db_holds = (rc_count = 0)
db_buf = 0xffffff809a53b3e0
db_changed = (pad = 0, pad2 = 0)
db_data_pending = 0x0000000000000000
db_last_dirty = 0x0000000000000000
db_link = {
list_next = 0xffffff8090da0698
list_prev = 0xffffff808ec74ce0
}
db_user_ptr = 0xffffff80935c7b38
db_user_data_ptr_ptr = 0x0000000000000000
db_evict_func = 0xffffff7f8919f6e0 (zfs`dnode_buf_pageout at dnode.c:977)
db_immediate_evict = '\0'
db_freed_in_flight = '\0'
db_dirtycnt = '\0'
}
Threads from spindump after we have deadlocked:
*993 VFS_ROOT + 198 (mach_kernel) [0xffffff80003fa1a6]
*993 zfs_vfs_root + 63 (zfs) [0xffffff7f80e352ff]
*993 rrw_enter + 45 (zfs) [0xffffff7f80dc8acd]
*993 rrw_enter_read + 165 (zfs) [0xffffff7f80dc87f5]
*993 spl_cv_wait + 51 (spl) [0xffffff7f80d524a3]

*993 vnode_getattr + 119 (mach_kernel) [0xffffff80003fbe67]
*993 zfs_vnop_getattr + 75 (zfs) [0xffffff7f80e44f2b]
*993 zfs_getattr + 147 (zfs) [0xffffff7f80e3d5f3]
*993 rrw_enter + 45 (zfs) [0xffffff7f80dc8acd]
*993 rrw_enter_read + 165 (zfs) [0xffffff7f80dc87f5]
*993 spl_cv_wait + 51 (spl) [0xffffff7f80d524a3]

*993 vnode_getattr + 119 (mach_kernel) [0xffffff80003fbe67]
*993 zfs_vnop_getattr + 75 (zfs) [0xffffff7f80e44f2b]
*993 zfs_getattr + 147 (zfs) [0xffffff7f80e3d5f3]
*993 rrw_enter + 45 (zfs) [0xffffff7f80dc8acd]
*993 rrw_enter_read + 165 (zfs) [0xffffff7f80dc87f5]
*993 spl_cv_wait + 51 (spl) [0xffffff7f80d524a3]

*993 VFS_ROOT + 198 (mach_kernel) [0xffffff80003fa1a6]
*993 zfs_vfs_root + 63 (zfs) [0xffffff7f80e352ff]
*993 rrw_enter + 45 (zfs) [0xffffff7f80dc8acd]
*993 rrw_enter_read + 165 (zfs) [0xffffff7f80dc87f5]
*993 spl_cv_wait + 51 (spl) [0xffffff7f80d524a3]

*993 zfsdev_ioctl + 1260 (zfs) [0xffffff7f80e20d8c]
*993 zfs_ioc_recv + 1470 (zfs) [0xffffff7f80e2389e]
*993 dmu_recv_end + 71 (zfs) [0xffffff7f80d8dc57]
*993 dmu_recv_existing_end + 124 (zfs) [0xffffff7f80d8dd8c]
*993 dsl_sync_task + 567 (zfs) [0xffffff7f80db8e47]
*993 txg_wait_synced + 261 (zfs) [0xffffff7f80dea235]
*993 spl_cv_wait + 51 (spl) [0xffffff7f80d524a3]

*993 txg_sync_thread + 962 (zfs) [0xffffff7f80de9f12]
*993 spa_sync + 1101 (zfs) [0xffffff7f80dd9bad]
*993 dsl_pool_sync + 1039 (zfs) [0xffffff7f80daf9cf]
*993 dsl_sync_task_sync + 300 (zfs) [0xffffff7f80db90dc]
*993 dmu_recv_end_sync + 421 (zfs) [0xffffff7f80d8e1d5]
*993 dsl_dataset_clone_swap_sync_impl + 159 (zfs) [0xffffff7f80da2f7f]
*993 dmu_objset_evict + 773 (zfs) [0xffffff7f80d85415]
*993 dnode_special_close + 62 (zfs) [0xffffff7f80d98c6e]
*993 delay_for_interval + 39 (mach_kernel) [0xffffff8000222197]

*993 call_continuation + 23 (mach_kernel) [0xffffff80002d6ff7]
*993 vnop_reclaim_thread + 238 (zfs) [0xffffff7f80e4460e]
*993 rw_enter + 23 (spl) [0xffffff7f80d54127]
*993 lck_rw_lock_shared_gen + 118 (mach_kernel) [0xffffff80002d5156]
*993 thread_block_reason + 204 (mach_kernel) [0xffffff8000235d8c]
It is interesting to note that even the reclaim thread is stuck on the rw_enter lock.
The code/locks involved: the top four threads are waiting on
ZFS_ENTER(zfsvfs);

#define ZFS_ENTER_NOERROR(zfsvfs) \
	rrw_enter(&(zfsvfs)->z_teardown_lock, RW_READER, FTAG)
and the reclaim thread is waiting on
rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_READER);
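For reference, both locks live in the zfsvfs_t (abridged from the upstream zfs_vfsops.h definition; surrounding fields omitted):

typedef struct zfsvfs {
	...
	rrwlock_t	z_teardown_lock;	/* ZFS_ENTER readers vs. suspend writer */
	krwlock_t	z_teardown_inactive_lock; /* reclaim/inactive readers vs. suspend writer */
	...
} zfsvfs_t;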
OK, the recv end goes through this block of code:
error = zfs_suspend_fs(zsb);
end_err = dmu_recv_end(&drc, zsb);
if (error == 0)
	error = zfs_resume_fs(zsb, tofs);
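For context, zfs_suspend_fs() reaches its teardown helper, which takes both of those locks as writer; roughly (abridged sketch of the upstream zfsvfs_teardown() path, not O3X-specific code):

/*
 * Abridged: zfs_suspend_fs() -> zfsvfs_teardown() takes both teardown
 * locks as WRITER, blocking every ZFS_ENTER reader and the reclaim
 * thread until zfs_resume_fs() drops them again.
 */
rrw_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG);
rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_WRITER);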
Where zfs_suspend_fs() holds BOTH the rw_lock and the rrw_lock, and zfs_resume_fs() releases them again. However, the thread is stuck in dmu_recv_end() due to
*994 dmu_recv_end + 71 (zfs) [0xffffff7f80d8fc57]
*994 dmu_recv_existing_end + 124 (zfs) [0xffffff7f80d8fd8c]
*994 dsl_sync_task + 567 (zfs) [0xffffff7f80dbae47]
*994 txg_wait_synced + 261 (zfs) [0xffffff7f80dec235]
*994 spl_cv_wait + 51 (spl) [0xffffff7f80d544a3]
waiting for the DMU txg to sync.
But the sync thread is stuck in
*994 txg_sync_thread + 962 (zfs) [0xffffff7f80debf12]
*994 spa_sync + 1101 (zfs) [0xffffff7f80ddbbad]
*994 dsl_pool_sync + 1039 (zfs) [0xffffff7f80db19cf]
*994 dsl_sync_task_sync + 300 (zfs) [0xffffff7f80dbb0dc]
*994 dmu_recv_end_sync + 421 (zfs) [0xffffff7f80d901d5]
*994 dsl_dataset_clone_swap_sync_impl + 159 (zfs) [0xffffff7f80da4f7f]
*994 dmu_objset_evict + 773 (zfs) [0xffffff7f80d87415]
*994 dnode_special_close + 62 (zfs) [0xffffff7f80d9ac6e]
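Putting the stacks together, my reading of the cycle is:

zfs_ioc_recv thread : zfs_suspend_fs() holds both teardown locks as writer,
                      then blocks in txg_wait_synced()
txg_sync thread     : dmu_recv_end_sync() -> dmu_objset_evict() ->
                      dnode_special_close(), which spins in delay() until
                      the dnode's holds drop
reclaim thread      : blocked taking z_teardown_inactive_lock as reader,
                      so it can never release the holds the sync thread
                      is waiting for
top four VFS threads: blocked taking z_teardown_lock as reader in ZFS_ENTER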
Appears to still be broken. Now it freezes up.
Looking at the code:
getzfsvfs(const char *dsname, zfsvfs_t **zfvp)
{
	...
	*zfvp = dmu_objset_get_user(os);
	if (*zfvp) {
		VFS_HOLD((*zfvp)->z_vfs);
and its callers, such as this one in zfs_ioc_recv:
if (getzfsvfs(tofs, &zfsvfs) == 0) {
	ds = dmu_objset_ds(zfsvfs->z_os);
	error = zfs_suspend_fs(zfsvfs);
	VFS_RELE(zfsvfs->z_vfs);
So it appears that getzfsvfs() will grab the zfsvfs while under the dmu_objset_hold() lock, then pin the VFS by calling VFS_HOLD(). Once we are done with the zfsvfs, it calls VFS_RELE().
On O3X, both VFS_HOLD and VFS_RELE are empty defines, as we do not have anything directly equivalent to the IllumOS versions.
Even though the IllumOS comment for VFS_HOLD sounds somewhat benign, it would appear that it stops unmounts from happening. For us, by contrast, the unmount can proceed and z_os is released (and possibly reused by something else), so our zfsvfs gets overwritten with random garbage (on top of us using the zfsvfs after it was freed).
Our closest equivalent to VFS_HOLD appears to be vfs_busy, but there are some implementation differences.
IllumOS VFS_HOLD is an atomic operation and can therefore be called repeatedly. A good example is zfs_vfsops.c calling it and then calling zfsctl_create, which also calls it from gfs.c; that in turn calls getroot, and we end up in zfs_znode.c, which calls it a third time. On OSX you can only call it once; any nested call will deadlock.
A second issue is that even when calling vfs_busy( , LK_NOWAIT), it only detects an unmount in progress and returns immediately. It does not cover mounting, so we cannot call vfs_busy from within the zfs_vfs_mount() context (which is where zfsctl_create() is called, as above).
So we can add #ifdef APPLE vfs_busy() calls around the functions that call getzfsvfs() (there are five of them).
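That is, something along these lines around each call site (a sketch; the error handling is an assumption):

#ifdef __APPLE__
	/* Pin the mount so it cannot be unmounted under us (sketch). */
	if (vfs_busy(zfsvfs->z_vfs, LK_NOWAIT) != 0)
		return (EBUSY);	/* unmount already in progress */
#endif
	/* ... use zfsvfs, e.g. zfs_suspend_fs()/zfs_resume_fs() ... */
#ifdef __APPLE__
	vfs_unbusy(zfsvfs->z_vfs);
#endif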
Or we can attempt to make VFS_HOLD use mount_fsprivate() to get the zfsvfs pointer, and store an atomic counter in the struct; upon reaching 1 in VFS_HOLD (or 0 in VFS_RELE) we call vfs_busy (or vfs_unbusy, respectively). With the additional hack that zfs_vfs_mount() creates the zfsvfs with the atomic already at 1, since we may not call vfs_busy at that time (nor do we need to; the mount is already locked in order to call us).
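A rough sketch of that second option (untested; z_vfs_holds is a hypothetical new field, and I am using XNU's vfs_fsprivate() as the accessor):

/*
 * Sketch only: count holds in the zfsvfs and map the first/last
 * transition onto vfs_busy()/vfs_unbusy().  z_vfs_holds is a
 * hypothetical uint64_t added to zfsvfs_t, initialized to 1 by
 * zfs_vfs_mount() since vfs_busy() may not be called mid-mount.
 */
#define	VFS_HOLD(vfsp)	zfs_vfs_hold(vfsp)
#define	VFS_RELE(vfsp)	zfs_vfs_rele(vfsp)

static void
zfs_vfs_hold(mount_t vfsp)
{
	zfsvfs_t *zfsvfs = vfs_fsprivate(vfsp);

	/* First holder busies the mount; nested holds just count. */
	if (atomic_inc_64_nv(&zfsvfs->z_vfs_holds) == 1)
		(void) vfs_busy(vfsp, LK_NOWAIT);
}

static void
zfs_vfs_rele(mount_t vfsp)
{
	zfsvfs_t *zfsvfs = vfs_fsprivate(vfsp);

	/* Last release unbusies the mount again. */
	if (atomic_dec_64_nv(&zfsvfs->z_vfs_holds) == 0)
		vfs_unbusy(vfsp);
}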
@lundman Or is this still an issue?
http://bpaste.net/show/196813/