Describe the problem you're observing
With zfs_bclone_enabled=1 and zfs_bclone_wait_dirty=1, copying a file with unallocated blocks at the end gets stuck in the kernel forever. During this time, the kernel is also forcing txg syncs in an infinite loop.
This behavior is also observed in #15933.
(The same underlying issue also means that bclone always fails for files with unallocated blocks at the end if zfs_bclone_wait_dirty=0.)
Describe how to reproduce the problem
cp never finishes and is stuck in an uninterruptible state, unresponsive to both SIGINT and SIGQUIT.
Setting zfs_bclone_wait_dirty=0 while cp is still running causes cp to finish immediately with the error "cp: failed to clone 'dst' from 'src': Resource temporarily unavailable".
Include any warning/errors/backtraces from the system logs
Nothing shows up immediately in dmesg or dbgmsg, but during the failure /proc/spl/kstat/zfs/testpool/txgs shows that ZFS is generating a lot of empty txgs:
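One rough way to quantify the empty txgs is to count the rows in that kstat file whose dirty-data column is zero. The sketch below is an illustration only: it assumes the file contains a header row with columns named txg and ndirty (treat the exact layout as an assumption and check it on your system), and count_empty_txgs is a hypothetical helper name, not part of any ZFS tooling.

```python
def count_empty_txgs(kstat_text):
    """Return (empty, total) txg rows found in the given kstat text.

    Assumption: the text has a header row containing 'txg' and 'ndirty'
    column names, followed by one whitespace-separated row per txg.
    """
    rows = [ln.split() for ln in kstat_text.splitlines() if ln.strip()]
    header_idx = next(
        (i for i, cols in enumerate(rows) if "txg" in cols and "ndirty" in cols),
        None,
    )
    if header_idx is None:
        raise ValueError("no header row with 'txg' and 'ndirty' columns")

    header = rows[header_idx]
    ndirty_col = header.index("ndirty")

    empty = total = 0
    for cols in rows[header_idx + 1:]:
        if len(cols) != len(header):
            continue  # skip anything that does not look like a data row
        total += 1
        if cols[ndirty_col] == "0":
            empty += 1
    return empty, total
```

To use it, pass in the contents of /proc/spl/kstat/zfs/testpool/txgs read while cp is hung. A high ratio of empty to total rows is consistent with the forced-sync loop described in this report.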
Root cause
#15842 added logic to wait for a sync when encountering dirty blocks, implemented as forcing a sync whenever dmu_read_l0_bps returns EAGAIN, but that logic is broken. cc @behlendorf
Normally dmu_read_l0_bps returns EAGAIN for dirty blocks. However, it also returns EAGAIN whenever db->db_blkptr == NULL. This usually happens for newly written blocks that are not yet allocated, but it also happens for sparse, unallocated blocks beyond the end of a fully synced object. (More specifically, it happens under any of the conditions that cause dbuf_findbp to return ENOENT when holding the dbuf.)
In this situation, zfs_clone_range tries to force a sync when zfs_bclone_wait_dirty=1, but syncing does not allocate any blocks since none are actually dirty. Then the next attempt runs into the same condition and syncs again in an infinite loop. Setting zfs_bclone_wait_dirty=0 breaks the loop and returns an error to cp.
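The retry behavior described above can be modeled in a few lines. This is a toy simulation, not the kernel code: Block, read_l0_bp, force_sync and clone_range are hypothetical stand-ins for the dbuf, dmu_read_l0_bps, the forced txg sync and zfs_clone_range, and max_syncs is an artificial cap added only so the model terminates (the loop in this report has no such cap).

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Toy stand-in for a dbuf: only the two facts the retry loop cares about."""
    blkptr: Optional[int]  # None models db->db_blkptr == NULL
    dirty: bool = False    # True models a block with pending, unsynced writes


def read_l0_bp(block):
    """EAGAIN whenever the block pointer is missing, dirty or not."""
    return errno.EAGAIN if block.blkptr is None else 0


def force_sync(block, next_addr=1):
    """A sync only allocates blocks that are actually dirty."""
    if block.dirty:
        block.blkptr = next_addr
        block.dirty = False


def clone_range(block, wait_dirty, max_syncs=5):
    """Return (errno, syncs_forced) for one clone attempt."""
    syncs = 0
    while True:
        err = read_l0_bp(block)
        if err != errno.EAGAIN or not wait_dirty:
            return err, syncs
        if syncs == max_syncs:
            return err, syncs  # artificial cap; stands in for "forever"
        force_sync(block)
        syncs += 1
```

In this model a genuinely dirty block succeeds after one forced sync, while a hole with no block pointer returns EAGAIN after every sync until the cap is hit. With wait_dirty set to False the same hole fails immediately without syncing, matching the "Resource temporarily unavailable" error seen by cp.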
This is trivially reproducible by creating an empty sparse file, as seen by zdb:
Note that although the file size is 256MiB according to ZPL metadata, the actual on-disk object is still a 1-level object with dn_maxblkid == 0 and no indirect blocks, which is sufficient to trigger the db_blkptr == NULL case upon dbuf_hold.
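For reference, an empty sparse file of the kind described here can be created by extending a new file without writing any data. The sketch below is a minimal illustration; make_sparse_file is a hypothetical helper name, and the 256MiB size is taken from the report.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create an empty file and extend it to `size` bytes without writing data."""
    with open(path, "wb") as f:
        f.truncate(size)  # sets the logical length only; no data is written
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        st = make_sparse_file(os.path.join(d, "src"), 256 * 1024 * 1024)
        # st_size is the logical length; on POSIX systems st_blocks reports
        # how much space is actually allocated, which depends on the filesystem.
        print(st.st_size, getattr(st, "st_blocks", None))
```

Running this on a dataset with block cloning enabled and then copying the resulting file with a reflink-capable cp should exercise the path described in this report, assuming the copy goes through the clone path at all.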