Open fcicq opened 6 years ago
Using bdi->congested_fn
looks like a nice way to implement this. But I don't think vdev_queue_class_to_issue()
is exactly what you want, since it will only return ZIO_PRIORITY_NUM_QUEUEABLE
when all of the queues are full. My suggestion would be to implement a new function in this layer which defines what congestion is. Thanks for working on this!
@behlendorf
Is this correct? For both sync/async, return txg_stalled(spa->spa_dsl_pool);
(and spa is zsb->z_os->os_spa?)
@fcicq I'd recommend dsl_pool_need_dirty_delay()
instead for async/sync writes; it will kick in when the DMU starts to throttle new transactions because the dirty limit has been exceeded.
@behlendorf thanks, that is a simpler method... so both checks should be preserved, even though txg_kick will never be called on this path.
static int zfs_congested_fn(void *congested_data, int bdi_bits)
{
	spa_t *spa = congested_data;
	dsl_pool_t *dp = spa->spa_dsl_pool;

	return (dsl_pool_need_dirty_delay(dp) || txg_stalled(dp));
}
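As an aside, the function above ignores bdi_bits, so sync and async congestion are always reported together. A variant that inspects bdi_bits could report async congestion as soon as the dirty throttle engages but sync congestion only once the txg is actually stalled. The sketch below is a userspace model of that idea: the ZFS predicates are replaced by stub flags, and MODEL_SYNC_CONGESTED/MODEL_ASYNC_CONGESTED are made-up stand-ins for the kernel's congestion bits, so this only demonstrates the bit logic, not the real API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's sync/async congestion bits. */
#define MODEL_SYNC_CONGESTED  (1 << 0)
#define MODEL_ASYNC_CONGESTED (1 << 1)

/* Stubs for the ZFS predicates; real code would call them with dsl_pool_t *. */
static bool stub_need_dirty_delay = false; /* dsl_pool_need_dirty_delay(dp) */
static bool stub_txg_stalled = false;      /* txg_stalled(dp) */

/*
 * Sketch of a bdi_bits-aware congested_fn: report async congestion as soon
 * as the dirty-data throttle engages, but report sync congestion only when
 * the open txg is stalled.
 */
static int zfs_congested_fn_model(int bdi_bits)
{
	int ret = 0;

	if ((bdi_bits & MODEL_ASYNC_CONGESTED) &&
	    (stub_need_dirty_delay || stub_txg_stalled))
		ret |= MODEL_ASYNC_CONGESTED;
	if ((bdi_bits & MODEL_SYNC_CONGESTED) && stub_txg_stalled)
		ret |= MODEL_SYNC_CONGESTED;
	return (ret);
}
```

Whether the writeback core actually benefits from distinguishing the two states here is an open question; the simpler combined check above may well be enough.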
and in zfs_domount() do:
+ sb->s_bdi->congested_fn = zfs_congested_fn;
+ sb->s_bdi->congested_data = zsb->z_os->os_spa;
+ sb->s_bdi->capabilities = BDI_CAP_CGROUP_WRITEBACK;
sb->s_iflags |= SB_I_CGROUPWB; is for later (enabled for dev); that is another issue though. I need wbc_account_io(), but it is exported via EXPORT_SYMBOL_GPL :(
Now this one is resolved, but I can't open a pull request at the moment ... sorry.
System information
Describe the problem you're observing
Linux kernel 4.2 introduced CONFIG_CGROUP_WRITEBACK; you can see the added BDI_CAP_CGROUP_WRITEBACK in https://github.com/torvalds/linux/commit/89e9b9e07a390c50980d10aa37a04631db5a23ab
sb->s_bdi->capabilities = BDI_CAP_CGROUP_WRITEBACK
is missing in zfs_domount() in zfs_vfsops.c, after zpl_bdi_setup(). In order to match the functionality of the Linux bdi (backing_dev_info), it is clear from wb_congested that we must either maintain the state via set_bdi_congested()/clear_bdi_congested() (with BLK_RW_SYNC/BLK_RW_ASYNC), or implement congested_fn like dm/md do.
One possible method would be comparing the queue depth with vdev_queue_class_max_active() for ZIO_PRIORITY_SYNC_WRITE and ZIO_PRIORITY_ASYNC_WRITE. So I think vdev_queue_class_to_issue() could do the bit flips for me: if it returns ZIO_PRIORITY_NUM_QUEUEABLE, mark the device as congested. Should any io_done (if p == ZIO_PRIORITY_ASYNC_WRITE) undo that? Only supporting BLK_RW_ASYNC is also acceptable (like nfs does).
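The set/clear discipline proposed above can be modeled in userspace as follows. This is only a sketch of the bookkeeping, not of the real vdev_queue code: vdev_queue_class_max_active() is replaced by a stub with arbitrary limits, the zio priorities are plain enum values, and the model_congested flags stand in for set_bdi_congested()/clear_bdi_congested().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two write zio priority classes. */
enum { MODEL_SYNC_WRITE, MODEL_ASYNC_WRITE, MODEL_NUM_CLASSES };

static int model_active[MODEL_NUM_CLASSES];
static bool model_congested[MODEL_NUM_CLASSES];

/* Stub for vdev_queue_class_max_active(); the limits are arbitrary. */
static int model_max_active(int class)
{
	return ((class == MODEL_SYNC_WRITE) ? 10 : 2);
}

/* On issue: bump the depth; at the limit, flip the congested bit on. */
static void model_io_issue(int class)
{
	model_active[class]++;
	if (model_active[class] >= model_max_active(class))
		model_congested[class] = true;  /* set_bdi_congested() */
}

/* On io_done: drop the depth; below the limit, flip the bit back off. */
static void model_io_done(int class)
{
	model_active[class]--;
	if (model_active[class] < model_max_active(class))
		model_congested[class] = false; /* clear_bdi_congested() */
}
```

The key property is that every io_done below the limit undoes the congested state, so the bit cannot stick after the queue drains.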
I'm also working on getting #1952 resolved, if anyone is interested.
Describe how to reproduce the problem
N/A
Include any warning/errors/backtraces from the system logs
N/A