Open ernsteiswuerfel opened 8 months ago
The number of btrfs specific data races decreased on kernel v6.7-rc5, also there are no longer direct mentions in the ""BUG: KCSAN: data-race in ..." line. Some btrfs mentions prevails however like in the BUG: KCSAN: data-race in wq_worker_tick / wq_worker_tick
and BUG: KCSAN: data-race in process_one_work / process_one_work
races.
I enabled PROVE_LOCKING=y
in the .config in order to get additional output via KCSAN_VERBOSE=y
.
On v6.7-rc5 I get 3 times:
==================================================================
BUG: KCSAN: data-race in wq_worker_tick / wq_worker_tick
read to 0xffff9c9b5e37cda8 of 8 bytes by interrupt on cpu 21:
wq_worker_tick+0x5f/0x250
scheduler_tick+0x1a0/0x1c0
update_process_times+0xbf/0xf0
tick_nohz_highres_handler+0xbe/0x130
__hrtimer_run_queues+0x3b0/0x5f0
hrtimer_interrupt+0x1cb/0x3e0
__sysvec_apic_timer_interrupt+0x9d/0x260
sysvec_apic_timer_interrupt+0x80/0xc0
asm_sysvec_apic_timer_interrupt+0x1a/0x20
delay_halt_mwaitx+0x38/0x50
delay_halt+0x3a/0x80
kcsan_setup_watchpoint+0x2c6/0x690
lzo1x_decompress_safe+0x216/0x9f0
lzo_decompress_bio+0x360/0x4c0
end_compressed_bio_read+0x100/0x150
__btrfs_bio_end_io+0x6e/0xc0
btrfs_orig_bbio_end_io+0x7c/0x170
btrfs_check_read_bio+0x5c0/0x870
btrfs_end_bio_work+0x61/0x90
process_one_work+0x452/0x940
worker_thread+0x3fb/0x730
kthread+0x189/0x1c0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x11/0x20
2 locks held by kworker/u65:0/174:
#0: ffff9c944931e938 ((wq_completion)btrfs-endio){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
#1: ffffabc00073fe10 ((work_completion)(&bbio->end_io_work)){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
irq event stamp: 11485
hardirqs last enabled at (11484): [<ffffffffb239c810>] _raw_spin_unlock_irqrestore+0x50/0x60
hardirqs last disabled at (11485): [<ffffffffb2385fee>] sysvec_apic_timer_interrupt+0xe/0xc0
softirqs last enabled at (11151): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
softirqs last disabled at (11146): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
write to 0xffff9c9b5e37cda8 of 8 bytes by interrupt on cpu 4:
wq_worker_tick+0x75/0x250
scheduler_tick+0x1a0/0x1c0
update_process_times+0xbf/0xf0
tick_nohz_highres_handler+0xbe/0x130
__hrtimer_run_queues+0x3b0/0x5f0
hrtimer_interrupt+0x1cb/0x3e0
__sysvec_apic_timer_interrupt+0x9d/0x260
sysvec_apic_timer_interrupt+0x80/0xc0
asm_sysvec_apic_timer_interrupt+0x1a/0x20
lzo1x_decompress_safe+0x209/0x9f0
lzo_decompress_bio+0x360/0x4c0
end_compressed_bio_read+0x100/0x150
__btrfs_bio_end_io+0x6e/0xc0
btrfs_orig_bbio_end_io+0x7c/0x170
btrfs_check_read_bio+0x5c0/0x870
btrfs_end_bio_work+0x61/0x90
process_one_work+0x452/0x940
worker_thread+0x3fb/0x730
kthread+0x189/0x1c0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x11/0x20
2 locks held by kworker/u65:5/283:
#0: ffff9c944931e938 ((wq_completion)btrfs-endio){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
#1: ffffabc001f03e10 ((work_completion)(&bbio->end_io_work)){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
irq event stamp: 8120
hardirqs last enabled at (8119): [<ffffffffb239c810>] _raw_spin_unlock_irqrestore+0x50/0x60
hardirqs last disabled at (8120): [<ffffffffb2385fee>] sysvec_apic_timer_interrupt+0xe/0xc0
softirqs last enabled at (7954): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
softirqs last disabled at (7949): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
value changed: 0x000000000010d577 -> 0x000000000010ef81
Reported by Kernel Concurrency Sanitizer on:
CPU: 4 PID: 283 Comm: kworker/u65:5 Tainted: G T 6.7.0-rc5-Zen3 #2
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.20 09/27/2023
Workqueue: btrfs-endio btrfs_end_bio_work
==================================================================
and 2 times:
==================================================================
BUG: KCSAN: data-race in process_one_work / process_one_work
write to 0xffff9c9450ed49a0 of 8 bytes by task 198 on cpu 15:
process_one_work+0x4b0/0x940
worker_thread+0x3fb/0x730
kthread+0x189/0x1c0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x11/0x20
2 locks held by kworker/u66:22/198:
#0: ffff9c94488c2d38 ((wq_completion)btrfs-endio-write){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
#1: ffffabc0007ffe10 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
irq event stamp: 11993
hardirqs last enabled at (11993): [<ffffffffb239c810>] _raw_spin_unlock_irqrestore+0x50/0x60
hardirqs last disabled at (11992): [<ffffffffb239c52c>] _raw_spin_lock_irqsave+0x6c/0x70
softirqs last enabled at (10742): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
softirqs last disabled at (10735): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
read to 0xffff9c9450ed49a0 of 8 bytes by task 199 on cpu 29:
process_one_work+0x49c/0x940
worker_thread+0x3fb/0x730
kthread+0x189/0x1c0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x11/0x20
2 locks held by kworker/u66:23/199:
#0: ffff9c94488c2d38 ((wq_completion)btrfs-endio-write){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
#1: ffffabc000807e10 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
irq event stamp: 54984
hardirqs last enabled at (54984): [<ffffffffb239c810>] _raw_spin_unlock_irqrestore+0x50/0x60
hardirqs last disabled at (54983): [<ffffffffb239c52c>] _raw_spin_lock_irqsave+0x6c/0x70
softirqs last enabled at (53685): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
softirqs last disabled at (53678): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
Reported by Kernel Concurrency Sanitizer on:
CPU: 29 PID: 199 Comm: kworker/u66:23 Tainted: G T 6.7.0-rc5-Zen3 #2
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.20 09/27/2023
Workqueue: btrfs-endio-write btrfs_work_helper
==================================================================
Also I get:
==================================================================
BUG: KCSAN: data-race in __hrtimer_run_queues / hrtimer_active
read to 0xffff9c9b5e21a508 of 8 bytes by interrupt on cpu 6:
hrtimer_active+0x7d/0xf0
task_tick_fair+0x2f/0x2e0
scheduler_tick+0xd3/0x1c0
update_process_times+0xbf/0xf0
tick_nohz_highres_handler+0xbe/0x130
__hrtimer_run_queues+0x3b0/0x5f0
hrtimer_interrupt+0x1cb/0x3e0
__sysvec_apic_timer_interrupt+0x9d/0x260
sysvec_apic_timer_interrupt+0x80/0xc0
asm_sysvec_apic_timer_interrupt+0x1a/0x20
__tsan_read1+0xc4/0x1b0
lzo1x_decompress_safe+0x37b/0x9f0
lzo_decompress_bio+0x360/0x4c0
end_compressed_bio_read+0x100/0x150
__btrfs_bio_end_io+0x6e/0xc0
btrfs_orig_bbio_end_io+0x7c/0x170
btrfs_check_read_bio+0x5c0/0x870
btrfs_end_bio_work+0x61/0x90
process_one_work+0x452/0x940
worker_thread+0x3fb/0x730
kthread+0x189/0x1c0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x11/0x20
3 locks held by kworker/u65:3/279:
#0: ffff9c944931e938 ((wq_completion)btrfs-endio){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
#1: ffffabc001edbe10 ((work_completion)(&bbio->end_io_work)){+.+.}-{0:0}, at: process_one_work+0x3b3/0x940
#2: ffff9c9b5e528598 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0xa0
irq event stamp: 5000
hardirqs last enabled at (4999): [<ffffffffb239c79d>] _raw_spin_unlock_irq+0x2d/0x50
hardirqs last disabled at (5000): [<ffffffffb2385fee>] sysvec_apic_timer_interrupt+0xe/0xc0
softirqs last enabled at (4934): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
softirqs last disabled at (4929): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
write to 0xffff9c9b5e21a508 of 8 bytes by interrupt on cpu 0:
__hrtimer_run_queues+0x26f/0x5f0
hrtimer_interrupt+0x1cb/0x3e0
__sysvec_apic_timer_interrupt+0x9d/0x260
sysvec_apic_timer_interrupt+0x80/0xc0
asm_sysvec_apic_timer_interrupt+0x1a/0x20
_raw_spin_unlock_irqrestore+0x34/0x60
___slab_alloc+0xb73/0x1250
kmem_cache_alloc+0x34e/0x390
alloc_iova+0x4c/0x420
alloc_iova_fast+0x2c0/0x4c0
iommu_dma_alloc_iova+0x171/0x1d0
iommu_dma_map_sg+0x469/0x6f0
__dma_map_sg_attrs+0x7d/0x130
dma_map_sgtable+0x4b/0x80
nvme_map_data+0x1b0/0xf60
nvme_queue_rqs+0x176/0x530
__blk_mq_flush_plug_list+0x6b/0x90
blk_mq_flush_plug_list+0xed7/0x1170
blk_add_rq_to_plug+0x1a0/0x350
blk_mq_submit_bio+0x78d/0xe90
__submit_bio+0x38/0xf0
submit_bio_noacct_nocheck+0x535/0x6b0
submit_bio_noacct+0x144/0x710
submit_bio+0x37/0xc0
btrfs_submit_dev_bio+0x247/0x450
__btrfs_submit_bio+0x2a4/0x310
btrfs_submit_chunk+0x25e/0x960
btrfs_submit_bio+0x1f/0x40
submit_one_bio+0x5c/0x90
extent_readahead+0x901/0x930
btrfs_readahead+0x1a/0x30
read_pages+0xdb/0x4a0
page_cache_ra_unbounded+0x215/0x2d0
page_cache_ra_order+0x33f/0x490
filemap_fault+0x9b2/0x15e0
__do_fault+0x78/0x2e0
do_fault+0x4be/0xb50
__handle_mm_fault+0x809/0xb90
handle_mm_fault+0x2b7/0x5d0
exc_page_fault+0x1a0/0xad0
asm_exc_page_fault+0x26/0x30
3 locks held by flatpak/412:
#0: ffff9c944068e980 (mapping.invalidate_lock){.+.+}-{3:3}, at: page_cache_ra_unbounded+0xba/0x2d0
#1: ffffffffb2c83360 (rcu_read_lock){....}-{1:2}, at: blk_mq_flush_plug_list+0xea1/0x1170
#2: ffff9c9b5e21a458 (hrtimer_bases.lock){-.-.}-{2:2}, at: hrtimer_interrupt+0x93/0x3e0
irq event stamp: 82853
hardirqs last enabled at (82852): [<ffffffffb239c810>] _raw_spin_unlock_irqrestore+0x50/0x60
hardirqs last disabled at (82853): [<ffffffffb2385fee>] sysvec_apic_timer_interrupt+0xe/0xc0
softirqs last enabled at (82509): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
softirqs last disabled at (82498): [<ffffffffb116095e>] irq_exit_rcu+0x7e/0xa0
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 412 Comm: flatpak Tainted: G T 6.7.0-rc5-Zen3 #2
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.20 09/27/2023
So of the locks reported by KSCAN ((wq_completion)btrfs-endio)
and ((wq_completion)btrfs-endio-write)
are prominently mentioned in the dmesg.
Full kernel dmesg and kernel v6.7-rc5 .config attached. config_67-rc5_zen3.txt dmesg_67-rc5_zen3.txt
KCSAN throws up several data races when running a btrfs filesystem. For the test run I used a v6.6 kernel built with
KCSAN_EARLY_ENABLE=y
,KCSAN_REPORT_ONCE_IN_MS=12000
,KCSAN_STRICT=y
,KCSAN_WEAK_MEMORY=y
. I booted the machine and did a kernel build. During this period I got 218 data races.I know that a data race detected by KCSAN is not necessarily a bug, but as 64 the 218 data races detected prominently mentioned btrfs, either in the trace or in the workqueue I thought it would probably a good idea to report it here.
4 times I got:
19 times I got:
11 times I got:
10 times I got:
Rather often I got BUG: KCSAN: data-race in __hrtimer_run_queues / hrtimer_active but only some times connected to btrfs, e.g.
Some data on the machine:
Full kernel dmesg and kernel v6.6 .config attached. dmesg_66-van_zen3.txt config_66-van_zen3.txt