Bluekezhou closed this issue 5 years ago
The output you posted is a race-condition report from KTSAN. It is indeed something that needs to be fixed.
However, this report by itself does not prevent your kernel from booting. If you are seeing a boot failure, the problem lies elsewhere.
Yes, this is a report about a data race, which is exactly what KTSAN aims to find. So unless it is a false positive, everything is working as intended :)
I am curious how KTSAN handles a situation like the one below.
```
thread 1:
    p1 = kmalloc(0x20)
    use(p1)
    kfree(p1)

thread 2:
    p2 = kmalloc(0x20)
    use(p2)
    kfree(p2)
```
It is possible for the two threads to get the same buffer, so will KTSAN report this as a race?
If we free a buffer in one thread and the same buffer is then allocated in another, the two threads are synchronized through the locks inside the slab allocator. So there should be no false-positive report (unless there is a bug in KTSAN, of course :)).
I compiled and ran the kernel from the ktsan branch as described in the wiki, but I got a lot of race reports. Below is one example:
```
[  OK  ] Started Apply Kernel Variables.
[   19.262492] systemd-journald[1028]: /dev/kmsg buffer overrun, some messages lost.
         Starting Raise network interfaces...
[   19.436711] ==================================================================
[   19.437699] ThreadSanitizer: data-race in deadline_remove_request
[   19.437699]
[   19.438723] Write at 0x000000004f677744 of size 8 by thread 1036 on CPU 3:
[   19.439652]  [<00000000cc0d35a4>] deadline_remove_request+0x73/0x180
[   19.440511]  [<000000005f56ced5>] dd_dispatch_request+0x372/0x440
[   19.441336]  [<000000009b55b522>] blk_mq_do_dispatch_sched+0x16e/0x1e0
[   19.442226]  [<000000001dbed43a>] blk_mq_sched_dispatch_requests+0x26a/0x310
[   19.443156]  [<00000000ef95a321>] blk_mq_run_hw_queue+0x7a/0x110
[   19.443997]  [<000000004f52a10a>] blk_mq_delay_run_hw_queue+0x252/0x260
[   19.444916]  [<00000000bf2a6093>] blk_mq_run_hw_queue+0xb8/0x160
[   19.445708]  [<000000003116cc50>] blk_mq_get_tag+0x385/0x530
[   19.446490]  [<000000000832a0a5>] blk_mq_get_request+0x2d1/0x8e0
[   19.447295]  [<00000000584f8d99>] blk_mq_make_request+0x1ac/0x870
[   19.448262]  [<00000000a21bef69>] generic_make_request+0x37a/0x790
[   19.449157]  [<00000000ded495f6>] submit_bio+0xb0/0x210
[   19.450029]  [<000000005c1e03ad>] ext4_mpage_readpages+0x341/0xc70
[   19.451003]  [<000000001d31573e>] ext4_readpages+0x6b/0x90
[   19.451896]  [<0000000088f4e301>] read_pages+0x8c/0x230
[   19.452704]  [<000000005b03acdc>] __do_page_cache_readahead+0x1f5/0x270
[   19.454086]  [<00000000d0165eab>] ondemand_readahead+0x276/0x450
[   19.455031]  [<00000000dc812ddf>] page_cache_sync_readahead+0x5b/0x90
[   19.455972]  [<00000000ab164b97>] generic_file_read_iter+0xd19/0x1140
[   19.456902]  [<000000000b12c675>] ext4_file_read_iter+0x89/0xa0
[   19.457774]  [<0000000082eece19>] __vfs_read+0x23a/0x300
[   19.458549]  [<00000000a399db2b>] vfs_read+0xab/0x1d0
[   19.459301]  [<000000008ae39ffa>] ksys_read+0x74/0xf0
[   19.460046]  [<00000000558d45a4>] __x64_sys_read+0x43/0x60
[   19.460848]  [<0000000095adb388>] do_syscall_64+0x6f/0x1a0
[   19.461668]  [<000000005349d50d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   19.462656]
[   19.462900] Previous read at 0x000000004f677744 of size 8 by thread 65534 on CPU 1:
[   19.463756]  [<00000000a617d697>] dd_has_work+0x7e/0xc0
[   19.464342]  [<00000000c8a8c6da>] blk_mq_run_hw_queue+0x136/0x160
[   19.465013]  [<00000000beea10a9>] blk_mq_run_hw_queues+0x6e/0x90
[   19.465686]  [<000000009c0bceac>] scsi_end_request+0x29b/0x2b0
[   19.466583]  [<000000005c77fd58>] scsi_io_completion+0xd7/0x9d0
[   19.467445]  [<0000000057d1586a>] scsi_finish_command+0x17e/0x1e0
[   19.468335]  [<00000000d07a695f>] scsi_softirq_done+0x1ac/0x1f0
[   19.469260]  [<00000000823edf6f>] blk_done_softirq+0x14b/0x1a0
[   19.470450]  [<000000007e424ad0>] do_softirq+0x110/0x39a
[   19.471602]  [<000000009d4c61e5>] call_function_single_interrupt+0xf/0x20
[   19.472787]  [<0000000037e37d37>] arch_cpu_idle+0x21/0x30
[   19.473449]  [<00000000637a3e6a>] do_idle+0x22c/0x320
[   19.474132]
[   19.474390] Mutexes locked by thread 1036:
[   19.475028] Mutex 145142 is locked here:
[   19.475649]  [<00000000c108c427>] _raw_spin_lock+0x3e/0x50
[   19.476760]  [<000000001a65c4ec>] dd_dispatch_request+0x65/0x440
[   19.477746]  [<000000009b55b522>] blk_mq_do_dispatch_sched+0x16e/0x1e0
[   19.478684]  [<000000001dbed43a>] blk_mq_sched_dispatch_requests+0x26a/0x310
[   19.479695]  [<00000000ef95a321>] blk_mq_run_hw_queue+0x7a/0x110
[   19.480592]  [<000000004f52a10a>] blk_mq_delay_run_hw_queue+0x252/0x260
[   19.481576]  [<00000000bf2a6093>] blk_mq_run_hw_queue+0xb8/0x160
[   19.482458]  [<000000003116cc50>] blk_mq_get_tag+0x385/0x530
[   19.483237]  [<000000000832a0a5>] blk_mq_get_request+0x2d1/0x8e0
[   19.484060]  [<00000000584f8d99>] blk_mq_make_request+0x1ac/0x870
[   19.484890]  [<00000000a21bef69>] generic_make_request+0x37a/0x790
[   19.485740]  [<00000000ded495f6>] submit_bio+0xb0/0x210
[   19.486506]  [<000000005c1e03ad>] ext4_mpage_readpages+0x341/0xc70
[   19.487533]  [<000000001d31573e>] ext4_readpages+0x6b/0x90
[   19.488502]  [<0000000088f4e301>] read_pages+0x8c/0x230
[   19.489340]  [<000000005b03acdc>] __do_page_cache_readahead+0x1f5/0x270
[   19.490126]  [<00000000d0165eab>] ondemand_readahead+0x276/0x450
[   19.490835]  [<00000000dc812ddf>] page_cache_sync_readahead+0x5b/0x90
[   19.491590]  [<00000000ab164b97>] generic_file_read_iter+0xd19/0x1140
[   19.492352]  [<000000000b12c675>] ext4_file_read_iter+0x89/0xa0
[   19.493051]  [<0000000082eece19>] __vfs_read+0x23a/0x300
[   19.493678]  [<00000000a399db2b>] vfs_read+0xab/0x1d0
[   19.494306]  [<000000008ae39ffa>] ksys_read+0x74/0xf0
[   19.494915]  [<00000000558d45a4>] __x64_sys_read+0x43/0x60
[   19.495557]  [<0000000095adb388>] do_syscall_64+0x6f/0x1a0
[   19.496202]  [<000000005349d50d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
```
There are many other similar reports as well.