dvyukov opened 2 years ago
I think a workable solution would be to use spare origin bits to indicate which of the four bytes the origin applies to. And if an uninit write overwrites all the bytes to which the old origin applies, we just discard it and take the new one. @eugenis, do you think we can sacrifice 4 bits of the origin id in the userspace?
Will it help in this case? Don't we always overwrite the origin? We could.
Assume we use the 4 lowest bits of the origin for this, so a newly generated origin is always (stack_id << 4) | 0xf.
For every WRITE(aligned_addr, offset, size, new_origin) (with size in 1..4 in the case of unaligned writes), we generate a bitmask of size ones starting at offset: e.g. 1100b for memcpy(tmp, &a, sizeof(a)), 0011b and 1100b for memcpy(tmp, &b, sizeof(b)), etc.
Then we handle the write as follows:
WRITE(aligned_addr, offset, size, new_origin):
  old_origin = ORIGIN(aligned_addr)
  old_stack = old_origin >> 4
  old_mask = old_origin % 16
  new_stack = new_origin >> 4
  new_mask = MASK(aligned_addr, offset, size)
  if old_stack == new_stack:
    UPDATE_ORIGIN(aligned_addr, old_origin | new_mask)
  else:
    if old_mask == new_mask:
      UPDATE_ORIGIN(aligned_addr, (new_stack << 4) | new_mask)
    else:
      UPDATE_ORIGIN(aligned_addr, (new_stack << 4) | (new_mask & ~old_mask))
This way we'll never have a situation in which an origin does not actually correspond to any of the four uninitialized bytes.
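The handler above can be transcribed into a small C sketch. The names origin_slot, make_mask and write_origin are illustrative (a single granule's origin word is modeled as a global); this is not the MSan runtime API, just the pseudocode made executable:

```c
#include <stdint.h>

/* Sketch of the proposed scheme, assuming the 4 low bits of each origin
 * word record which of the granule's 4 bytes the origin still applies to.
 * write_origin()/make_mask() are illustrative names, not MSan API. */

static uint32_t origin_slot; /* origin word for one 4-byte granule */

/* Bitmask of `size` ones starting at byte `offset` within the granule. */
static uint32_t make_mask(unsigned offset, unsigned size) {
    return ((1u << size) - 1u) << offset;
}

static void write_origin(uint32_t new_origin, unsigned offset, unsigned size) {
    uint32_t old_stack = origin_slot >> 4;
    uint32_t old_mask  = origin_slot & 0xFu;
    uint32_t new_stack = new_origin >> 4;
    uint32_t new_mask  = make_mask(offset, size);

    if (old_stack == new_stack) {
        origin_slot |= new_mask;                   /* same stack: widen mask */
    } else if (old_mask == new_mask) {
        origin_slot = (new_stack << 4) | new_mask; /* old origin fully replaced */
    } else {
        /* Keep the new stack only for bytes the old origin did not claim. */
        origin_slot = (new_stack << 4) | (new_mask & ~old_mask);
    }
}
```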
For some reason I assumed what happens here is that we mishandle unaligned split writes (writes that cross 2 4-byte granules) and simply don't update the second granule at all. You are saying we correctly handle split writes as 2 separate writes? But if so, could we simply always store the new origin? That would also ensure that the origin always covers at least some of the uninit bytes.
Hm, no, looks like both of us are missing something.
You are right that we do not generate two origin writes for a single unaligned store: https://godbolt.org/z/MWqc7hGhx
On the other hand, at least with KMSAN your repro seems to be working correctly:
BUG: KMSAN: uninit-value in mmm+0xe0/0x120 [kmsan_test]
mmm+0xe0/0x120 mm/kmsan/kmsan_test.c:379
test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:389
kunit_run_case_internal lib/kunit/test.c:333
kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
kthread+0x721/0x850 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 ??:?
Uninit was stored to memory at:
foobar+0x199/0x1f0 mm/kmsan/kmsan_test.c:366
barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:372
mmm+0x9e/0x120 mm/kmsan/kmsan_test.c:379
test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:389
kunit_run_case_internal lib/kunit/test.c:333
kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
kthread+0x721/0x850 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 ??:?
Local variable c.addr created at:
foobar+0x9c/0x1f0 kmsan_test.c:?
barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:372
CPU: 1 PID: 6425 Comm: kunit_try_catch Tainted: G E 5.16.0-rc5+ #156
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Note that, according to the MSan source:
/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.
That is, we are always storing the origin for an uninitialized value (but only into the first origin slot, if there are two).
How does it work for KMSAN? Doesn't this logic happen in the instrumentation?
It does, and I don't see how tmp can remain in the origins, as the first memcpy to bytes 0-1 touches the first origin slot corresponding to bytes 0-3 of tmp, and the third memcpy to bytes 6-7 touches the second one (corresponding to bytes 4-7).
The following code, however, indeed triggers a problem:
memcpy(tmp, &a, sizeof(a));
memcpy(&tmp[4], &c, sizeof(c));
memcpy(&tmp[2], &b, sizeof(b));
After the third memcpy the value of c is erased from tmp, but the second origin slot still contains the origin of c, because the third memcpy only touches the first origin slot:
=====================================================
BUG: KMSAN: uninit-value in mmm+0xe0/0x120 [kmsan_test]
mmm+0xe0/0x120 mm/kmsan/kmsan_test.c:384
test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:394
kunit_run_case_internal lib/kunit/test.c:333
kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
kthread+0x721/0x850 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 ??:?
Uninit was stored to memory at:
foobar+0x15f/0x1f0 mm/kmsan/kmsan_test.c:365
barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:377
mmm+0x9e/0x120 mm/kmsan/kmsan_test.c:384
test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:394
kunit_run_case_internal lib/kunit/test.c:333
kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
kthread+0x721/0x850 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 ??:?
Local variable c.addr created at:
foobar+0x9c/0x1f0 kmsan_test.c:?
barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:377
CPU: 2 PID: 6431 Comm: kunit_try_catch Tainted: G E 5.16.0-rc5+ #156
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
=====================================================
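For concreteness, the three-memcpy sequence can be written out as a self-contained function. The argument types here are an assumption inferred from the 2-, 4- and 2-byte stores visible in the disassembly later in the thread, not the literal kmsan_test source:

```c
#include <string.h>

/* Assumed shape of the modified reproducer: a 2-byte, then a 2-byte, then
 * an unaligned 4-byte copy. The last memcpy overwrites all of c's bytes
 * (4-5), but only the first origin slot (bytes 0-3) receives b's origin,
 * so slot 1 (bytes 4-7) keeps the stale origin of c. */
void repro(char *tmp, short a, short c, int b) {
    memcpy(tmp, &a, sizeof(a));     /* bytes 0-1: origin slot 0 */
    memcpy(&tmp[4], &c, sizeof(c)); /* bytes 4-5: origin slot 1 */
    memcpy(&tmp[2], &b, sizeof(b)); /* bytes 2-5: crosses both slots */
}
```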
It does, and I don't see how tmp can remain in the origins, as the first memcpy to bytes 0-1 touches the first origin slot corresponding to bytes 0-3 of tmp, and the third memcpy to bytes 6-7 touches the second one (corresponding to bytes 4-7).
As a matter of fact, tmp remains in the origins. That is the essence of this bug report: it must not remain there, but it does. I posted the MSan report for this reproducer in the issue description. Doesn't it print the same for you?
The following code, however, indeed triggers a problem:
c is initialized, so if it's the origin of uninit, that's a much worse problem.
Or did you not initialize it in your reproducer? If so, that's indeed somewhat similar (but also subtly different).
And I think for both cases handling unaligned split writes as 2 writes to adjacent origin slots would solve the problem, right?
c is initialized, so if it's the origin of uninit, that's a much worse problem.
This happens because we do not reset the origin for a memory location when writing initialized data to it.
Most of the time it is okay, but if we make it poisoned again without updating the origin (like memcpy(&tmp[2], &b, sizeof(b)); in my example does), we'll be using the stale origin.
Or did you not initialize it in your reproducer? If so, that's indeed somewhat similar (but also subtly different).
No, I took your code and only modified foobar().
And I think for both cases handling unaligned split writes as 2 writes to adjacent origin slots would solve the problem, right?
For the code in question the problem can be solved by changing memcpy() to handle unaligned stores, but it looks like MSan behaves similarly for a plain unaligned store, as in the following example:
int *p = (int*)&tmp[2];
*p = b;
Too bad we don't have conditional stores in the IR, so handling unaligned stores would require a branch for every origin store :(
This could have been offloaded to the runtime though, by emitting a call to some __msan_paint_origin(ptr, size, origin).
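A hypothetical sketch of what such a helper could look like. The name comes from the comment above; the toy origins array and origin_ptr() mapping are stand-ins for the real shadow machinery, not an existing API:

```c
#include <stdint.h>

#define GRANULE 4u

static uint32_t origins[64]; /* toy stand-in for the origin shadow */

static uint32_t *origin_ptr(uintptr_t addr) {
    return &origins[(addr / GRANULE) % 64]; /* toy app-to-origin mapping */
}

/* Paint `origin` over every 4-byte origin granule that the byte range
 * [addr, addr + size) touches, so an unaligned store updates BOTH
 * adjacent origin slots instead of only the first one. */
void paint_origin(uintptr_t addr, uintptr_t size, uint32_t origin) {
    uintptr_t start = addr & ~(uintptr_t)(GRANULE - 1); /* round down  */
    uintptr_t end = (addr + size + GRANULE - 1) & ~(uintptr_t)(GRANULE - 1);
    for (uintptr_t p = start; p < end; p += GRANULE)
        *origin_ptr(p) = origin;
}
```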
c is initialized, so if it's the origin of uninit, that's a much worse problem.
This happens because we do not reset the origin for a memory location when writing initialized data to it.
I am not following. Why did we write an origin of an initialized variable anywhere in the first place?
Sorry, I think I've introduced more confusion with the KMSAN example. c.addr in https://github.com/llvm/llvm-project/issues/52961#issuecomment-1011227897 is not pointing to the variable c in barfoo(), but rather to the address-taken temp created in foobar() to store the contents of c. That temp is poisoned at creation time and unpoisoned when c is written to it.
Now I am totally lost what code we are discussing and what problem we are trying to solve.
My understanding: both MSAN and KMSAN equally mishandle the code in the issue description. The simplest solution: handle misaligned split writes as 2 separate origin writes. It should be possible to do this efficiently, since the compiler must know when things can be unaligned (not sure if the kernel always sticks to alignment rules, but at least for memcpy the compiler should know it). This will also fix some other cases. E.g. I assume the following will currently fail to print any origin:
char data[8] = {};
int uninit;
memcpy(data + 2, &uninit, sizeof(uninit));
if (data[4]) ...
Ok, sorry, I think what matters now is if/how we want to handle unaligned accesses. You are right that in many cases we have information about the store alignment (it is preserved for typed pointers, although casts to void or char drop the alignment). For memcpy we don't really need that information, as we're calling the runtime anyway.
I'll look into handling unaligned accesses, although it might not be the biggest issue with KMSAN at the moment (and MSan users do not seem to care much).
For memcpy, we don't really need that information, as we're calling the runtime anyway.
I thought maybe the compiler transforms 4-byte memcpy's into plain (unaligned) stores. Or do we disable this optimization under KMSAN?
I agree that adding a check for all loads and stores will be expensive. But there are some memcpy's and packed structs, where compiler knows things are unaligned.
I'll look into handling unaligned accesses, although it might not be the biggest issue with KMSAN at the moment
I agree it's not the biggest. But FWIW I think I tried to debug around a dozen KMSAN reports where the origin did not make sense to me but the report otherwise looked consistent and legit. Other users may have much less will to debug reports that don't make sense to them and will just tag KMSAN as "this tool always produces nonsense, I am not looking at its reports anymore".
I thought maybe the compiler transforms 4-byte memcpy's into plain (unaligned) stores. Or do we disable this optimization under KMSAN?
Normally this is represented as a call to the llvm.memcpy intrinsic, which [K]MSan replaces with a call to __msan_memcpy() (after which it can't be turned into a store anymore).
I agree that adding a check for all loads and stores will be expensive. But there are some memcpy's and packed structs, where compiler knows things are unaligned.
I can start with fixing __msan_memcpy(); perhaps that'll already handle most of the errors.
I agree it's not the biggest. But FWIW I think I tried to debug around a dozen KMSAN reports where the origin did not make sense to me but the report otherwise looked consistent and legit. Other users may have much less will to debug reports that don't make sense to them and will just tag KMSAN as "this tool always produces nonsense, I am not looking at its reports anymore".
Agreed.
Normally this is represented as a call to the llvm.memcpy intrinsic, which [K]MSan replaces with a call to __msan_memcpy() (after which it can't be turned into a store anymore).
For foobar in the reproducer I see:
00000000004994f0 <foobar>:
4994f0: 50 push %rax
4994f1: 48 8b 05 00 cb 02 00 mov 0x2cb00(%rip),%rax # 4c5ff8 <__msan_param_tls@@Base+0x4c5ff8>
4994f8: 64 48 83 38 00 cmpq $0x0,%fs:(%rax)
4994fd: 75 46 jne 499545 <foobar+0x55>
4994ff: 64 44 0f b7 40 18 movzwl %fs:0x18(%rax),%r8d
499505: 64 44 8b 48 10 mov %fs:0x10(%rax),%r9d
49950a: 64 44 0f b7 58 08 movzwl %fs:0x8(%rax),%r11d
499510: 49 ba 00 00 00 00 00 movabs $0x500000000000,%r10
499517: 50 00 00
49951a: 48 89 f8 mov %rdi,%rax
49951d: 4c 31 d0 xor %r10,%rax
499520: 66 44 89 18 mov %r11w,(%rax)
499524: 66 89 37 mov %si,(%rdi)
499527: 48 8d 47 02 lea 0x2(%rdi),%rax
49952b: 4c 31 d0 xor %r10,%rax
49952e: 44 89 08 mov %r9d,(%rax)
499531: 89 57 02 mov %edx,0x2(%rdi)
499534: 48 8d 47 06 lea 0x6(%rdi),%rax
499538: 4c 31 d0 xor %r10,%rax
49953b: 66 44 89 00 mov %r8w,(%rax)
49953f: 66 89 4f 06 mov %cx,0x6(%rdi)
499543: 58 pop %rax
499544: c3 ret
499545: 31 ff xor %edi,%edi
499547: e8 c4 6a f8 ff call 420010 <__msan_warning_with_origin_noreturn>
And for -fsanitize=kernel-memory:
0000000000401140 <foobar>:
401140: 55 push %rbp
401141: 41 57 push %r15
401143: 41 56 push %r14
401145: 41 55 push %r13
401147: 41 54 push %r12
401149: 53 push %rbx
40114a: 48 83 ec 28 sub $0x28,%rsp
40114e: 89 4c 24 24 mov %ecx,0x24(%rsp)
401152: 89 54 24 20 mov %edx,0x20(%rsp)
401156: 41 89 f6 mov %esi,%r14d
401159: 48 89 fb mov %rdi,%rbx
40115c: e8 ef 03 00 00 call 401550 <__msan_get_context_state>
401161: 0f b7 48 18 movzwl 0x18(%rax),%ecx
401165: 66 89 4c 24 12 mov %cx,0x12(%rsp)
40116a: 8b 88 a0 0c 00 00 mov 0xca0(%rax),%ecx
401170: 89 4c 24 1c mov %ecx,0x1c(%rsp)
401174: 44 8b 78 10 mov 0x10(%rax),%r15d
401178: 8b 88 98 0c 00 00 mov 0xc98(%rax),%ecx
40117e: 89 4c 24 18 mov %ecx,0x18(%rsp)
401182: 0f b7 68 08 movzwl 0x8(%rax),%ebp
401186: 8b 88 90 0c 00 00 mov 0xc90(%rax),%ecx
40118c: 89 4c 24 14 mov %ecx,0x14(%rsp)
401190: 4c 8b 20 mov (%rax),%r12
401193: 8b 80 88 0c 00 00 mov 0xc88(%rax),%eax
401199: 89 44 24 0c mov %eax,0xc(%rsp)
40119d: 4d 85 e4 test %r12,%r12
4011a0: 75 69 jne 40120b <foobar+0xcb>
4011a2: 48 89 df mov %rbx,%rdi
4011a5: e8 b6 03 00 00 call 401560 <__msan_metadata_ptr_for_store_2>
4011aa: 66 89 28 mov %bp,(%rax)
4011ad: 66 85 ed test %bp,%bp
4011b0: 75 64 jne 401216 <foobar+0xd6>
4011b2: 48 8d 6b 02 lea 0x2(%rbx),%rbp
4011b6: 66 44 89 33 mov %r14w,(%rbx)
4011ba: 4d 85 e4 test %r12,%r12
4011bd: 75 69 jne 401228 <foobar+0xe8>
4011bf: 48 89 ef mov %rbp,%rdi
4011c2: e8 a9 03 00 00 call 401570 <__msan_metadata_ptr_for_store_4>
4011c7: 44 89 38 mov %r15d,(%rax)
4011ca: 45 85 ff test %r15d,%r15d
4011cd: 75 64 jne 401233 <foobar+0xf3>
4011cf: 48 8d 6b 06 lea 0x6(%rbx),%rbp
4011d3: 8b 44 24 20 mov 0x20(%rsp),%eax
4011d7: 89 43 02 mov %eax,0x2(%rbx)
4011da: 4d 85 e4 test %r12,%r12
4011dd: 75 65 jne 401244 <foobar+0x104>
4011df: 48 89 ef mov %rbp,%rdi
4011e2: e8 79 03 00 00 call 401560 <__msan_metadata_ptr_for_store_2>
4011e7: 0f b7 4c 24 12 movzwl 0x12(%rsp),%ecx
4011ec: 66 89 08 mov %cx,(%rax)
4011ef: 66 85 c9 test %cx,%cx
4011f2: 75 5b jne 40124f <foobar+0x10f>
4011f4: 8b 44 24 24 mov 0x24(%rsp),%eax
4011f8: 66 89 45 00 mov %ax,0x0(%rbp)
4011fc: 48 83 c4 28 add $0x28,%rsp
401200: 5b pop %rbx
401201: 41 5c pop %r12
401203: 41 5d pop %r13
401205: 41 5e pop %r14
401207: 41 5f pop %r15
401209: 5d pop %rbp
40120a: c3 ret
40120b: 8b 7c 24 0c mov 0xc(%rsp),%edi
40120f: e8 6c 03 00 00 call 401580 <__msan_warning>
401214: eb 8c jmp 4011a2 <foobar+0x62>
401216: 49 89 d5 mov %rdx,%r13
401219: 8b 7c 24 14 mov 0x14(%rsp),%edi
40121d: e8 6e 03 00 00 call 401590 <__msan_chain_origin>
401222: 41 89 45 00 mov %eax,0x0(%r13)
401226: eb 8a jmp 4011b2 <foobar+0x72>
401228: 8b 7c 24 0c mov 0xc(%rsp),%edi
40122c: e8 4f 03 00 00 call 401580 <__msan_warning>
401231: eb 8c jmp 4011bf <foobar+0x7f>
401233: 48 89 d5 mov %rdx,%rbp
401236: 8b 7c 24 18 mov 0x18(%rsp),%edi
40123a: e8 51 03 00 00 call 401590 <__msan_chain_origin>
40123f: 89 45 00 mov %eax,0x0(%rbp)
401242: eb 8b jmp 4011cf <foobar+0x8f>
401244: 8b 7c 24 0c mov 0xc(%rsp),%edi
401248: e8 33 03 00 00 call 401580 <__msan_warning>
40124d: eb 90 jmp 4011df <foobar+0x9f>
40124f: 48 89 d3 mov %rdx,%rbx
401252: 8b 7c 24 1c mov 0x1c(%rsp),%edi
401256: e8 35 03 00 00 call 401590 <__msan_chain_origin>
40125b: 89 03 mov %eax,(%rbx)
40125d: eb 95 jmp 4011f4 <foobar+0xb4>
no calls to memcpy...
Of course the biggest problem is that we're potentially mapping 4n bytes of src origin to 4(n+1) bytes of dst, so if more than 4 bytes are copied we'll have to replicate one of the origins twice, and sometimes that can be the wrong origin.
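That mismatch can be made concrete with a little granule arithmetic (an illustrative helper, not runtime code, assuming 4-byte origin granules): an n-byte access starting at byte off of its first granule touches ceil((off mod 4 + n) / 4) origin slots, so an aligned 8-byte src occupies 2 slots while the same bytes at dst offset 2 span 3.

```c
/* Number of 4-byte origin granules touched by an n-byte access that
 * starts at byte `off` within its first granule. */
unsigned granules_touched(unsigned off, unsigned n) {
    return (off % 4u + n + 3u) / 4u;
}
```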
no calls to memcpy...
Yeah, you are right. According to godbolt, this transformation only happens for bigger buffer sizes (16+ and, surprisingly, 15 as well).
Reproducer:
produces:
This is wrong: there are no uninit bits that come from tmp. All of tmp was overwritten by other data. This is extracted from a KMSAN false positive (and I think I've seen multiple similar ones).
Also see #36554 and #36486 which are also about incorrect origins.
@ramosian-glider @vitalybuka @eugenis