llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

msan: incorrect origin due to unaligned writes #52961

Open dvyukov opened 2 years ago

dvyukov commented 2 years ago

Reproducer:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

__attribute__((noinline)) void foobar(char* tmp, short a, int b, short c) {
  memcpy(tmp, &a, sizeof(a));
  tmp += sizeof(a);
  memcpy(tmp, &b, sizeof(b));
  tmp += sizeof(b);
  memcpy(tmp, &c, sizeof(c));
}

__attribute__((noinline)) char barfoo(short a, int b, int x) {
  char tmp[8];
  short c = 0;
  foobar(tmp, a, b, c);
  return tmp[x] == 42;
}

int main(int argc, char** argv) {
  volatile short a;
  volatile int b;
  if (barfoo(a, b, argc > 1 ? atoi(argv[1]) : 0))
    printf("bingo!");
}

produces:

$ clang test.c -O2 -fsanitize=memory -fsanitize-memory-track-origins -g && ./a.out 4
==1560283==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4a4ad6 in main test.c:23:7

  Uninitialized value was created by an allocation of 'tmp' in the stack frame of function 'barfoo'
    #0 0x4a4760 in barfoo test.c:13

This is wrong: no uninit bits come from tmp; all of tmp was overwritten by other data.

This is extracted from a KMSAN false positive (and I think I've seen multiple similar ones).

Also see #36554 and #36486 which are also about incorrect origins.

@ramosian-glider @vitalybuka @eugenis

ramosian-glider commented 2 years ago

I think a workable solution would be to use spare origin bits to indicate which of the four bytes the origin applies to. And if an uninit write overwrites all the bytes to which the old origin applies, we just discard it and take the new one. @eugenis, do you think we can sacrifice 4 bits of the origin id in the userspace?

dvyukov commented 2 years ago

Will it help in this case? Don't we always overwrite the origin? We could.

ramosian-glider commented 2 years ago

Assume we use the 4 lowest bits of the origin for this, so a newly generated origin is always (stack_id << 4) | 0xf. For every WRITE(aligned_addr, offset, size, new_origin) (with size in 1..4 in the case of unaligned writes), we generate a bitmask of size ones starting at offset, e.g. 1100b for memcpy(tmp, &a, sizeof(a));, 0011b and 1100b for memcpy(tmp, &b, sizeof(b));, etc.

Then we handle the write as follows:

WRITE(aligned_addr, offset, size, new_origin):
  old_origin = ORIGIN(aligned_addr)
  old_stack = old_origin >> 4
  old_mask = old_origin % 16
  new_stack = new_origin >> 4
  new_mask = MASK(aligned_addr, offset, size)
  if old_stack == new_stack:
    UPDATE_ORIGIN(aligned_addr, old_origin | new_mask)
  else:
    if old_mask == new_mask:
      UPDATE_ORIGIN(aligned_addr, (new_stack << 4) | new_mask)
    else:
      UPDATE_ORIGIN(aligned_addr, (new_stack << 4) | (new_mask & ~old_mask))

This way we'll never have a situation in which an origin does not actually correspond to any of the four uninitialized bytes.

dvyukov commented 2 years ago

For some reason I assumed what happens here is that we mishandle unaligned split writes (writes that cross 2 4-byte granules) and simply don't update the second granule at all. You are saying we correctly handle split writes as 2 separate writes? But if so, could we simply always store the new origin? It should also ensure that origin always covers at least some of the uninit bytes.

ramosian-glider commented 2 years ago

Hm, no, looks like both of us are missing something.

You are right that we do not generate two origin writes for a single unaligned store: https://godbolt.org/z/MWqc7hGhx

On the other hand, at least with KMSAN your repro seems to be working correctly:

BUG: KMSAN: uninit-value in mmm+0xe0/0x120 [kmsan_test]
 mmm+0xe0/0x120 mm/kmsan/kmsan_test.c:379
 test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:389
 kunit_run_case_internal lib/kunit/test.c:333
 kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
 kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
 kthread+0x721/0x850 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 ??:? 

Uninit was stored to memory at:
 foobar+0x199/0x1f0 mm/kmsan/kmsan_test.c:366
 barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:372
 mmm+0x9e/0x120 mm/kmsan/kmsan_test.c:379
 test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:389
 kunit_run_case_internal lib/kunit/test.c:333
 kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
 kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
 kthread+0x721/0x850 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 ??:? 

Local variable c.addr created at:
 foobar+0x9c/0x1f0 kmsan_test.c:?
 barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:372

CPU: 1 PID: 6425 Comm: kunit_try_catch Tainted: G            E     5.16.0-rc5+ #156 
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014

Note that, according to MSan source:

/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.

In other words, we always store the origin for an uninitialized value (but only in the first origin slot, if there are two).

dvyukov commented 2 years ago

How does it work for KMSAN? Doesn't this logic happen in the instrumentation?

ramosian-glider commented 2 years ago

It does, and I don't see how tmp can remain in the origins, as the first memcpy to bytes 0-1 touches the first origin slot corresponding to bytes 0-3 of tmp, and the third memcpy to bytes 6-7 touches the second one (corresponding to bytes 4-7).

The following code, however, indeed triggers a problem:

  memcpy(tmp, &a, sizeof(a));
  memcpy(&tmp[4], &c, sizeof(c));
  memcpy(&tmp[2], &b, sizeof(b));

After the third memcpy the value of c is erased from tmp, but the second origin slot still contains the origin of c, because the second memcpy only touches the first origin slot:

=====================================================
BUG: KMSAN: uninit-value in mmm+0xe0/0x120 [kmsan_test]
 mmm+0xe0/0x120 mm/kmsan/kmsan_test.c:384
 test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:394
 kunit_run_case_internal lib/kunit/test.c:333
 kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
 kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
 kthread+0x721/0x850 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 ??:?

Uninit was stored to memory at:
 foobar+0x15f/0x1f0 mm/kmsan/kmsan_test.c:365
 barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:377
 mmm+0x9e/0x120 mm/kmsan/kmsan_test.c:384
 test_incorrect+0x14d/0x5e0 mm/kmsan/kmsan_test.c:394
 kunit_run_case_internal lib/kunit/test.c:333
 kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
 kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
 kthread+0x721/0x850 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 ??:?

Local variable c.addr created at:
 foobar+0x9c/0x1f0 kmsan_test.c:?
 barfoo+0x84/0xf0 mm/kmsan/kmsan_test.c:377

CPU: 2 PID: 6431 Comm: kunit_try_catch Tainted: G            E     5.16.0-rc5+ #156
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
=====================================================

dvyukov commented 2 years ago

It does, and I don't see how tmp can remain in the origins, as the first memcpy to bytes 0-1 touches the first origin slot corresponding to bytes 0-3 of tmp, and the third memcpy to bytes 6-7 touches the second one (corresponding to bytes 4-7).

As a matter of fact, tmp remains in the origins. This fact is the essence of this bug report. It must not remain there, but it does. I posted msan report for this reproducer in the issue description. Doesn't it print the same for you?

dvyukov commented 2 years ago

The following code, however, indeed triggers a problem:

c is initialized, so if it's the origin of uninit, that's a much worse problem.

Or did you not initialize it in your reproducer? If so, that's indeed somewhat similar (but also subtly different).

And I think for both cases handling unaligned split writes as 2 writes to adjacent origin slots would solve the problem, right?

ramosian-glider commented 2 years ago

c is initialized, so if it's the origin of uninit, that's a much worse problem.

This happens because we do not reset the origin for a memory location when writing initialized data to it. Most of the time it is okay, but if we make it poisoned again without updating the origin (like memcpy(&tmp[2], &b, sizeof(b)); in my example does), we'll be using the stale origin.

Or did you not initialize it in your reproducer? If so, that's indeed somewhat similar (but also subtly different).

No, I took your code and only modified foobar().

And I think for both cases handling unaligned split writes as 2 writes to adjacent origin slots would solve the problem, right?

For the code in question the problem can be solved by changing memcpy() to handle unaligned stores, but it looks like MSan behaves the same way for a plain unaligned store, like in the following example:

int *p = (int*)&tmp[2];
*p = b;

Too bad we don't have conditional stores in the IR, so handling unaligned stores would require a branch for every origin store :( This could have been offloaded to the runtime though by emitting a call to some __msan_paint_origin(ptr, size, origin).

dvyukov commented 2 years ago

c is initialized, so if it's the origin of uninit, that's much worse problem.

This happens because we do not reset the origin for a memory location when writing initialized data to it.

I am not following. Why did we write an origin of an initialized variable anywhere in the first place?

ramosian-glider commented 2 years ago

Sorry, I think I've introduced more confusion by the KMSAN example.

c.addr in https://github.com/llvm/llvm-project/issues/52961#issuecomment-1011227897 is not pointing to the variable c in barfoo() but rather to the address-taken temp created in foobar() to store the contents of c. That temp is poisoned at creation time and unpoisoned when c is written to it.

dvyukov commented 2 years ago

Now I am totally lost what code we are discussing and what problem we are trying to solve.

My understanding: both MSAN and KMSAN equally mishandle the code in the issue description. The simplest solution: handle misaligned split writes as 2 separate origin writes. It should be possible to do this efficiently since the compiler must know when things can be unaligned (not sure if the kernel always sticks to alignment rules, but at least for memcpy the compiler should know). This will also fix some other cases. E.g. I assume currently the following will fail to print any origin:

char data[8] = {};
int uninit;
memcpy(data + 2, &uninit, sizeof(uninit));
if (data[4]) ...

ramosian-glider commented 2 years ago

Ok, sorry, I think what matters now is if/how we want to handle unaligned accesses. You are right that in many cases we have information about the store alignment (it is preserved for typed pointers, although casts to void or char drop it). For memcpy, we don't really need that information, as we're calling the runtime anyway.

I'll look into handling unaligned accesses, although it might not be the biggest issue with KMSAN at the moment (and MSan users do not seem to care much).

dvyukov commented 2 years ago

For memcpy, we don't really need that information, as we're calling the runtime anyway.

I thought maybe the compiler transforms 4-byte memcpy's into plain (unaligned) stores. Or do we disable this optimization under KMSAN?

I agree that adding a check for all loads and stores will be expensive. But there are some memcpy's and packed structs, where compiler knows things are unaligned.

I'll look into handling unaligned accesses, although it might not be the biggest issue with KMSAN at the moment

I agree it's not the biggest. But FWIW I think I've tried to debug around a dozen KMSAN reports where the origin did not make sense to me but the report otherwise looked consistent and legit. Other users may have far less will to debug reports that don't make sense to them and will just tag KMSAN as "this tool always produces nonsense, I am not looking at its reports anymore".

ramosian-glider commented 2 years ago

I thought maybe the compiler transforms 4-byte memcpy's into plain (unaligned) stores. Or do we disable this optimization under KMSAN?

Normally this is represented as a call to the llvm.memcpy intrinsic, which [K]MSan replaces with a call to __msan_memcpy() (after which it can't be turned into a store anymore).

I agree that adding a check for all loads and stores will be expensive. But there are some memcpy's and packed structs, where compiler knows things are unaligned.

I can start with fixing __msan_memcpy(); perhaps that'll already handle most of the errors.

I agree it's not the biggest. But FWIW I think I've tried to debug around a dozen KMSAN reports where the origin did not make sense to me but the report otherwise looked consistent and legit. Other users may have far less will to debug reports that don't make sense to them and will just tag KMSAN as "this tool always produces nonsense, I am not looking at its reports anymore".

Agreed.

dvyukov commented 2 years ago

Normally this is represented as a call to the llvm.memcpy intrinsic, which [K]MSan replaces with a call to __msan_memcpy() (after which it can't be turned into a store anymore).

For foobar in the reproducer I see:

00000000004994f0 <foobar>:
  4994f0:   50                      push   %rax
  4994f1:   48 8b 05 00 cb 02 00    mov    0x2cb00(%rip),%rax        # 4c5ff8 <__msan_param_tls@@Base+0x4c5ff8>
  4994f8:   64 48 83 38 00          cmpq   $0x0,%fs:(%rax)
  4994fd:   75 46                   jne    499545 <foobar+0x55>
  4994ff:   64 44 0f b7 40 18       movzwl %fs:0x18(%rax),%r8d
  499505:   64 44 8b 48 10          mov    %fs:0x10(%rax),%r9d
  49950a:   64 44 0f b7 58 08       movzwl %fs:0x8(%rax),%r11d
  499510:   49 ba 00 00 00 00 00    movabs $0x500000000000,%r10
  499517:   50 00 00 
  49951a:   48 89 f8                mov    %rdi,%rax
  49951d:   4c 31 d0                xor    %r10,%rax
  499520:   66 44 89 18             mov    %r11w,(%rax)
  499524:   66 89 37                mov    %si,(%rdi)
  499527:   48 8d 47 02             lea    0x2(%rdi),%rax
  49952b:   4c 31 d0                xor    %r10,%rax
  49952e:   44 89 08                mov    %r9d,(%rax)
  499531:   89 57 02                mov    %edx,0x2(%rdi)
  499534:   48 8d 47 06             lea    0x6(%rdi),%rax
  499538:   4c 31 d0                xor    %r10,%rax
  49953b:   66 44 89 00             mov    %r8w,(%rax)
  49953f:   66 89 4f 06             mov    %cx,0x6(%rdi)
  499543:   58                      pop    %rax
  499544:   c3                      ret    
  499545:   31 ff                   xor    %edi,%edi
  499547:   e8 c4 6a f8 ff          call   420010 <__msan_warning_with_origin_noreturn>

And for -fsanitize=kernel-memory:

0000000000401140 <foobar>:
  401140:   55                      push   %rbp
  401141:   41 57                   push   %r15
  401143:   41 56                   push   %r14
  401145:   41 55                   push   %r13
  401147:   41 54                   push   %r12
  401149:   53                      push   %rbx
  40114a:   48 83 ec 28             sub    $0x28,%rsp
  40114e:   89 4c 24 24             mov    %ecx,0x24(%rsp)
  401152:   89 54 24 20             mov    %edx,0x20(%rsp)
  401156:   41 89 f6                mov    %esi,%r14d
  401159:   48 89 fb                mov    %rdi,%rbx
  40115c:   e8 ef 03 00 00          call   401550 <__msan_get_context_state>
  401161:   0f b7 48 18             movzwl 0x18(%rax),%ecx
  401165:   66 89 4c 24 12          mov    %cx,0x12(%rsp)
  40116a:   8b 88 a0 0c 00 00       mov    0xca0(%rax),%ecx
  401170:   89 4c 24 1c             mov    %ecx,0x1c(%rsp)
  401174:   44 8b 78 10             mov    0x10(%rax),%r15d
  401178:   8b 88 98 0c 00 00       mov    0xc98(%rax),%ecx
  40117e:   89 4c 24 18             mov    %ecx,0x18(%rsp)
  401182:   0f b7 68 08             movzwl 0x8(%rax),%ebp
  401186:   8b 88 90 0c 00 00       mov    0xc90(%rax),%ecx
  40118c:   89 4c 24 14             mov    %ecx,0x14(%rsp)
  401190:   4c 8b 20                mov    (%rax),%r12
  401193:   8b 80 88 0c 00 00       mov    0xc88(%rax),%eax
  401199:   89 44 24 0c             mov    %eax,0xc(%rsp)
  40119d:   4d 85 e4                test   %r12,%r12
  4011a0:   75 69                   jne    40120b <foobar+0xcb>
  4011a2:   48 89 df                mov    %rbx,%rdi
  4011a5:   e8 b6 03 00 00          call   401560 <__msan_metadata_ptr_for_store_2>
  4011aa:   66 89 28                mov    %bp,(%rax)
  4011ad:   66 85 ed                test   %bp,%bp
  4011b0:   75 64                   jne    401216 <foobar+0xd6>
  4011b2:   48 8d 6b 02             lea    0x2(%rbx),%rbp
  4011b6:   66 44 89 33             mov    %r14w,(%rbx)
  4011ba:   4d 85 e4                test   %r12,%r12
  4011bd:   75 69                   jne    401228 <foobar+0xe8>
  4011bf:   48 89 ef                mov    %rbp,%rdi
  4011c2:   e8 a9 03 00 00          call   401570 <__msan_metadata_ptr_for_store_4>
  4011c7:   44 89 38                mov    %r15d,(%rax)
  4011ca:   45 85 ff                test   %r15d,%r15d
  4011cd:   75 64                   jne    401233 <foobar+0xf3>
  4011cf:   48 8d 6b 06             lea    0x6(%rbx),%rbp
  4011d3:   8b 44 24 20             mov    0x20(%rsp),%eax
  4011d7:   89 43 02                mov    %eax,0x2(%rbx)
  4011da:   4d 85 e4                test   %r12,%r12
  4011dd:   75 65                   jne    401244 <foobar+0x104>
  4011df:   48 89 ef                mov    %rbp,%rdi
  4011e2:   e8 79 03 00 00          call   401560 <__msan_metadata_ptr_for_store_2>
  4011e7:   0f b7 4c 24 12          movzwl 0x12(%rsp),%ecx
  4011ec:   66 89 08                mov    %cx,(%rax)
  4011ef:   66 85 c9                test   %cx,%cx
  4011f2:   75 5b                   jne    40124f <foobar+0x10f>
  4011f4:   8b 44 24 24             mov    0x24(%rsp),%eax
  4011f8:   66 89 45 00             mov    %ax,0x0(%rbp)
  4011fc:   48 83 c4 28             add    $0x28,%rsp
  401200:   5b                      pop    %rbx
  401201:   41 5c                   pop    %r12
  401203:   41 5d                   pop    %r13
  401205:   41 5e                   pop    %r14
  401207:   41 5f                   pop    %r15
  401209:   5d                      pop    %rbp
  40120a:   c3                      ret    
  40120b:   8b 7c 24 0c             mov    0xc(%rsp),%edi
  40120f:   e8 6c 03 00 00          call   401580 <__msan_warning>
  401214:   eb 8c                   jmp    4011a2 <foobar+0x62>
  401216:   49 89 d5                mov    %rdx,%r13
  401219:   8b 7c 24 14             mov    0x14(%rsp),%edi
  40121d:   e8 6e 03 00 00          call   401590 <__msan_chain_origin>
  401222:   41 89 45 00             mov    %eax,0x0(%r13)
  401226:   eb 8a                   jmp    4011b2 <foobar+0x72>
  401228:   8b 7c 24 0c             mov    0xc(%rsp),%edi
  40122c:   e8 4f 03 00 00          call   401580 <__msan_warning>
  401231:   eb 8c                   jmp    4011bf <foobar+0x7f>
  401233:   48 89 d5                mov    %rdx,%rbp
  401236:   8b 7c 24 18             mov    0x18(%rsp),%edi
  40123a:   e8 51 03 00 00          call   401590 <__msan_chain_origin>
  40123f:   89 45 00                mov    %eax,0x0(%rbp)
  401242:   eb 8b                   jmp    4011cf <foobar+0x8f>
  401244:   8b 7c 24 0c             mov    0xc(%rsp),%edi
  401248:   e8 33 03 00 00          call   401580 <__msan_warning>
  40124d:   eb 90                   jmp    4011df <foobar+0x9f>
  40124f:   48 89 d3                mov    %rdx,%rbx
  401252:   8b 7c 24 1c             mov    0x1c(%rsp),%edi
  401256:   e8 35 03 00 00          call   401590 <__msan_chain_origin>
  40125b:   89 03                   mov    %eax,(%rbx)
  40125d:   eb 95                   jmp    4011f4 <foobar+0xb4>

no calls to memcpy...

ramosian-glider commented 2 years ago

Of course, the biggest problem is that we're potentially mapping n origin slots (4n bytes) of src onto n+1 slots of dst, so if more than 4 bytes are copied we'll have to store one of the origins twice, and sometimes that can be the wrong origin.

ramosian-glider commented 2 years ago

no calls to memcpy...

Yeah, you are right. According to godbolt, this transformation only happens for bigger buffer sizes (16+ and surprisingly 15 as well).