Undef LLVM optimizations are harmful

GoogleCodeExporter commented 9 years ago

See https://code.google.com/p/pdfium/issues/detail?id=10 for the repro

msan reports: 
==14288== WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7f4b558d2087 in PS_Conv_ToFixed third_party/pdfium/core/src/fxge/fx_freetype/src/../fxft2.5.01/src/psaux/psconv.c:275
    #1 0x7f4b558ba581 in ps_parser_load_field third_party/pdfium/core/src/fxge/fx_freetype/src/../fxft2.5.01/src/psaux/psobjs.c:1105
...
  ORIGIN: invalid (0). Might be a bug in MemorySanitizer origin tracking.

Valgrind correctly reports the origin:
==14111== Conditional jump or move depends on uninitialised value(s)
==14111==    at 0x64F41B: PS_Conv_ToFixed (psconv.c:275)
==14111==    by 0x64A6B8: ps_parser_load_field (psobjs.c:1109)
==14111==    by 0x67AB9F: t1_load_keyword (t1load.c:1016)
==14111==    by 0x67CBA3: parse_dict (t1load.c:2001)

==14111==  Uninitialised value was created by a stack allocation
==14111==    at 0x64F242: PS_Conv_ToFixed (psconv.c:195)

Original issue reported on code.google.com by konstant...@gmail.com on 4 Jun 2014 at 1:59

GoogleCodeExporter commented 9 years ago

Something extremely weird is going on here.
I've got no proof, but it seems possible that, if we have two computations,
shadow(x) and origin(x), and both eventually depend on a compile-time *undef* 
value, to end up with inconsistent (i.e. "impossible") result. For example, if 
the original computation had two paths (merging into a phi node), we might take 
shadow from one patch, and origin from the other.

Original comment by euge...@google.com on 9 Jun 2014 at 1:28

GoogleCodeExporter commented 9 years ago

It's actually much simpler than that.
We've got a variable that's *undef* from the start, and we've got two branches, 
one of which may initialize it, but not the other one. When the two branches 
merge, we get

  %_msphi_s380 = phi i64 [ -1, %82 ], [ %_msprop_select630, %if.end27 ]
  %_msphi_o381 = phi i32 [ 0, %82 ], [ %_msphi_o628, %if.end27 ]
  %integral.0 = phi i64 [ undef, %82 ], [ %call23.shl, %if.end27 ]

From the top: shadow, origin, user value. Compile-time *undef* was propagated 
(before MSan) to the left argument of the PHI. Of course, origin was lost.

If neither of 2 branches could initialize this variable, undef would have been 
propagated further, and we would have lost the MSan report altogether.

So, this is part of the bigger problem: LLVM transformations can turn *undef* 
into random defined value (sometimes with a compiler warning).

Perhaps we could do something about it in Clang, or early in LLVM pipeline. 
Ex., assign all uninitialized stack variables (and maybe all *undef* that are 
emitted by clang) some magic value _and_ poison their shadow.

Original comment by euge...@google.com on 25 Jun 2014 at 12:56

Changed title: Undef LLVM optimizations are harmful

staticanalysis / memory-sanitizer

Undef LLVM optimizations are harmful #55