llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.54k stars 11.8k forks source link

[WebAssembly] zero-length memcpy does not result in no-op when bulk-memory is enabled #63755

Open Luukdegram opened 1 year ago

Luukdegram commented 1 year ago

According to the LLVM Language Reference:

If <len> is 0, it is no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, both <dest> and <src> should be well-defined, otherwise the behavior is undefined.

performing a memory copy with a zero-length results in a no-op module. This means I'd expect for the WebAssembly target with the bulk-memory feature enabled, to result in a no-op when the length is 0. However, this seems to emit a memory.copy instruction with no checks, meaning it can result in a trap when the destination address is out-of-bounds.

i.e. given the following IR:

; Function Attrs: noredzone nounwind
define dso_local void @_start() #0 !dbg !256 {
Entry:
  %0 = alloca { ptr, i32 }, align 4
  %1 = alloca { ptr, i32 }, align 4
  %2 = call fastcc { ptr, i32 } @foo.foo(), !dbg !265
  store { ptr, i32 } %2, ptr %1, align 4, !dbg !265
  call void @llvm.dbg.declare(metadata ptr %1, metadata !262, metadata !DIExpression()), !dbg !265
  %3 = call fastcc { ptr, i32 } @foo.foo(), !dbg !266
  store { ptr, i32 } %3, ptr %0, align 4, !dbg !266
  call void @llvm.dbg.declare(metadata ptr %0, metadata !264, metadata !DIExpression()), !dbg !266
  %4 = load { ptr, i32 }, ptr %0, align 4, !dbg !267
  %5 = load { ptr, i32 }, ptr %1, align 4, !dbg !267
  %6 = extractvalue { ptr, i32 } %5, 0, !dbg !267
  %7 = extractvalue { ptr, i32 } %4, 1, !dbg !267
  %8 = extractvalue { ptr, i32 } %4, 0, !dbg !267
  call void @llvm.memcpy.p0.p0.i32(ptr align 1 %8, ptr align 1 %6, i32 %7, i1 false), !dbg !267
  ret void, !dbg !267
}

; Function Attrs: noredzone nounwind
define internal fastcc { ptr, i32 } @foo.foo() unnamed_addr #0 !dbg !268 {
Entry:
  %0 = alloca i32, align 4
  store i32 -1, ptr %0, align 4, !dbg !274
  call void @llvm.dbg.declare(metadata ptr %0, metadata !272, metadata !DIExpression()), !dbg !274
  ret { ptr, i32 } { ptr inttoptr (i32 -1 to ptr), i32 0 }, !dbg !275
}

Will emit a memory.copy instruction although the length of the operand is 0. In this case, the destination address is out-of-bounds, therefore resulting in a trap. According to the WebAssembly specification any out-of-bounds memory operation will result in a trap, regardless of the fact we have a zero-length operand. This is not the behavior I'd expect according to the LLVM Language Reference. This behavior only occurs when the bulk-memory feature is enabled as without this feature, we don't have access to the above mentioned instruction and LLVM will emit a loop instead (Which is safe for zero-length copies).

This issue also occurs for the llvm.memset intrinsic.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-webassembly

tlively commented 1 year ago

Hmm, I guess a proper fix here would require lowering the LLVM intrinsic to a length check + bulk memory instruction instead of just a bulk memory instruction. That seems unfortunate.

Alternative fixes would be to relax either the LLVM or WebAssembly semantics. I filed https://github.com/WebAssembly/design/issues/1482 to discuss this on the WebAssembly side, but given the high cost of changing the WebAssembly spec, it might be easier to change the LLVM semantics to allow trapping on OOB accesses even if the length is zero.

Luukdegram commented 1 year ago

Thank you for filing the ticket on the WebAssembly side, I'll follow the discussion closely.

tlively commented 1 year ago

We decided not to change the WebAssembly semantics, so I will take a look at fixing this in the code unless someone chimes in who is interested in trying to get the LLVM semantics relaxed.

nikic commented 1 year ago

Changing the LLVM semantics is not feasible. You will have to emit the check in the wasm backend.

dwblaikie commented 1 year ago

Isn't the same true for the standard memcpy? So this webassembly situation could probably reuse however we handle ::memcpy on *nix targets?

tlively commented 1 year ago

@dwblaikie, can you elaborate? Are you talking about target OSes when compiling C++ std::memcpy? Where is the handling you're referring to?

dwblaikie commented 1 year ago

Hmm, can't quite reproduce/tickle what I thought I was. But I thought standard memcpy (std::memcpy or global memcpy in C) was UB if the input or output is invalid, even if len == 0 (something like that, but maybe I'm misremembering - but I'm guessing the WebAssembly choice came from this choice in standard memcpy).

So I would've thought that LLVM would already have some code for non-WebAssembly targets to lower memcpy LLVM intrinsic to conditional memcpy call, and WA could use the same thing...

tlively commented 1 year ago

Yes, cppreference at least does explicitly say an OOB read of length zero is UB. But maybe backends that emit memcpy calls are piercing that abstraction and emitting memcpy calls directly anyway, knowing that nothing bad will happen?

dwblaikie commented 1 year ago

But maybe backends that emit memcpy calls are piercing that abstraction and emitting memcpy calls directly anyway, knowing that nothing bad will happen?

Perhaps, on platforms it knows that it'll be OK? Maybe there are platforms it doesn't know about the memcpy implementation details it'll still do the right thing and add a conditional for safety?

dschuff commented 1 week ago

@sbc100 @aheejin we discussed today what we'd need to do to push some of the new features on-by-default. This is one we should probably fix before enabling bulk memory.

aheejin commented 1 week ago

@dschuff What are we supposed to fix here? If it's UB, doesn't that mean the current status quo is fine? Or, because the behavior is different from when bulk memory is disabled, it could be confusing, so you'd like it to be a no-op?

nikic commented 1 week ago

@aheejin llvm.memcpy(a, b, len) needs to be lowered to if (len != 0) { memory.copy(a, b, len); } if len cannot be proven to be non-zero. It is not UB in LLVM IR (and also C2y, but this isn't really relevant as far as the Wasm backend is concerned).