llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Polybench mm benchmarks with Polly and Wasm backend lead to runtime error #49757

Open 07bcbce5-af49-4a2a-83e4-483b07325df6 opened 3 years ago

07bcbce5-af49-4a2a-83e4-483b07325df6 commented 3 years ago
Bugzilla Link 50413
Version trunk
OS Linux
CC @dschuff,@Meinersbur,@ManuelSelva

Extended Description

I am compiling PolyBench to WebAssembly using clang. Enabling Polly in this compilation path leads to a runtime error when executing the generated wasm code for the 2mm, 3mm and gemm benchmarks.

Here is the error. The code tries to access memory outside of what has been allocated:

╰─ wasmtime 2mm-wasm_polly 
Error: failed to run main module `2mm-wasm-polly`

Caused by:
    0: failed to invoke command default
    1: wasm trap: out of bounds memory access
       wasm backtrace:
           0:  0x3ec - <unknown>!polybench_alloc_data
           1:  0x491 - <unknown>!main
           2: 0x456a - <unknown>!__main_void
           3: 0x44e2 - <unknown>!__original_main
           4:  0x21a - <unknown>!_start
       note: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information

Note that the error occurs with all the WebAssembly runtimes that we tried.

Maybe the issue is not in Polly but in the WebAssembly backend. I didn't find a way to also attach this report to another product so that the backend developers would be notified too. Please let me know how to do that.

Finally, here is some information to reproduce the bug. Please let me know if I can provide anything else that may be useful for identifying what is going on.

llvm commit : 1fbb484ea45f85740b7450b175096e5fcff6ecd9

compilation command (include and link options omitted): clang -O3 -mllvm -polly 2mm.c -DLARGE_DATASET -DPOLYBENCH_TIME --target=wasm32-wasi -o 2mm-wasm-polly

Thank you for the support,

-- Manu

dschuff commented 3 years ago

Given that you are trapping in polybench_alloc_data, are you sure that you are allowing memory to be allocated in the right way? I tried this out using emscripten and node.js.

I'm not too familiar with wasi's clang and wasmtime, but in emscripten you need to ensure you allocate a large enough wasm memory size (in emscripten the default is 16M IIRC, which is not enough for this example). If you set -s INITIAL_MEMORY=256MB or -s ALLOW_MEMORY_GROWTH=1 it does work, both with and without polly.

With bare clang, you can set the initial and max memory with the linker's --initial-memory and --max-memory flags (e.g. -Wl,--initial-memory=16777216 -Wl,--max-memory=2147483648 ) and you can set the stack size with the -z stack-size flag. Probably adding -v to your clang command line will show you what it uses by default.

Meinersbur commented 3 years ago

I already tried to get wasm to compile and run, but it requires too many other things (backend, libc replacement, JavaScript, WASI, ...), and even then I would have no idea what else I could do beyond what you have tried.

Judging from the minimal code, the gemm optimization may be what triggers this; I just fixed a bug there: llvm/llvm-bugzilla-archive#50557. Also try switching off that optimization (-mllvm -polly-pattern-matching-based-opts=0).

Does wasm limit the stack size? If yes, that might be the cause because above optimization allocates temporary arrays on the stack.

More information could also be helpful. What is the error with the reduced code (since there is no polybench_alloc_data)? What is the output with WASMTIME_BACKTRACE_DETAILS=1? -mllvm -debug-only=polly-ast? -mllvm -polly-codegen-add-debug-printing? -mllvm -polly-codegen-trace-scalars -mllvm -polly-codegen-trace-stmts? Can you reduce the matrix size as well? Can you reproduce it with the legacy pass manager? -polly-position=early? ....

07bcbce5-af49-4a2a-83e4-483b07325df6 commented 3 years ago

Hi Michael,

I investigated this issue further and came up with the following "minimal" example that reproduces the problem:

#include <stdlib.h>

#define ni 800
#define nj 900
#define nk 1100

int main() {

  double (*A)[nk] = malloc(sizeof(double[ni][nk]));
  double (*B)[nj] = malloc(sizeof(double[nk][nj]));
  double (*D)[nj] = malloc(sizeof(double[ni][nj]));

  int i, j, k;
  for (i = 0; i < ni; i++) {
    for (j = 0; j < nj; j++){
      for (k = 0; k < nk; ++k)
        D[i][j] += A[i][k] * B[k][j];
    }
  }

  return 0;
}

I am on LLVM 1fbb484ea45f. Compiling this example (file mm.c) with the following command produces a wasm file that I can execute:

clang-13 -O2 mm.c --sysroot wasi-sysroot --target=wasm32-wasi -o mm-wasm

However, when I add Polly with the following command, the resulting wasm file fails at run time:

clang-13 -O2 -mllvm -polly mm.c --sysroot wasi-sysroot --target=wasm32-wasi -o mm-polly-wasm

Also, I have not been able to obtain the IR file for either of these two versions.

Would you mind trying to reproduce the issue?

Thanks,

-- Manu

Meinersbur commented 3 years ago

Support for -polly-dump-before with the NPM has recently been added: https://github.com/llvm/llvm-project/commit/29bef8e4e3593ab37c4d3b6289dcdec961c3fb52. Unfortunately, because of how extension points work in the NPM, it is only possible with -polly-position=early. An alternative is NPM's -print-before option (https://reviews.llvm.org/D87216). However, it only prints to dbgs() and only the function (not the entire module), both of which make getting a reproducer difficult.

07bcbce5-af49-4a2a-83e4-483b07325df6 commented 3 years ago

> I tried to compile&run 2mm as WebAssembly, but stopped after this taking too much time.
>
> Judging from the backtrace, this doesn't seem to be an issue in Polly. The crash occurs in polybench_alloc_data, which is implemented in polybench.c. It doesn't even contain a loop, hence not a subject of optimizations by Polly.
>
> Additional info that could help:
>
>   • Output files of -polly-dump-before/-polly-dump-after
>   • Output of -mllvm -debug-only=polly-detect,polly-scops,polly-opt-isl,polly-ast
>   • Selectively optimizing specific functions, e.g. -polly-only-func=init_array, --polly-only-func=kernel_2mm or -polly-only-func=polybench_alloc_data

Hi Michael,

I am investigating this issue further, but I am unable to run some Polly passes because of errors like the following:

error in backend: Option -polly-dump-before not supported with NPM

I am running Polly directly from the clang driver, using -mllvm to specify Polly passes. I tried switching to the old pass manager, but this changes the behavior of the programs produced by Polly. Can you confirm that I should stay with the new pass manager? If so, how can I make passes such as polly-dump-before work with the new pass manager?

Thank you again for your help.

Meinersbur commented 3 years ago

You can change the Product to "libraries" and the Component to "Backend: Webassembly". It doesn't hurt to add some developers and the backend's code owner to the CC list.

Alternatively, create a new bug entry for "Backend: Webassembly". Once the faulty component is identified, one bug entry can be marked as a duplicate of the other.

Try to reduce the reproducer, e.g. put everything into one file, remove functions/statements, and generate LLVM-IR before or after optimizations.

07bcbce5-af49-4a2a-83e4-483b07325df6 commented 3 years ago

Hi Michael,

Thank you for your answer. As you suggest, I don't think the bug is in Polly but rather in the Wasm backend.

I'll provide the additional information you suggested tomorrow.

Also, how should I modify this bug entry so that the Wasm backend developers are aware of it?

-- Manu

Meinersbur commented 3 years ago

I tried to compile&run 2mm as WebAssembly, but stopped after this taking too much time.

Judging from the backtrace, this doesn't seem to be an issue in Polly. The crash occurs in polybench_alloc_data, which is implemented in polybench.c. It doesn't even contain a loop, hence not a subject of optimizations by Polly.

Additional info that could help: