halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org
Other
5.84k stars 1.07k forks source link

[wasm] WebAssembly CodeGen disables PIC #7796

Open derek-gerstmann opened 1 year ago

derek-gerstmann commented 1 year ago

It appears that the WebAssembly CodeGen explicitly disables PIC, which means that objects and modules being generated for wasm cannot be linked into a shared object and dynamically loaded (eg emcc -s SIDE_MODULE=2).

Is this necessary?

b292ab5dae (Steven Johnson 2019-03-15 11:33:32 -0700 364) bool CodeGen_WebAssembly::use_pic() const {
b292ab5dae (Steven Johnson 2019-03-15 11:33:32 -0700 365)     return false;
b292ab5dae (Steven Johnson 2019-03-15 11:33:32 -0700 366) }
steven-johnson commented 1 year ago

...I don't remember :-)

Try changing it and see what breaks?

derek-gerstmann commented 1 year ago

Okay, will do ... lemme see if things go boom.

derek-gerstmann commented 1 year ago

Just enabling the flag doesn't seem to be enough to enable relocation ... the LLVM module gets created with the correct halide_use_pic flag, and PIC level flags, but I still get errors emitted from wasm-ld:

wasm-ld: error: relocation R_WASM_MEMORY_ADDR_SLEB cannot be used against symbol.Lstr; recompile with -fPIC

There may be some other LLVM feature flags that may need to get set to make this work. I'll keep investigating.

derek-gerstmann commented 1 year ago

This seems relevant: https://bugs.llvm.org/show_bug.cgi?id=42714

Which would imply that the triple flag needs to have the os set to emscripten to allow dynamic linking.

sbc100 commented 1 year ago

I believe that bug was fixed in https://reviews.llvm.org/D153293. I just closes the corresponding github issue: https://github.com/llvm/llvm-project/issues/42059

sbc100 commented 1 year ago

Note that if you want you could unconditionally enable PIC code generation. It does at a little bloat to the object files, due to the extra indirection, but when linked into a static binary 100% of this can be removed by wasm-opt in release builds.

steven-johnson commented 1 year ago

Note that if you want you could unconditionally enable PIC code generation. It does at a little bloat to the object files, due to the extra indirection, but when linked into a static binary 100% of this can be removed by wasm-opt in release builds.

That sounds like the simplest option, TBH -- is there a reason this isn't done in general? (Is wasm-opt not usually in use, or is the bloat more than "a little"?)

(Thanks for commenting, BTW, didn't realize you were watching these issues/PRs :-)

sbc100 commented 1 year ago

I was pointed here by an emscripten who uses halide

sbc100 commented 1 year ago

Maybe you can just remove the bool use_pic() const override; override and take the PIC setting from the command line?

slomp commented 1 year ago

Here's an idea: could we just always set PIC everywhere, and then have a Halide target flag (e.g., -no_pic) to disable it?

steven-johnson commented 1 year ago

PIC setting from the command line?

It not quite that simple, alas, since Halide codegen isn't always (or even usually) directly invoked from the command line, but rather, indirectly from build rules.

Here's an idea: could we just always set PIC everywhere, and then have a Halide target flag (e.g., -no_pic) to disable it?

That would probably be a much more tractable approach. It looks like we (currently) default to using PIC for all LLVM-based backends, with no way to override them; WebAssembly is the only backend that changes this (to false, also with no way to override it.)

So, strawman:

derek-gerstmann commented 1 year ago

Thanks very much for the reply! It appears just setting use_pic() isn’t enough … we still get linking errors when trying to use the static libraries or object files create by Halide in a shared object created by emcc.

In the link you provided it appears that the target triple includes wasm32-wasi … is wasi required for dynamic linking?

steven-johnson commented 1 year ago

we still get linking errors

OK, we should clearly hold off until we know the entire scope of the issue. What are the linking errors?

steven-johnson commented 1 year ago

I believe that bug was fixed in https://reviews.llvm.org/D153293. I just closes the corresponding github issue: llvm/llvm-project#42059

A bit of a threadjack here, but: it's been a minute since I've done more than minor fixes to our wasm backend... which was written when WASI was still mostly just a proposal. Looking at the link, it seems likely that upgrading our codegen to be WASI-compliant (if needed) would be highly desirable (to allow for easier use in a runtime env that doesn't include JS support)... and I don't know anything at all about the Component Model proposal.

Unfortunately, our team is fairly understaffed right now, and there's ~no way that I'm likely to get any bandwidth to look into this anytime soon (unless my manager(s) make it a priority for me), but the Halide team would absolutely welcome any community contribution that could be made here.

derek-gerstmann commented 1 year ago

we still get linking errors

OK, we should clearly hold off until we know the entire scope of the issue. What are the linking errors?

wasm-ld: error: relocation R_WASM_MEMORY_ADDR_SLEB cannot be used against symbol.Lstr; recompile with -fPIC

sbc100 commented 1 year ago

@steven-johnson. Note that both wasi and the component model as still very early in the standard process. Its likely worth supporting the different target triples but I think that is all you would need to do. I would imagine you could even target wasm32-unknown-unknown and you object files should work with both wasi-sdk and emscripten (I think there was one ABI difference remaining, but there are efforts to remove that too).

derek-gerstmann commented 1 year ago

I attempted to override the llvm::triple inside of LLVM_Runtime_Linker.cpp:

https://github.com/halide/Halide/blob/acc9413084db07fd6c794e1d59b6eca0f72b140e/src/LLVM_Runtime_Linker.cpp#L492

I tried wasm32-unknown-unkown and wasm32-unknown-wasi, but this didn't seem to help. Is there a different format necessary for wasm static libraries or object files that's needed?

slomp commented 1 year ago

FWIW, I just stumbled upon this (from 2019): https://iandouglasscott.com/2019/07/18/experimenting-with-webassembly-dynamic-linking-with-clang/

sbc100 commented 1 year ago

I attempted to override the llvm::triple inside of LLVM_Runtime_Linker.cpp:

https://github.com/halide/Halide/blob/acc9413084db07fd6c794e1d59b6eca0f72b140e/src/LLVM_Runtime_Linker.cpp#L492

I tried wasm32-unknown-unkown and wasm32-unknown-wasi, but this didn't seem to help. Is there a different format necessary for wasm static libraries or object files that's needed?

Not a different format, no. But the relocation types used are different. In PIC code generatation you should shouldn't get any R_WASM_MEMORY_ADDR_SLEB relocations being generated. Hence the error message telling you to recompile with -fPIC. So I guess that halide compiler is doing something a little different to the clang here. I don't think it will be hard for someone with some knowledge of hadlide to track down and fix.

derek-gerstmann commented 1 year ago

Okay, good to know. It looks like the generated object files can be inspected with wabt. From the HelloWasm app:

> wasm-objdump -x ./bin/wasm/reaction_diffusion_init.o

reaction_diffusion_init.o:  file format wasm 0x1

Section Details:

Type[6]:
 - type[0] (i32) -> i32
 - type[1] (i32, i32) -> i32
 - type[2] (i32, i32, i32, i32) -> i32
 - type[3] (i32, i32, i32, i32, i32) -> i32
 - type[4] (i32, i32, i64, i64) -> i32
 - type[5] () -> i32
Import[11]:
 - memory[0] pages: initial=1 <- env.__linear_memory
 - global[0] i32 mutable=1 <- env.__stack_pointer
 - func[0] sig=1 <env.halide_error_buffer_argument_is_null> <- env.halide_error_buffer_argument_is_null
 - func[1] sig=2 <env.halide_error_bad_type> <- env.halide_error_bad_type
 - func[2] sig=2 <env.halide_error_bad_dimensions> <- env.halide_error_bad_dimensions
 - func[3] sig=2 <env.halide_error_buffer_extents_negative> <- env.halide_error_buffer_extents_negative
 - func[4] sig=3 <env.halide_error_constraint_violated> <- env.halide_error_constraint_violated
 - func[5] sig=4 <env.halide_error_buffer_allocation_too_large> <- env.halide_error_buffer_allocation_too_large
 - func[6] sig=4 <env.halide_error_buffer_extents_too_large> <- env.halide_error_buffer_extents_too_large
 - func[7] sig=1 <env.halide_error_device_dirty_with_no_device_support> <- env.halide_error_device_dirty_with_no_device_support
 - func[8] sig=1 <env.halide_error_host_is_null> <- env.halide_error_host_is_null
Function[3]:
 - func[9] sig=0 <reaction_diffusion_init>
 - func[10] sig=0 <reaction_diffusion_init_argv>
 - func[11] sig=5 <reaction_diffusion_init_metadata>
DataCount:
 - data count: 9
Code[3]:
 - func[9] size=1697 <reaction_diffusion_init>
 - func[10] size=13 <reaction_diffusion_init_argv>
 - func[11] size=8 <reaction_diffusion_init_metadata>
Data[9]:
 - segment[0] <.rodata..L__unnamed_1> memory=0 size=24 - init i32=0
  - 0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000010: 0000 0000 0000 0000                      ........
 - segment[1] <.rodata..Lstr> memory=0 size=7 - init i32=32
  - 0000020: 6f75 7470 7574 00                        output.
 - segment[2] <.rodata..L__unnamed_2> memory=0 size=36 - init i32=48
  - 0000030: 2000 0000 0200 0000 0300 0000 0220 0100   ............ ..
  - 0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000050: 0000 0000                                ....
 - segment[3] <.rodata..Lstr.4> memory=0 size=26 - init i32=96
  - 0000060: 7761 736d 2d33 322d 7761 736d 7274 2d6e  wasm-32-wasmrt-n
  - 0000070: 6f5f 7275 6e74 696d 6500                 o_runtime.
 - segment[4] <.rodata..Lstr.5> memory=0 size=24 - init i32=128
  - 0000080: 7265 6163 7469 6f6e 5f64 6966 6675 7369  reaction_diffusi
  - 0000090: 6f6e 5f69 6e69 7400                      on_init.
 - segment[5] <.rodata..Lreaction_diffusion_init_metadata_storage> memory=0 size=20 - init i32=160
  - 00000a0: 0100 0000 0100 0000 3000 0000 6000 0000  ........0...`...
  - 00000b0: 8000 0000                                ....
 - segment[6] <.rodata..Lstr.6> memory=0 size=21 - init i32=192
  - 00000c0: 4f75 7470 7574 2062 7566 6665 7220 6f75  Output buffer ou
  - 00000d0: 7470 7574 00                             tput.
 - segment[7] <.rodata..Lstr.7> memory=0 size=16 - init i32=224
  - 00000e0: 6f75 7470 7574 2e73 7472 6964 652e 3000  output.stride.0.
 - segment[8] <.rodata..Lstr.8> memory=0 size=2 - init i32=256
  - 0000100: 3100                                     1.
Custom:
 - name: "linking"
  - symbol table [count=22]
   - 0: F <reaction_diffusion_init> func=9 [ binding=global vis=default ]
   - 1: G <env.__stack_pointer> global=0 [ undefined binding=global vis=default ]
   - 2: D <.Lstr> segment=1 offset=0 size=7 [ binding=local vis=default ]
   - 3: F <env.halide_error_buffer_argument_is_null> func=0 [ undefined binding=global vis=default ]
   - 4: D <.Lstr.6> segment=6 offset=0 size=21 [ binding=local vis=default ]
   - 5: F <env.halide_error_bad_type> func=1 [ undefined binding=global vis=default ]
   - 6: F <env.halide_error_bad_dimensions> func=2 [ undefined binding=global vis=default ]
   - 7: F <env.halide_error_buffer_extents_negative> func=3 [ undefined binding=global vis=default ]
   - 8: D <.Lstr.7> segment=7 offset=0 size=16 [ binding=local vis=default ]
   - 9: D <.Lstr.8> segment=8 offset=0 size=2 [ binding=local vis=default ]
   - 10: F <env.halide_error_constraint_violated> func=4 [ undefined binding=global vis=default ]
   - 11: F <env.halide_error_buffer_allocation_too_large> func=5 [ undefined binding=global vis=default ]
   - 12: F <env.halide_error_buffer_extents_too_large> func=6 [ undefined binding=global vis=default ]
   - 13: F <env.halide_error_device_dirty_with_no_device_support> func=7 [ undefined binding=global vis=default ]
   - 14: F <env.halide_error_host_is_null> func=8 [ undefined binding=global vis=default ]
   - 15: F <reaction_diffusion_init_argv> func=10 [ binding=global vis=default ]
   - 16: F <reaction_diffusion_init_metadata> func=11 [ binding=global vis=default ]
   - 17: D <.Lreaction_diffusion_init_metadata_storage> segment=5 offset=0 size=20 [ binding=local vis=default ]
   - 18: D <.L__unnamed_1> segment=0 offset=0 size=24 [ binding=local vis=default ]
   - 19: D <.L__unnamed_2> segment=2 offset=0 size=36 [ binding=local vis=default ]
   - 20: D <.Lstr.4> segment=3 offset=0 size=26 [ binding=local vis=default ]
   - 21: D <.Lstr.5> segment=4 offset=0 size=24 [ binding=local vis=default ]
  - segment info [count=9]
   - 0: .rodata..L__unnamed_1 p2align=4 [ ]
   - 1: .rodata..Lstr p2align=5 [ ]
   - 2: .rodata..L__unnamed_2 p2align=4 [ ]
   - 3: .rodata..Lstr.4 p2align=5 [ ]
   - 4: .rodata..Lstr.5 p2align=5 [ ]
   - 5: .rodata..Lreaction_diffusion_init_metadata_storage p2align=4 [ ]
   - 6: .rodata..Lstr.6 p2align=5 [ ]
   - 7: .rodata..Lstr.7 p2align=5 [ ]
   - 8: .rodata..Lstr.8 p2align=5 [ ]
Custom:
 - name: "reloc.CODE"
  - relocations for section: 4 (Code) [34]
   - R_WASM_GLOBAL_INDEX_LEB offset=0x00000f(file=0x0001ff) symbol=1 <env.__stack_pointer>
   - R_WASM_GLOBAL_INDEX_LEB offset=0x00001a(file=0x00020a) symbol=1 <env.__stack_pointer>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x00002a(file=0x00021a) symbol=2 <.Lstr>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000030(file=0x000220) symbol=3 <env.halide_error_buffer_argument_is_null>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x000298(file=0x000488) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0002a4(file=0x000494) symbol=5 <env.halide_error_bad_type>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0002b1(file=0x0004a1) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0002bb(file=0x0004ab) symbol=6 <env.halide_error_bad_dimensions>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0002c8(file=0x0004b8) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0002d2(file=0x0004c2) symbol=7 <env.halide_error_buffer_extents_negative>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0002df(file=0x0004cf) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0002e9(file=0x0004d9) symbol=7 <env.halide_error_buffer_extents_negative>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0002f6(file=0x0004e6) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000300(file=0x0004f0) symbol=7 <env.halide_error_buffer_extents_negative>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x00030d(file=0x0004fd) symbol=8 <.Lstr.7>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x000315(file=0x000505) symbol=9 <.Lstr.8>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x00031d(file=0x00050d) symbol=10 <env.halide_error_constraint_violated>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x00032a(file=0x00051a) symbol=2 <.Lstr>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000338(file=0x000528) symbol=11 <env.halide_error_buffer_allocation_too_large>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x000345(file=0x000535) symbol=2 <.Lstr>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000353(file=0x000543) symbol=11 <env.halide_error_buffer_allocation_too_large>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x000360(file=0x000550) symbol=2 <.Lstr>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x00036e(file=0x00055e) symbol=12 <env.halide_error_buffer_extents_too_large>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x00037b(file=0x00056b) symbol=2 <.Lstr>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000389(file=0x000579) symbol=11 <env.halide_error_buffer_allocation_too_large>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x000396(file=0x000586) symbol=2 <.Lstr>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0003a4(file=0x000594) symbol=12 <env.halide_error_buffer_extents_too_large>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0003b1(file=0x0005a1) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0003b7(file=0x0005a7) symbol=13 <env.halide_error_device_dirty_with_no_device_support>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0003c4(file=0x0005b4) symbol=4 <.Lstr.6>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0003ca(file=0x0005ba) symbol=14 <env.halide_error_host_is_null>
   - R_WASM_GLOBAL_INDEX_LEB offset=0x00069c(file=0x00088c) symbol=1 <env.__stack_pointer>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x0006ac(file=0x00089c) symbol=0 <reaction_diffusion_init>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x0006b5(file=0x0008a5) symbol=17 <.Lreaction_diffusion_init_metadata_storage>
Custom:
 - name: "reloc.DATA"
  - relocations for section: 5 (Data) [5]
   - R_WASM_MEMORY_ADDR_I32 offset=0x00002f(file=0x0008e0) symbol=2 <.Lstr>
   - R_WASM_MEMORY_ADDR_I32 offset=0x00004f(file=0x000900) symbol=18 <.L__unnamed_1>
   - R_WASM_MEMORY_ADDR_I32 offset=0x00009f(file=0x000950) symbol=19 <.L__unnamed_2>
   - R_WASM_MEMORY_ADDR_I32 offset=0x0000a3(file=0x000954) symbol=20 <.Lstr.4>
   - R_WASM_MEMORY_ADDR_I32 offset=0x0000a7(file=0x000958) symbol=21 <.Lstr.5>
Custom:
 - name: "producers"
derek-gerstmann commented 1 year ago

As @sbc100 indicated, the above object file contains R_WASM_MEMORY_ADDR_SLEB entries which shouldn't exist if PIC was enabled. I've scrutinized the WebAssembly clang driver and noticed that "mutable-globals" appear to be required for the linker:

https://github.com/llvm/llvm-project/blob/dfc759929644ed1ea3224ab30e1086f7acc60da6/clang/lib/Driver/ToolChains/WebAssembly.cpp#L276

So I added this to Halide's CodeGen_WebAssembly::mattrs() so that emits "+mutable-globals" and verified this shows up in the generated object files:

Custom:
 - name: "target_features"
  - [+] mutable-globals

But the linker still complains and we still have R_WASM_MEMORY_ADDR_SLEB symbols being generated.

I'll keep digging ....

derek-gerstmann commented 1 year ago

I haven't found anything concrete ... my guess is we may be missing one or more LLVM passes that may transform the module into a PIC compatible layout for wasm and/or are missing attributes on global values or functions (ie dso_local) that would allow them to get stored correctly.

We don't use the standard llvm PassBuilder::buildModuleOptimizationPipeline() optimizations so things like RelLookupTableConverterPass don't get run.

Anyone have any other ideas? @steven-johnson @abadams @sbc100

sbc100 commented 1 year ago

The relevant code is in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp.

You can see that LowerGlobalAddress produces different code if isPositionIndependent is true. You might want to check that isPositionIndependent is returning true here?

sbc100 commented 1 year ago

@derek-gerstmann, also are you using llvm tip-of-tree? Because https://github.com/llvm/llvm-project/issues/42059 only recently landed.

derek-gerstmann commented 1 year ago

Ah, thanks! So far I've been testing with LLVM releases 15-17. I'll pull llvm from main and rebuild.

derek-gerstmann commented 1 year ago

Okay, fix confirmed. Testing against LLVM main (18.x), with the following changes to Halide enabled me to produce wasm object files that could be relocated with emcc:

@@ -362,7 +370,7 @@ bool CodeGen_WebAssembly::use_soft_float_abi() const {
 }

 bool CodeGen_WebAssembly::use_pic() const {
-    return false;
+    return true;
 }
+++ b/src/CodeGen_WebAssembly.cpp
@@ -344,6 +344,14 @@ string CodeGen_WebAssembly::mattrs() const {
         sep = ",";
     }

+    // PIC implies +mutable-globals because the PIC ABI used by the linker
+    // depends on importing and exporting mutable globals. Also -pthread implies
+    // mutable-globals too, so quitely enable it if either of these are specified.
+    if(use_pic() || target.has_feature(Target::WasmThreads)) {
+        s << sep << "+mutable-globals";
+        sep = ",";
+    }
+

Optional, but likely needed (based on comments from LLVM devs):

@@ -1151,6 +1156,16 @@ void CodeGen_LLVM::optimize_module() {
     using OptimizationLevel = llvm::OptimizationLevel;
     OptimizationLevel level = OptimizationLevel::O3;

+#if LLVM_VERSION >= 180
+    if(tm->isPositionIndependent()) {
+        pb.registerOptimizerLastEPCallback(
+            [&](ModulePassManager &mpm, OptimizationLevel level) {
+                mpm.addPass(RelLookupTableConverterPass());        
+            });
+    }
+#endif
+

Symbols in the wasm object files now appear as R_WASM_MEMORY_ADDR_REL_SLEB rather than absolute offsets via R_WASM_MEMORY_ADDR_SLEB.

I'll open a PR, and we can iron out the details of the proposed strawman. Thanks everyone! @slomp @sbc100 @steven-johnson