iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

[BYO LLVM] dynamic module does not define module export function (PyInit__runtime) error #16329

Open · Peter9606 opened 7 months ago

Peter9606 commented 7 months ago

Hi,

I tried to bring my own LLVM using iree/build_tools/llvm/byo_llvm.sh, and the build (and install) completed without any errors. After that, when I tried to run a SHARK-Turbine AOT example, I got the following error.

(iree) ➜  aot_mlp git:(main) ✗ python3 mlp_export_simple.py
Traceback (most recent call last):
  File "/home/peter/github/SHARK-Turbine/examples/aot_mlp/mlp_export_simple.py", line 12, in <module>
    import shark_turbine.aot as aot
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/__init__.py", line 7, in <module>
    from .compiled_module import CompiledModule
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/compiled_module.py", line 18, in <module>
    from . import builtins
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/builtins/__init__.py", line 7, in <module>
    from .globals import *
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/builtins/globals.py", line 12, in <module>
    from ..support.procedural import (
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/support/procedural/__init__.py", line 13, in <module>
    from .base import *
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/support/procedural/base.py", line 32, in <module>
    from ..ir_utils import (
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/support/ir_utils.py", line 24, in <module>
    from ...dynamo.type_conversion import (
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/dynamo/__init__.py", line 8, in <module>
    from .tensor import (
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/dynamo/tensor.py", line 24, in <module>
    from ..runtime.device import (
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/runtime/__init__.py", line 7, in <module>
    from .device import *
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/runtime/device.py", line 13, in <module>
    from iree.runtime import (
  File "/home/peter/github/iree-byollvm-build/iree/runtime/bindings/python/iree/runtime/__init__.py", line 13, in <module>
    from . import _binding
  File "/home/peter/github/iree-byollvm-build/iree/runtime/bindings/python/iree/runtime/_binding.py", line 19, in <module>
    from .._runtime.libs import _runtime
ImportError: dynamic module does not define module export function (PyInit__runtime)
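
A quick way to confirm the symptom is to check whether the extension actually exports the init symbol CPython is looking for. The path below is taken from the traceback above; the exact .so filename may differ (e.g. it may carry a cpython-310 suffix):

nm -D /home/peter/github/iree-byollvm-build/iree/runtime/bindings/python/iree/_runtime/libs/_runtime*.so | grep PyInit
# A healthy nanobind extension prints a line like "T PyInit__runtime".
# No output means the symbol was hidden or stripped at link time.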

Could anyone point out the reason why this doesn't work?

BTW: In the same environment, I can run the AOT example fine with the stock LLVM located at https://github.com/openxla/iree/tree/main/third_party.

Thanks

hanhanW commented 7 months ago

cc @ScottTodd, who was in the loop on the discussion

ScottTodd commented 7 months ago

The error message here matches https://github.com/openxla/iree/issues/14644. You could try turning lld off here: https://github.com/openxla/iree/blob/6aa3a8b5ed69fe10186bb86628b8afca24cd5913/build_tools/llvm/byo_llvm.sh#L179-L181

stellaraccident commented 7 months ago

I wonder if this has anything at all to do with byo llvm vs just being a gcc/lld issue with the python runtime.

The specific error is interesting: that is built by nanobind, which does a fair number of toolchain-specific optimizations as part of its build. While I wouldn't expect a problem, if there were a subtle incompatibility, it doesn't surprise me that it would surface in this bit.

stellaraccident commented 7 months ago

This hypothesis could easily be tested by only building the runtime. It has no dependency on llvm and should be invariant to byo llvm vs an in-tree build.
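
A rough sketch of such a runtime-only build (paths are placeholders; IREE_BUILD_COMPILER and IREE_BUILD_PYTHON_BINDINGS are stock IREE CMake options):

cmake -G Ninja -S ~/github/iree -B ~/iree-build-runtime \
    -DIREE_BUILD_COMPILER=OFF \
    -DIREE_BUILD_PYTHON_BINDINGS=ON
cmake --build ~/iree-build-runtime
# If importing iree.runtime from this build fails with the same PyInit
# error, the toolchain is the problem, not the BYO LLVM path.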

Peter9606 commented 7 months ago

> The error message here matches #14644. You could try turning lld off here:
>
> https://github.com/openxla/iree/blob/6aa3a8b5ed69fe10186bb86628b8afca24cd5913/build_tools/llvm/byo_llvm.sh#L179-L181

It seems this solution doesn't work:

(iree) ➜  llvm git:(tmp) python3 ~/github/SHARK-Turbine/examples/aot_mlp/mlp_export_simple.py
module @MLP {
  func.func @main(%arg0: tensor<1x2xf32>) -> tensor<1x2xf32> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
    %0 = torch_c.from_builtin_tensor %arg0 : tensor<1x2xf32> -> !torch.vtensor<[1,2],f32>
    %1 = call @forward(%0) : (!torch.vtensor<[1,2],f32>) -> !torch.vtensor<[1,2],f32>
    %2 = torch_c.to_builtin_tensor %1 : !torch.vtensor<[1,2],f32> -> tensor<1x2xf32>
    return %2 : tensor<1x2xf32>
  }
  func.func private @forward(%arg0: !torch.vtensor<[1,2],f32>) -> !torch.vtensor<[1,2],f32> {
    %int1 = torch.constant.int 1
    %0 = torch.aten.add.Tensor %arg0, %arg0, %int1 : !torch.vtensor<[1,2],f32>, !torch.vtensor<[1,2],f32>, !torch.int -> !torch.vtensor<[1,2],f32>
    return %0 : !torch.vtensor<[1,2],f32>
  }
}
IREE was not built with support for LLD
Linking failed; escaped command line returned exit code 256:

LLD_VERSION=IREE /home/peter/github/iree-byollvm-build/iree/compiler/bindings/python/iree/compiler/_mlir_libs/iree-lld -flavor gnu -o /tmp/main_dispatch_0-3bf6f6.so --build-id=none -nostdlib -static -shared --no-undefined --no-allow-shlib-undefined --allow-multiple-definition --gc-sections -z now -z relro --discard-all --icf=all --ignore-data-address-equality --ignore-function-address-equality --hash-style=sysv /tmp/main_dispatch_0-3bf6f6.o

loc("<eval_with_key>.0 from /home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py:477 in wrapped":5:0): error: failed to link executable and generate target dylib (check above for more specific error messages)
loc("<eval_with_key>.0 from /home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py:477 in wrapped":5:0): error: failed to serialize executable for target backend llvm-cpu
loc("<eval_with_key>.0 from /home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py:477 in wrapped":5:0): error: failed to serialize executables
Traceback (most recent call last):
  File "/home/peter/github/SHARK-Turbine/examples/aot_mlp/mlp_export_simple.py", line 52, in <module>
    compiled_binary = exported.compile(save_to=None)
  File "/home/peter/miniconda3/envs/iree/lib/python3.10/site-packages/shark_turbine-0.9.4.dev1-py3.10.egg/shark_turbine/aot/exporter.py", line 139, in compile
    raise RuntimeError("Compilation failed: See diagnostics")
RuntimeError: Compilation failed: See diagnostics

Peter9606 commented 7 months ago

> I wonder if this has anything at all to do with byo llvm vs just being a gcc/lld issue with the python runtime.
>
> The specific error is interesting: that is built by nanobind, which does a fair number of toolchain-specific optimizations as part of its build. While I wouldn't expect a problem, if there were a subtle incompatibility, it doesn't surprise me that it would surface in this bit.

So if I choose clang for all three (llvm + mlir + iree), then this issue won't happen. I'll try that.

ScottTodd commented 7 months ago

Might be worth taking a step back and hearing more about what problem you're actually trying to solve. Using the Python bindings (and SHARK-Turbine) with the BYO LLVM path is a tricky combination that I'm not sure anyone else has tried before. The AOT workflows in particular could be decoupled - use stock SHARK-Turbine (with IREE's LLVM) to generate input .mlir/.mlirbc files and then feed those into your own native tool builds (iree-compile with BYO LLVM or some custom compiler tool).
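
A sketch of that decoupled flow, assuming the Turbine example is modified to save its MLIR instead of compiling in-process (file names here are placeholders; --iree-hal-target-backends is a standard iree-compile flag):

# Step 1: stock SHARK-Turbine (with IREE's bundled LLVM) exports the model.
python mlp_export_simple.py   # assumed to be modified to write mlp.mlir
# Step 2: the BYO LLVM (or custom) iree-compile consumes the exported file.
iree-compile --iree-hal-target-backends=llvm-cpu mlp.mlir -o mlp.vmfb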

Peter9606 commented 7 months ago

> Might be worth taking a step back and hearing more about what problem you're actually trying to solve. Using the Python bindings (and SHARK-Turbine) with the BYO LLVM path is a tricky combination that I'm not sure anyone else has tried before. The AOT workflows in particular could be decoupled - use stock SHARK-Turbine (with IREE's LLVM) to generate input .mlir/.mlirbc files and then feed those into your own native tool builds (iree-compile with BYO LLVM or some custom compiler tool).

Yeah, you might be right. Neither of the above solutions works for me. So far I don't have a real problem to solve; this is just preparation for bringing our own compiler into the IREE + SHARK-Turbine combination. If, as you said, BYO LLVM + SHARK-Turbine is a tricky combination, how about just replacing iree/third_party/llvm-project with my own?

stellaraccident commented 7 months ago

I'm not saying what you are doing can't work, but I'm not aware of any core devs who have tried it. I do know through the grapevine of folks who have.

If you're able to talk more about what you are trying, we could try to chat at an upcoming community meeting and see if anyone has free advice on which paths are the least likely to encounter monsters...

Peter9606 commented 7 months ago

> I'm not saying what you are doing can't work, but I'm not aware of any core devs who have tried it. I do know through the grapevine of folks who have.
>
> If you're able to talk more about what you are trying, we could try to chat at an upcoming community meeting and see if anyone has free advice on which paths are the least likely to encounter monsters...

Thanks @stellaraccident. Let me put it this way: I want to adapt IREE + SHARK-Turbine + PyTorch to our own GPGPU. As we already have a full software stack which is quite similar to CUDA, I think it won't be as hard as creating a new MLIR-based inference tool from scratch. I've already tested that the IREE runtime part works as I imagined on our software stack. Now I'm trying to bring our own compiler, which is also LLVM-based, into IREE. Obviously we have two options: one is to replace iree/third_party/llvm-project with ours, the other is to use the BYO LLVM tool. That's the whole story. I'm fine with the first solution, but it would be neater to have solution 2 working.

stellaraccident commented 7 months ago

Nice. I am reasonably certain that what you encountered with the runtime is incidental in some way. It can almost certainly be fixed; it's a bug in the build system having to do with how symbols are exported from a nanobind extension.

Smashing two LLVM-based compilers together is not something to be undertaken lightly. If you're not coupled to the exact tooling flow that the CUDA backend uses, I might look for a lighter-weight way to interface the frontend to the backend, possibly even by invoking your backend compiler at the NVVM-equivalent level as a command-line tool before going for a complete one-LLVM solution. (This would be a third option, more like how ptxas is integrated in the NVIDIA stack.)
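
Purely as illustration of that third option, with every tool and file name below invented for the sketch (the real integration points would depend on the vendor stack):

frontend-compile model.mlir -o kernels.ll          # lower kernels to an NVVM-equivalent IR
my-backend-compiler kernels.ll -o kernels.bin      # vendor command-line tool, analogous to ptxas
package-runtime-module kernels.bin -o model.vmfb   # bundle the compiled kernels for the runtime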