Closed cjvolzka closed 6 months ago
@negiyas reported he was able to successfully compile the model, but it took 170 GB of memory.
@imaihal reported that if LLVM patch https://reviews.llvm.org/D148487 is applied, memory usage caps at 1 GB and compilation takes about 5 min.
@tungld do you have bandwidth to see if we can get your llvm patch merged into llvm to fix the issue?
I thought we fixed this problem originally observed in https://github.com/onnx/onnx-mlir/issues/2084 last year but I guess @tungld's LLVM patch was reverted due to some problem?
@gongsu832 yes, I did the LLVM patch, but it somehow caused flang in LLVM to fail, so it was reverted.
Hi all, I modified @tungld's LLVM patch so it doesn't crash the repro in https://github.com/llvm/llvm-project/issues/62802, which caused the patch to be reverted.
Could anyone help test whether it still helps the bidaf-9 model? (I don't know how to set up onnx-mlir to run an ONNX model :( )
Here's the modified LLVM patch: dangling-const.patch. I'll create a pull request to the llvm repo once we can confirm the patch helps.
@python3kgae great, thanks for your patch! It looks like your patch is for old LLVM code. Do you have a patch for recent LLVM code?
@python3kgae I checked bidaf-9 with your patch, and memory consumption peaked at around 1.7 GB. So it does help bidaf-9. Thank you very much @python3kgae!
I'm using old LLVM code to test the old repro. I'll switch to recent LLVM code when I create the pull request to the LLVM repo.
Pull request created: https://github.com/llvm/llvm-project/pull/82708
I tried to run onnx-mlir bidaf-9.onnx but hit an error in https://github.com/onnx/onnx-mlir/blob/main/src/Conversion/ONNXToKrnl/Math/Reduction.cpp#L709 because estimatedSimdLoopTripCount is not initialized.
Is this expected for Windows build of onnx-mlir?
SIMD-related code typically only works on s390x Linux, so a failure on Windows isn't surprising. @AlexandreEichenberger should be able to provide a more definitive answer since he wrote most of the SIMD code.
Created a PR to create only one globalOp for all strings in a string literal: https://github.com/onnx/onnx-mlir/pull/2727
This could save a lot of time when debugging bidaf-9 model.
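To illustrate the general idea of emitting one global for many strings, here is a minimal C++ sketch that packs a list of strings into a single contiguous buffer with an offset table. It is not the code from the PR: `PackedStrings` and `packStrings` are hypothetical names, and the sketch assumes the strings contain no embedded NUL characters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper type (not from onnx-mlir): one contiguous buffer
// holding every string, plus the start offset of each string.
struct PackedStrings {
  std::string buffer;               // all strings, each followed by '\0'
  std::vector<std::size_t> offsets; // offsets[i] = start of string i in buffer

  // Returns string i by reading up to its terminating NUL.
  std::string_view get(std::size_t i) const {
    return std::string_view(buffer.c_str() + offsets[i]);
  }
};

// Packs all strings into one buffer instead of one object per string.
// Assumes no input string contains an embedded '\0'.
PackedStrings packStrings(const std::vector<std::string> &strings) {
  PackedStrings packed;
  for (const std::string &s : strings) {
    packed.offsets.push_back(packed.buffer.size());
    packed.buffer.append(s);
    packed.buffer.push_back('\0');
  }
  return packed;
}
```

With this layout, the number of separate objects stays at one no matter how many strings the literal contains; individual strings are recovered through the offset table.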
I believe this happens because on Windows we build with warnings as errors. If you don't mind, just adding = 0, as in
int64_t estimatedSimdLoopTripCount = 0;
here https://github.com/onnx/onnx-mlir/blob/01c5c9fb536a43cde36abccf562bb2f6cb594cb4/src/Conversion/ONNXToKrnl/Math/Reduction.cpp#L490 would probably fix the problem.
In general, SIMD works on x86 Linux, so I have to assume it does too for Windows.
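The pattern behind the suggested `= 0` fix can be sketched as follows. This is not the code in Reduction.cpp: `findSimdTripCount`, `chooseTripCount`, and their parameters are hypothetical stand-ins; only the variable name `estimatedSimdLoopTripCount` comes from the thread. The point is that a variable written only on some paths of a callee should get a defined value at its declaration, so that a compiler treating "possibly uninitialized" warnings as errors accepts it.

```cpp
#include <cstdint>

// Hypothetical stand-in for the real analysis: writes tripCount only
// when SIMD is usable, and leaves it untouched otherwise.
bool findSimdTripCount(bool simdEnabled, int64_t innerDim, int64_t vectorLen,
                       int64_t &tripCount) {
  if (!simdEnabled || vectorLen <= 0)
    return false; // tripCount is not written on this path
  tripCount = innerDim / vectorLen;
  return true;
}

// Hypothetical caller: initializing at the declaration guarantees a
// defined value even when the callee bails out early.
int64_t chooseTripCount(bool simdEnabled, int64_t innerDim, int64_t vectorLen) {
  int64_t estimatedSimdLoopTripCount = 0;
  findSimdTripCount(simdEnabled, innerDim, vectorLen,
                    estimatedSimdLoopTripCount);
  return estimatedSimdLoopTripCount;
}
```

Without the initializer, the non-SIMD path would return an indeterminate value, which is the situation such warnings are meant to catch.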
@python3kgae thanks so much!!!
Thank you for creating this project :)
Closing as this was fixed by a recent LLVM uplift.
When I attempt to compile the bidaf-9 model from the ONNX model zoo, compilation stops after about 7 minutes with no information.
Watching memory usage during compilation, it uses about 300 MB for roughly the first 5 min. After that, it starts to grow, reaching just short of 60 GB before the process gets killed at 7 min, presumably by the Linux OOM Killer as my system runs out of memory.
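For anyone wanting to reproduce the measurement, one way to capture the peak memory of a compile run is the POSIX `getrusage` call with `RUSAGE_CHILDREN`. The sketch below is an assumption about how one might measure it, not how the numbers in this issue were obtained; `peakChildRssKb` is a hypothetical helper name. On Linux `ru_maxrss` is reported in kilobytes, and it covers the largest child waited for so far by the calling process.

```cpp
#include <cstdlib>
#include <sys/resource.h>

// Hypothetical helper: runs a shell command, waits for it, and returns the
// peak resident set size of the largest waited-for child (kilobytes on
// Linux). Returns -1 if the command could not be launched or the query fails.
long peakChildRssKb(const char *command) {
  if (std::system(command) == -1)
    return -1;
  struct rusage usage;
  if (getrusage(RUSAGE_CHILDREN, &usage) != 0)
    return -1;
  return usage.ru_maxrss;
}
```

For example, `peakChildRssKb("onnx-mlir bidaf-9.onnx")` would report the peak after the compiler exits. It only gives the final peak, not the growth over time described above.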