onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0
760 stars 319 forks source link

LLVM updates & green commit tracking #1624

Open sstamenova opened 2 years ago

sstamenova commented 2 years ago

There's an ongoing experiment to keep mhlo, torch-mlir and onnx-mlir in sync with respect to their llvm commit. These projects historically have different update cadences and rarely converge on a single commit. The current effort is to make sure that they converge at least sometimes with mhlo and torch-mlir having weekly updates and onnx-mlir having roughly monthly updates (though more frequent is possible for each of the projects).

The commit tracking is happening here: https://github.com/llvm/torch-mlir/issues/1178 and the "green" commit is guaranteed to have a green build and lit tests for llvm, lld, lldb and mlir on Ubuntu and Windows, though we can expand the testing to including more platforms if necessary.

The latest update to the llvm commit in onnx-mlir put it in sync with torch-mlir, further updates should attempt to do the same and it would be beneficial for everyone if anyone working on an update tries to pick up a green commit from the experiment and comments here to let others know that an update is in progress.

sstamenova commented 2 years ago

@AlexandreEichenberger

tungld commented 2 years ago

PR #1628 updates LLVM with this green commits:

Week of 8/22/2022:
Green LLVM commit: e5d5146323ffaa13eb5185616c6ae5c36b69352d
Green MHLO commit: ace4030dd55fce2a74e46f71f1937feb61ed1e3f (branch greencommit/2022-08-22-e5d51463)
AlexandreEichenberger commented 2 years ago

PR #1628 merged

imaihal commented 2 years ago

@AlexandreEichenberger @tungld

It seems https://github.com/llvm/llvm-project/commit/e5d5146323ffaa13eb5185616c6ae5c36b69352d fails some installation tests (cmake --build . --target check-mlir) on Linux s390x. This seems to be fixed by https://github.com/llvm/llvm-project/commit/df4e637ca7ef4ef17b662845120864921e65bb67 I confirmed that on buildbot and my own environment. The error is the same with this build bot results

These errors are not critical for onnx-mlir, but it might be better to update the commit again in some timing.

tungld commented 2 years ago

This seems to be fixed by https://github.com/llvm/llvm-project/commit/df4e637ca7ef4ef17b662845120864921e65bb67

Does this really fix the issue on s390x? I saw it just marked them as UNSUPPORTED and the comment says: // Bytecode currently does not support big-endian platforms

ljfitz commented 1 year ago

We're also finding the green commit approach useful and wonder if therre has been discussion on a rotation process between onnx-mlir fans to take turns updating onnx-mlir to green llvm-project commits? If so, we'd like to contribute.

We're in the process of preparing an update PR for the green commit from Nov. 14, and realize we may be unknowingly duplicating effort with others.

CC recent update PR authors: @chenchongsong @MikeHolman @tungld

tungld commented 1 year ago

We're in the process of preparing an update PR for the green commit from Nov. 14, and realize we may be unknowingly duplicating effort with others.

There is no duplicating effort from our side as far as I know. Thanks!

ljfitz commented 1 year ago

I see contributors to the torch-mlir project have organized a rotation scheme for managing updates to green commits: https://github.com/llvm/torch-mlir/wiki/Weekly-LLVM-Update, as a result of discussion to see if there was enough support.

I see the topic of updating was discussed in https://github.com/onnx/onnx-mlir/wiki/Informal-meeting-agenda-and-notes#september-13rd-2022 and wonder if there was any support for adopting a more predictable approach?

sstamenova commented 1 year ago

I think that's a great idea. Considering how complex some of the updates get, I'd recommend every couple of weeks as a good cadence. We can probably have 1 or 2 people from our side participate in the rotation as well. @AlexandreEichenberger what do you think?

sstamenova commented 1 year ago

@AlexandreEichenberger : Can we add this to the agenda for the community meeting in the next couple of weeks if it hasn't been discussed yet?

AlexandreEichenberger commented 1 year ago

@christopherbate @gargaroff

We are trying to get two folks from each actively participating company to help out with the LLVM update. The process has been pretty smooth the last couple of iterations, largely because by doing it regularly, it reduces the amount of work to do for each iteration.

Would you consider another volunteer from NVIDIA and AMD? That would be greatly appreciated.

Big thanks for helping out.

You can select your preferred time here: https://github.com/onnx/onnx-mlir/wiki/LLVM-Update-Schedule

gargaroff commented 1 year ago

@AlexandreEichenberger I could do another bump in the week of 2023-05-01.

sorenlassen commented 1 year ago

@chenchongsong @tungld I saw in PR #2252 you bumped the LLVM commit to a "non standard" commit

I think we need to be cautious about that because it brings us out of sync with torch-mlir, which I understood to be a key motivation for our LLVM updates process

hopefully it doesn't create any problems but in the future please post here in the master issue to build consensus before we diverge from the commits from https://github.com/llvm/torch-mlir/issues/1178

hamptonm1 commented 1 year ago

@chenchongsong @tungld I saw in PR #2252 you bumped the LLVM commit to a "non standard" commit

I think we need to be cautious about that because it brings us out of sync with torch-mlir, which I understood to be a key motivation for our LLVM updates process

hopefully it doesn't create any problems but in the future please post here in the master issue to build consensus before we diverge from the commits from llvm/torch-mlir#1178

@chenchongsong We should only be using the green commits mentioned by @ashay to keep things in sync. I actually think the recent issues @cjvolzka and @negiyas are seeing maybe related to this. I am going to revert the LLVM commit to d421f5226048e4a5d88aab157d0f4d434c43f208 to see if this solves the issue. The below issue just started to occur with the recent LLVM bump:

$ ./build/Debug/bin/onnx-mlir -mcpu=z14 --EmitONNXIR  bidaf-9-nocategorymaper-onnxbasic.mlir
onnx-mlir: /home1/negishi/src/dlc.git/llvm-project/llvm/include/llvm/Support/Casting.h:566: decltype(auto) llvm::cast(const From&) [with To = mlir::RankedTensorType; From = mlir::Type]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
./compile-bidaf9.sh: line 76: 642957 Aborted                 (core dumped) /home1/negishi/src/dlc.git/onnx-mlir.main/build/Debug/bin/onnx-mlir -mcpu=z14 --EmitONNXIR bidaf-9-nocategorymaper-onnxbasic.mlir