Closed srcarroll closed 1 month ago
I can't find any current lowering path for onnx.Sum
. I figured it would be best to add decomposition pattern here rather than do a direct conversion to stablehlo. I'm open to suggestions if this is not a desirable change.
Can one of the admins verify this patch?
Can one of the admins verify this patch?
just realized i messsed up the one input case. fixing now
Can one of the admins verify this patch?
not sure what's up with the jenkins failures though
Starting a new batch of CI, let me look at the PR soon.
Starting a new batch of CI, let me look at the PR soon.
Thanks. I still need to figure out the test failure too. I thought fixing the one input case woud have resolved it but it didn't. I'm unfamiliar with the test that's failing so will have to dig.
@srcarroll here is the log of the error for Linux amd64
/usr/lib/python3.10/subprocess.py:526: CalledProcessError
----------------------------- Captured stdout call -----------------------------
[1/6] Thu Oct 3 09:08:26 2024 (0s) Importing ONNX Model to MLIR Module from "test_sum_one_input.onnx"
[2/6] Thu Oct 3 09:08:26 2024 (0s) Compiling and Optimizing MLIR Module
----------------------------- Captured stderr call -----------------------------
error: failed to legalize unresolved materialization from ('tensor<?xf32>') to 'tensor<*xf32>' that remained live after conversion
=========================== short test summary info ============================
Debug/test.py::OnnxBackendNodeModelTest::test_sum_one_input_cpu - subprocess...
1 failed, 521 passed, 2344 skipped in 84.49s (0:01:24)
make[3]: *** [test/backend/CMakeFiles/check-onnx-backend-dynamic.dir/build.make:71: test/backend/CMakeFiles/check-onnx-backend-dynamic] Error 1
make[2]: *** [CMakeFiles/Makefile2:14510: test/backend/CMakeFiles/check-onnx-backend-dynamic.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
bringing up nodes...
bringing up nodes...
It feels like something is missing for the shape inference. If you convert the sum into a series of add before shape infrence, then shape inference should be able to operate normally. Maybe you need to put it earlier than currently placed.
You can run the test directly in your env
TEST_CASE_BY_USER=test_sum_one_input_cpu make -j 12 check-onnx-backend-dynamic
You can add compiler options (for debugging) by defining the ONNX_MLIR_FLAGS
env var
Hope this helps.
Note on the check-onnx-backend-dynamic
: we force all the input sizes to be made dynamic (-1) at compilation time and then run the model with the original sizes set at run time. That ensure us that all of the ONNX tests run both with static (as they are in ONNX repo) and dynamic (as we force them to be here).
@AlexandreEichenberger thanks for the pointers. I'm actually having a hard time setting up my environment to run python tests and haven't had time to look into fixing it. so i'm just guessing based on the error what the problem is. i think this error would happen any time the result type of onnx.Sum
is an unranked tensor, but the inputs are ranked. hence why it complains about unrealized cast from tensor<?xf32> -> tensor<*xf32>
. so one solution i can think of is to just cast the result of the final onnx.Add
to the original onnx.Sum
result type when they don't match. i'll push soon
i was able to reproduce the error on an example IR and have added that case to lit test to test the fix
Can one of the admins verify this patch?
@jenkins-droid test this please
Yes, setting up is a bit tricky, to get ONNX installed (I do a pip install -e third_party/onnx
, protobuf sometimes cause problems). That is why you can also develop on docker images that we publish, where all should be working smoothly.
@jenkins-droid test this please
(I think our Jenkins is having problems, folks are working on this)
@Sunny-Anand mind doing a "test this please" on this PR once the Jenkins issue is fixed, thanks
All checks have passed.
@AlexandreEichenberger thanks! yes please do merge.
@srcarroll thanks for your contributions, much appreciated
Jenkins Linux amd64 Build #15775 [push] Implement decomposition ... started at 18:50
Jenkins Linux s390x Build #15778 [push] Implement decomposition ... started at 19:50
Jenkins Linux ppc64le Build #14805 [push] Implement decomposition ... started at 20:02
Jenkins Linux amd64 Build #15775 [push] Implement decomposition ... passed after 1 hr 18 min
Jenkins Linux s390x Build #15778 [push] Implement decomposition ... passed after 1 hr 42 min
Jenkins Linux ppc64le Build #14805 [push] Implement decomposition ... passed after 2 hr 17 min
Can one of the admins verify this patch?