Need to understand what 'count' does here: -v -> args.verbose == 1, while -vv -> args.verbose == 2.
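For reference, a minimal standalone sketch of how argparse's action='count' behaves (nothing below is specific to the script under review; the flag name just mirrors the one being discussed):

```python
import argparse

parser = argparse.ArgumentParser()
# action="count" stores how many times the flag appears on the command line;
# default=0 keeps args.verbose an int even when -v is never passed.
parser.add_argument("-v", "--verbose", action="count", default=0)

print(parser.parse_args([]).verbose)       # 0
print(parser.parse_args(["-v"]).verbose)   # 1
print(parser.parse_args(["-vv"]).verbose)  # 2
```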
In general, what I think of when I see us fiddling with this kind of stuff is that we just need to port to pytest. The only reason I haven't done it yet is because pytest doesn't handle segfaults well, despite claims to the contrary. At least, so I recall - it's worth trying out for us. But actually https://github.com/pytest-dev/pytest-xdist/issues/126 gives us the solution: just use catchsegv.sh (or write our own signal handler).
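For what it's worth, a minimal sketch of the 'write our own signal handler' route, done the usual way for Python: run the crash-prone step in a child process and detect SIGSEGV from its return code, so the segfault can't take down the pytest process itself. The command and test name below are hypothetical; in-process, the stdlib faulthandler module can also be enabled to at least dump a traceback on SIGSEGV.

```python
import signal
import subprocess

import pytest


def run_crashy(cmd):
    # Run the crash-prone step out of process so a segfault kills the child,
    # not the process collecting test results.
    return subprocess.run(cmd, capture_output=True, text=True)


def test_does_not_segfault():
    # Hypothetical command; the real one would come from the test's config.
    proc = run_crashy(["./run_kernel", "--input", "matmul.mlir"])
    if proc.returncode == -signal.SIGSEGV:
        pytest.fail("child process segfaulted:\n" + proc.stderr)
    assert proc.returncode == 0, proc.stderr
```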
Ok. I'm not sure what 'this kind of stuff' is that you're referring to here. I'm not going to do this anytime soon, and I would like to maintain this script for as long as we need it.
We've been fiddling with this script a lot lately, right? Flags, shell-out adjustments, changing how the tests are layered. It signals to me that the script is not a substitute for a full-fledged testing framework that gracefully handles these things. In addition, a real testing framework could give us other useful features, like perf regression tracking via some kind of standardized logging (compile times, runtimes, etc.).
In IREE itself I believe they're revamping the testing and benchmarking framework so I'm waiting until then to see whether we can use it or if we'll need our own.
Just for comparison, this is what pytest looks like:
https://github.com/makslevental/mlir-python-extras/tree/main/tests
and here's what the test output looks like when everything passes.
And when something fails:
https://github.com/makslevental/mlir-python-extras/actions/runs/10448085182/job/28927992854
We could even be leveraging the iree/mlir python bindings to do runtime tests rather than shelling out:
https://github.com/makslevental/mlir-python-extras/blob/main/tests/test_runtime.py
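As a rough illustration of the bindings-instead-of-shelling-out idea, here is a sketch assuming the upstream MLIR Python bindings are importable as mlir.ir (the package that ships with IREE may expose them under a different name):

```python
import pytest

# Skip cleanly if the bindings aren't installed in this environment.
mlir_ir = pytest.importorskip("mlir.ir")


def test_module_verifies_in_process():
    # Parse and verify IR directly in the test process instead of shelling
    # out to a tool and scraping its output.
    with mlir_ir.Context():
        module = mlir_ir.Module.parse("module { }")
        assert module.operation.verify()
```

Runtime tests would go further and actually execute compiled code through the bindings, which is what the linked test_runtime.py demonstrates.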
Anyway, it's second-order stuff, but I think it's inevitable that we'll need a more serious testing framework.
Ok. I agree that we can do better by using pytest. For now though, can you please accept the PRs I make improving this script? I'm not planning on adding any serious bells and whistles, but this PR fixes a broken test.
Bit the bullet and am doing this in parallel: https://github.com/nod-ai/iree-amd-aie/pull/752
My bad, I didn't realize you were actually waiting on me to approve. Sorry!
It'd actually be nice to have a line printed for each test that runs. Next step: set up a few levels of verbosity:
0: nothing
1: a line for each test run
2: default, sensible
3: debug
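A minimal sketch of one way to wire those levels up; the explicit integer flag (rather than counting -v occurrences) and the example messages are just assumptions for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# A single explicit knob; the default stays at level 2 ("default, sensible").
parser.add_argument(
    "--verbosity", type=int, choices=[0, 1, 2, 3], default=2,
    help="0: nothing, 1: a line per test run, 2: default, 3: debug",
)
args = parser.parse_args()


def log(msg, at_level):
    # Emit the message only when the configured verbosity reaches its level.
    if args.verbosity >= at_level:
        print(msg)


log("running test: matmul_i32_small", at_level=1)
log("full command line and environment dump", at_level=3)
```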