Some tests are too large that cause CI timeout - Githubissues

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28k stars 11.56k forks source link

Some tests are too large that cause CI timeout #54821

Open junaire opened 2 years ago

junaire commented 2 years ago

If you go to clang/test directory and run find . -type f -size +1M, you will find some tests are ridiculously large (over 1MB). I think this is why sometimes the CI failed (https://reviews.llvm.org/harbormaster/unit/157760/)

./CodeGen/arm_neon_intrinsics.c
./CodeGen/RISCV/rvv-intrinsics/vluxseg_mask_mf.c
./CodeGen/RISCV/rvv-intrinsics/vluxseg_mask.c
./CodeGen/RISCV/rvv-intrinsics/vloxseg_mask_mf.c
./CodeGen/RISCV/rvv-intrinsics/vloxseg_mask.c
./OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp
./OpenMP/target_teams_distribute_simd_codegen.cpp
./OpenMP/target_parallel_for_codegen.cpp
./OpenMP/distribute_simd_codegen.cpp
./OpenMP/target_teams_distribute_codegen.cpp
./OpenMP/distribute_parallel_for_simd_codegen.cpp
./OpenMP/target_teams_codegen.cpp
./OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp
./OpenMP/distribute_parallel_for_codegen.cpp
./OpenMP/target_teams_distribute_parallel_for_schedule_codegen.cpp
./OpenMP/target_parallel_for_simd_codegen.cpp
./OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp

Maybe we can split the test into several ones and hopefully, it can stop the failure of build bots in some cases.

CC @topperc @nikic @alexey-bataev

llvmbot commented 2 years ago

@llvm/issue-subscribers-good-first-issue

DavidSpickett commented 2 years ago

For the arm test, to split it in half: https://reviews.llvm.org/D123601

Which is a ~40% reduction in testing time on my local machine. So if the test would have taken > 84 seconds this won't help but worth a try? Unless you happen to know how long these tests take if you let them run.

tstellar commented 2 years ago

I think splitting the tests is the wrong approach. Each new file creates additional overhead and slows down the whole run. Can we just add timeout exceptions for the failing tests?

DavidSpickett commented 2 years ago

Out of the box lit's per test timeout is global and overrides anything set in a local config (I don't think we want to add timeouts in the llvm-project files in any case).

There are project level excludes in pre merge https://github.com/google/llvm-premerge-checks/blob/main/scripts/llvm-dependencies.yaml but we still want to run the tests just with more time.

There is LIT_FILTER https://reviews.llvm.org/D35091. So one method could be to filter out these tests, then filter out everything but these tests and use a different timeout.

nikic commented 2 years ago

Running clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg_mask.c under perf record, it looks like we spend nearly all the time in regex code in FileCheck. I wonder whether there is room for optimization there.

nikic commented 2 years ago

Callgrind tells me that vast majority of the regex matching time is spent in the dissect method: https://github.com/llvm/llvm-project/blob/201c4b9cc4a649e4a54b418749ab5b8bd227e4f0/llvm/lib/Support/regengine.inc#L292

This method populates the subpattern matches. If I'm reading the code at https://github.com/llvm/llvm-project/blob/201c4b9cc4a649e4a54b418749ab5b8bd227e4f0/llvm/lib/Support/regengine.inc#L377 right, it handles things like .+ by doing a maximal match, checking whether the rest will match, and then successively reducing the size of the match until the remainder matches. This makes the code quadratic...

nikic commented 2 years ago

I've put up https://reviews.llvm.org/D123776 to improve performance of the regex implementation for this case.

Though I got to say that the regex implementation is really bad, and I wonder whether it would be feasible/sensible to bundle a high-quality regex implementation like PCRE instead.

junaire commented 2 years ago

I've put up https://reviews.llvm.org/D123776 to improve performance of the regex implementation for this case.

Though I got to say that the regex implementation is really bad, and I wonder whether it would be feasible/sensible to bundle a high-quality regex implementation like PCRE instead.

Hi, thanks for working on this!👍 I just skimmed through the code, looks like LLVM's regex implementation is ported from OpenBSD? I wonder if it is possible to use better regex library as well, should we send an RFC?

nikic commented 2 years ago

https://reviews.llvm.org/D124237 to reduce the size of large.test, which seems to be the main source of timeouts now.

nikic commented 2 years ago

Examples/standalone/test.toy from MLIR and bench_threads.cpp from ThreadSanitizer also regularly time out.

tstellar commented 2 years ago

I don't really want to block people from getting the buildbot back to green, but if we are unable to make changes to the CI to increase the timeouts, that seems like a fundamental problem that needs to be addressed. Has this issues been raised with the CI maintainers?

junaire commented 2 years ago

I don't really want to block people from getting the buildbot back to green, but if we are unable to make changes to the CI to increase the timeouts, that seems like a fundamental problem that needs to be addressed. Has this issues been raised with the CI maintainers?

Hey Tom, I understand what do you mean. I think you're right, we can fundamentally solve this problem by asking CI maintainers to increase the timeouts, but I guess it's not just about CI problems in some way. Testing is major cost when developing LLVM, (In general it takes about 10 minutes for me to run ninja check-clang, my computer is slow I know...) so I'm sure everyone would be happy if we could reduce testing time without compromising the test quality.

tstellar commented 2 years ago

@junaire I totally agree with you, but splitting the tests in two doesn't reduce total testing time, it's just a work around to avoid the individual test timeouts.

junyer commented 2 years ago

Given that the submatches are typically tiny, making FileCheck emit (.*?) rather than (.*) would have been much simpler and much faster... if only it weren't seemingly wedded to POSIX ERE syntax, which doesn't support non-greedy quantifiers. :(

fhahn commented 2 years ago

I don't really want to block people from getting the buildbot back to green, but if we are unable to make changes to the CI to increase the timeouts, that seems like a fundamental problem that needs to be addressed. Has this issues been raised with the CI maintainers?

I think one issue is that it is not entirely clear to me how to raise this issue with the pre-merge test maintainers. IIUC the pre-commit tests use google/llvm-premerge-checks, so I filed https://github.com/google/llvm-premerge-checks/issues/399.

fhahn commented 2 years ago

Another long running test: x64 debian > LLVM.CodeGen/NVPTX::wmma.py https://reviews.llvm.org/harbormaster/unit/view/3463289/

AFAICT there's been no action so far on the CI infrastructure side, so preceding with speeding up/splitting up tests seems like the only short term way forward to improve our pre-commit CI experience.