Presubmit testing failed to catch basic clang lit test breakages

zmodem commented 1 year ago

(Breaking this discussion out of https://github.com/llvm/llvm-project/pull/71322)

https://github.com/llvm/llvm-project/commit/ace4489397d17abfb20d36de1404cfbe102401a7 broke a bunch of Clang lit tests (e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/37563).

The breakages did not require any exotic platform, operating system, or build configuration -- it broke everywhere.

The patch did go through a pull request, so presubmit testing should have caught it.

How come it wasn't caught? What can we do to fix it?

@metaflow @tstellar @joker-eph

llvmbot commented 1 year ago

@llvm/issue-subscribers-infrastructure

Author: Hans (zmodem)

(Breaking this discussion out of https://github.com/llvm/llvm-project/pull/71322) https://github.com/llvm/llvm-project/commit/ace4489397d17abfb20d36de1404cfbe102401a7 broke a bunch of Clang lit tests (e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/37563). The breakages did not require any exotic platform, operating system, or build configuration -- it broke everywhere. The patch did go through a pull request, so presubmit testing should have caught it. How come it wasn't caught? What can we do to fix it? @metaflow @tstellar @joker-eph

joker-eph commented 1 year ago

This PR touches clang, but the change-detection logic seems to have considered that check-clang wasn't in scope somehow. I don't know why:

./.ci/monolithic-linux.sh "clang;clang-tools-extra;libc;llvm" "check-clang-tools check-libc"

It only triggers clang-tools and libc tests here.

metaflow commented 1 year ago

I will try to look into this tomorrow.

metaflow commented 1 year ago

@joker-eph yes, "clang" is not included in the project list as clang/ modifications trigger a separate pipeline "clang-ci" that does some additional checks with libcxx (probably @ldionne or @mstorsjo know better).

And this pipeline was triggered and failed for th PR

Link to buildkite: https://buildkite.com/llvm-project/github-pull-requests/builds/12931

So for me it seems that CI worked fine and author simply ignored the test results.

zmodem commented 1 year ago

And this pipeline was triggered and failed for th PR Link to buildkite: https://buildkite.com/llvm-project/github-pull-requests/builds/12931 So for me it seems that CI worked fine and author simply ignored the test results.

That run (I looked at the Linux one) only flagged 3 unrelated clang-tidy test failures. It didn't flag the 50 failing clang tests, though, which is the issue.

joker-eph commented 1 year ago

That buildkite run is exactly the one I copy/pasted the command above:

./.ci/monolithic-linux.sh "clang;clang-tools-extra;libc;llvm" "check-clang-tools check-libc"

Where is check-clang supposed to happen? I wouldn't expect it to happen elsewhere, especially since we're already building most of it here anyway to be able to test LLVM and clang-tools...

metaflow commented 1 year ago

Have you checked clang ci pipeline results? https://buildkite.com/llvm-project/clang-ci/builds/6211#018ba4b7-5c43-4209-a282-e7cbbaabc074 there are 47 failures

metaflow commented 1 year ago

(I agree that it might have been presented better and aggregated in some way)

ldionne commented 1 year ago

I agree with @metaflow, it looks like the CI results were simply ignored. @metaflow I would not expect the monolithic-linux and monolithic-windows tests to trigger at all though, do you know why they were triggered? It seems like this caused part of the confusion here. In other words, I would have expected that only the clang-ci BuildKite pipeline gets triggered, which would have made it the only source of failures. And then if you ignore that, it's unambiguously your fault.

metaflow commented 1 year ago

there were triggered as clang-tools-extra was touched

metaflow commented 1 year ago

ldionne commented 1 year ago

Ahh, I see. One thing we could do here is potentially trigger the appropriate jobs directly in this pipeline instead of triggering the clang-ci pipeline indirectly. I think this was necessary to make it work on both GitHub PRs and Phabricator, but if we are willing to drop Phabricator support now, I think we might be able to remove one level of indirection. That way, the following failures would be on the same page as the other ones:

metaflow commented 1 year ago

I like the idea of removing this indirection. One thing to think about is how to organize "dependent builds" - right now some of checks for libcxx will not start if basic linux / windows have not passed. We can either make them dependent on all "first layer" checks, or specific steps.

zmodem commented 1 year ago

Have you checked clang ci pipeline results? https://buildkite.com/llvm-project/clang-ci/builds/6211#018ba4b7-5c43-4209-a282-e7cbbaabc074 there are 47 failures (I agree that it might have been presented better and aggregated in some way)

Given that the author of https://github.com/llvm/llvm-project/pull/71322, Aaron, Mehdi, and I all failed to spot these failures, yes that strongly suggests the presubmit testing UI could be better :-)

I like the idea of removing this indirection. One thing to think about is how to organize "dependent builds" - right now some of checks for libcxx will not start if basic linux / windows have not passed. We can either make them dependent on all "first layer" checks, or specific steps.

I don't know anything about this, but if you want to "gate" some tests on others, couldn't you just do ninja check-clang check-llvm && ninja check-cxx?

metaflow commented 1 year ago

Agree. I think we should change the pipeline. I think that that will be easier to do after we drop support for phab workflow. My plan is to do that the next week (20 November) https://discourse.llvm.org/t/stop-pre-merge-checks-for-phabricator/74871

ldionne commented 1 year ago

Actually, in light of the recent progress made by @EricWF with Github Actions, I think we shouldn't change anything about the BuildKite indirections here. We are going to migrate our pre-commit CI checks to Github Actions, which should resolve this issue entirely.

llvm / llvm-project

Presubmit testing failed to catch basic clang lit test breakages #71532