Implement judicious skipping of CI jobs

vyasr commented 2 months ago

Currently any change to any RAPIDS repo triggers a complete run of the entire build and test suite of CI jobs. This is very expensive, and is often unnecessary. RAPIDS libraries are typically structured such that they have a clear, linear dependency chain between different components. Some examples:

Changes to Python code should have no effect whatsoever on C++ testing
For repositories that publish multiple Python packages, there is typically a linear dependency between at least some of these such that changes to downstream packages has no effect on the tests of upstream package (e.g. changes to cugraph would not affect pylibcugraph tests).
Changes to documentation should only require documentation rebuilds, no test runs.

We can reduce the number of unnecessary jobs that we run in CI by more judiciously skipping jobs that are unnecessary. The simplest approach to do this is by simply checking what files have changed. We have previously implemented a form of this in cudf for both cudf.pandas and now for cudf-polars. To do this, I would propose the following steps:

Generalizing the above logic for detecting changes into a shared workflow
Enable filtering jobs using the above shared workflow
Generalize the pr-builder job to allow some jobs to be skipped under appropriate circumstances.

KyleFromNVIDIA commented 2 months ago

pytest-incremental may help us with skipping unneeded Python tests:

https://pypi.org/project/pytest-incremental/

It hasn't been updated in a while though, it may need some work.

vyasr commented 1 month ago

@KyleFromNVIDIA now that we have a common shared workflow and about a month of data on how pruning the jobs works, do we want to roll this feature out to the rest of RAPIDS and then close this issue?

jakirkham commented 1 month ago

In some cases we want to run all CI. Is there a good way to bypass this selection behavior?

KyleFromNVIDIA commented 1 month ago

Please describe a scenario in which you want to run a CI job even though the relevant files haven't changed. We can look at ways to force it to run.

In the meantime, as a workaround, you can add a comment to a relevant file to force CI to run, then remove it again once CI has passed.

KyleFromNVIDIA commented 1 month ago

We could create a new label called "Force CI Run". Then the condition could be something like this:

if: fromJSON(needs.changed-files.outputs.changed_file_groups).test_cpp || fromJSON(needs.pr-info.outputs.pr-info).labels.*.name == "Force CI Run"

Or we could even modify the changed-files workflow with a new output to give us the "force CI run" status:

if: fromJSON(needs.changed-files.outputs.changed_file_groups).test_cpp || needs.changed-files.outputs.force_ci_run

If the label is initially not provided, and is added later, we can then trigger a new workflow run with the label applied by:

git commit --allow-empty -m 'Re-run Ci'
git push

jameslamb commented 1 month ago

My perspective: if the main goal is just to support PRs like https://github.com/rapidsai/pynvjitlink/pull/107 of the form "I just want to see if a CI issue is unrelated to my code changes", I'd prefer just adding a comment to a central file (like a C++ source file) instead of adding complexity to the changed-files mechanism.

KyleFromNVIDIA commented 1 month ago

I've opened https://github.com/rapidsai/shared-workflows/pull/249 and https://github.com/rapidsai/cudf/pull/17064 to test out the approach outlined at https://github.com/rapidsai/build-planning/issues/94#issuecomment-2407577888.

KyleFromNVIDIA commented 1 month ago

The above POC works, but if @jameslamb is correct then it may be better to just modify a C++ file with a comment to force the run. @jakirkham WDYT?

vyasr commented 1 month ago

I made the same assumption as James regarding the purpose of the request and come to the same conclusion. I would prefer to discourage using CI as a way to "just run tests", which is one possible use case. If we are specifically testing the behavior of CI (e.g. CI fails but locally tests pass), then adding a trivial change is not a very high barrier. Alternatively, instead of using PR CI you can use either build or test workflows. Those both have workflow triggers available so they can be run directly from the Actions tab, no PR needed. A special label feels like extra machinery that we don't need.

jameslamb commented 3 weeks ago

@KyleFromNVIDIA what is left for this? It's difficult for me to tell which repos we still haven't made these changes for.

KyleFromNVIDIA commented 2 weeks ago

Looks like cuspatial has not yet been done. I'll go work on that now. I'm not sure if there's anything else. The way to tell is to look at .github/workflows/pr.yaml and see if there is a changed-files workflow.

jameslamb commented 2 weeks ago

The way to tell is to look at .github/workflows/pr.yaml and see if there is a changed-files workflow.

Ok, I was hoping you had a list of what remained and it just hadn't made it into this issue yet. I'm not planning to go through all the repos looking for that workflow, I trust your estimation of what's left.

jameslamb commented 5 days ago

@KyleFromNVIDIA could you please do one more check like the one you mentioned in https://github.com/rapidsai/build-planning/issues/94#issuecomment-2430162923, and then close this issue if you think it's done?

rapidsai / build-planning

Implement judicious skipping of CI jobs #94