super-linter / super-linter

Combination of multiple linters to run as a GitHub Action or standalone
https://github.com/super-linter/super-linter
MIT License
9.54k stars 973 forks source link

Use github's default merge commit instead of requiring full repo history for pull requests #6269

Open sebastian-fredriksson-bernholtz opened 1 month ago

sebastian-fredriksson-bernholtz commented 1 month ago

Is there an existing issue for this?

Current Behavior

Super-linter's documentation states that it needs the full history to determine what files have changed, and gives an error if not enough history has been fetched (see https://github.com/super-linter/super-linter/issues/5313).

Expected Behavior

For pull requests, super-linter should be able to just use a shallow checkout of the triggering commit to determine what files are being changed by the pr, since the triggering commit is the result of merging the pull request branch into the target branch. It is referenced in a hidden (but accessible) refs/pull/PULL_REQUEST_NUMBER/merge ref.

Anything else?

Sorry, I haven't really looked at the code, but I'm guessing that super-linter runs a git diff between the pull request branch and the target branch, because we got it to pass by fetching the history of the pull request branch and the default branch back to the point where their histories split (see https://github.com/super-linter/super-linter/issues/5313#issuecomment-2367022031)?

We did this because we have quite a big monorepo, and have to carefully manage the amount of data we fetch, because we use self-hosted runners (because we ran out of github action minutes) and pay for the data we fetch.

However, super-linter should be able to support just a shallow checkout on pull request and run a git show instead of a git diff (if that is what you do?) to get all the files that have changed.

At first, I thought the The GITHUB_SHA reference (...) doesn't exist in this Git repository error described in https://github.com/super-linter/super-linter/issues/5313 was caused by the fact that the triggering commit is a hidden commit, that is not normally available (depending on how you checkout). However, when looking at the PR that was mentioned as having introduced the issue (#4889), I found that doesn't really seem to be the issue.

Rather, the problem seems to be that super-linter overrides the GITHUB_SHA with the pull request head reference instead of leaving it as the pull request merge reference. So if the pull request merge reference is the commit that has been checked out with with depth 1 (which is the default), the commit referenced by the pull request head reference doesn't exist, because it's one of the parent commits to the merge commit.

ferrarimarco commented 1 month ago

Hi, and thanks for taking the time to report this request.

In the case of running on GitHub Actions, Super-linter uses the following function to initialize diffs:

https://github.com/super-linter/super-linter/blob/7b76efbd69ef471b83d5273d4b5d8b3cbd8e5e3f/lib/functions/buildFileList.sh#L13-L32

Where:

https://github.com/super-linter/super-linter/blob/7b76efbd69ef471b83d5273d4b5d8b3cbd8e5e3f/lib/linter.sh#L297-L303

So, Super-linter doesn't need the whole repository history, but just the minimum amount of data to account for the above.

We suggest setting fetch-dept: 0 (i.e. fetch everything) for simplicity. Super-linter emits that message when either GITHUB_SHA doesn't exist:

https://github.com/super-linter/super-linter/blob/7b76efbd69ef471b83d5273d4b5d8b3cbd8e5e3f/lib/functions/validation.sh#L266-L276

or when there's an error running the diff command:

https://github.com/super-linter/super-linter/blob/7b76efbd69ef471b83d5273d4b5d8b3cbd8e5e3f/lib/functions/buildFileList.sh#L34-L49

To your points:

However, super-linter should be able to support just a shallow checkout on pull request and run a git show instead of a git diff (if that is what you do?) to get all the files that have changed.

git show is likely not meant for this, and would require some parsing + specifying a suitable output format. Also, it will probably have the same issue when you run it against a commit that you didn't fetch (I didn't test this). And, what about when you push more than one commit?

Rather, the problem seems to be that super-linter overrides the GITHUB_SHA with the pull request head reference instead of leaving it as the pull request merge reference. So if the pull request merge reference is the commit that has been checked out with with depth 1 (which is the default), the commit referenced by the pull request head reference doesn't exist, because it's one of the parent commits to the merge commit.

But you have a point here that we probably don't require the full git history. Also, we don't want to handle the fetch operations from within Super-linter because they can potentially modify the state of the local copy of your repository.

We'll be happy to review any PRs about this!