lycheeverse / lychee-action

GitHub Action to check for broken links in Markdown, HTML, and text files using lychee, a fast link checker written in Rust.
https://lychee.cli.rs
Apache License 2.0

Check links relevant to the changes in a PR only #238

Closed: f-hollow closed this issue 1 week ago

f-hollow commented 2 months ago

Naturally, you want to avoid bothering a contributor with issues that somebody else introduced in your repo. So I need a way to check and report only invalid links that were (1) added or updated in that contributor's PR, or (2) already present but affected by the changes the PR introduces.

This turned out to be harder than I expected. You cannot run lychee on the changes only, because links internal to your repo (checked with my beloved --include-fragments option) can and will produce false positives.

Browsing around this repo, I came across mre's comment about separately checking existing links and new links. This idea nudged me in the right direction! The related issues #17 and #134 offer no immediately usable solution yet.

Below is the workflow that I created and tested so far. It works and does what I need.

Possible improvements:

Your feedback or further improvements to this workflow would be very much appreciated!

The workflow that checks links relevant to the changes in a PR only

name: Check links in diffs

on:
  pull_request:
    branches: [main]

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: ${{github.event.pull_request.head.ref}}
          repository: ${{github.event.pull_request.head.repo.full_name}}

      - name: Check out main branch
        run: git checkout main

      - name: Dump all links from main
        id: dump_links_from_main
        uses: lycheeverse/lychee-action@v1
        with:
          args: |
            --dump
            --include-fragments
            .
          output: ./links-main.txt

      - name: Stash untracked files
        run: git stash push --include-untracked

      - name: Check out feature branch
        run: git checkout ${{ github.head_ref }}

      - name: Apply stashed changes
        # Apply stashed changes, ignore errors if stash is empty
        run: git stash pop || true

      - name: Append links-main.txt to .lycheeignore
        run: cat links-main.txt >> .lycheeignore

      - name: Check links
        uses: lycheeverse/lychee-action@v1
        with:
          args: |
            --no-progress
            --include-fragments
            .
          # Fail action on broken links
          fail: true

      - name: Suggestions
        if: failure()
        run: |
          echo -e "\nPlease review the links reported in the Check links step above."
          echo -e "If a link is valid but fails due to a CAPTCHA challenge, IP blocking, login requirements, etc.,"
          echo -e "consider adding such links to the .lycheeignore file to bypass future checks.\n"
          exit 1
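
A possible caveat with the Append step, offered as an assumption rather than something tested in this thread: lychee treats each line of .lycheeignore as a regular expression, so raw URLs containing characters such as . or ? are interpreted as patterns, not literal text. Below is a minimal sketch of escaping the dump first. The function name escape_links is hypothetical, and anchoring with ^ and $ assumes lychee matches against the same URL string that --dump prints.

```shell
# Hypothetical helper: turn a list of links (one per line, on stdin) into
# anchored regular expressions that match each link literally.
escape_links() {
    # 1st expression: prefix every regex metacharacter with a backslash.
    # 2nd and 3rd expressions: anchor the pattern to the whole link.
    sed -e 's/[][(){}.*+?^$|\\]/\\&/g' -e 's/^/^/' -e 's/$/$/'
}
```

In the workflow, the Append step could then run `escape_links < links-main.txt >> .lycheeignore` instead of the plain cat.
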
mre commented 2 months ago

Interesting.

I think we could add that to our documentation at https://github.com/lycheeverse/lycheeverse.github.io

It could be next to this recipe in the documentation hierarchy: https://lychee.cli.rs/github_action_recipes/add-pr-comment/

Would you like to create a pull request? Otherwise I can also take care of it.

mre commented 2 months ago

I have to use git stash, otherwise GitHub CI complains about the untracked file links-main.txt (not sure why it works like this):

$ git checkout my_feature_branch
error: The following untracked working tree files would be overwritten by checkout:
        links-main.txt
Please move or remove them before you switch branches.
Aborting

That's weird. When I remove the Stash untracked files and Apply stashed changes steps, it still works for me.
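
A possible explanation for the difference, which the thread does not confirm: git refuses a checkout only when the target branch tracks a file at the same path as the untracked one. So the error would appear if links-main.txt had been committed on the feature branch at some point, and not otherwise. Here is a small sketch that reproduces both cases in a throwaway repository. The function name and the demo identity are made up for illustration.

```shell
# Hypothetical demo: returns the exit status of `git checkout feature` when an
# untracked links-main.txt exists on main.
# $1 = "tracked"  -> the feature branch has links-main.txt committed
# $1 = anything else -> the feature branch does not contain that file
demo_untracked_checkout() {
    repo=$(mktemp -d)
    (
        cd "$repo" || exit 2
        git init -q .
        # Name the initial branch "main" regardless of the local git default.
        git symbolic-ref HEAD refs/heads/main
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q --allow-empty -m "base"
        git checkout -q -b feature
        if [ "$1" = tracked ]; then
            echo "old dump" > links-main.txt
            git add links-main.txt
        fi
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q --allow-empty -m "feature"
        git checkout -q main
        # Simulate the dump step: the file is untracked on main.
        echo "fresh dump" > links-main.txt
        git checkout -q feature 2>/dev/null
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

Under this assumption, `demo_untracked_checkout tracked` fails the way the quoted error does, while `demo_untracked_checkout untracked` succeeds and carries the untracked file over to the feature branch.
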

SRv6d commented 1 month ago

Could this be implemented in the action or lychee itself?

mre commented 1 month ago

I guess it could be. Question is if we should. It would be a maintenance burden. Is copy-pasting the GitHub workflow an issue?

f-hollow commented 1 month ago

@mre I still want to implement checking internal links affected by removed files as mentioned above. Once I finish, I will create a pull request to add this information to the docs as you suggested.

It might take a couple of weeks though, since I have been a bit busy recently.

mre commented 1 month ago

Cool. Thanks a lot!

mre commented 1 month ago

@f-hollow, any updates? Would love to have this.

f-hollow commented 2 weeks ago

@f-hollow, any updates? Would love to have this.

Thank you very much for waiting.

This workflow has been running in my repo all this time and there are two instances of strange behavior which I would like to inspect. Then I should be able to create a PR, most likely no later than next week.

Just noticed that a PR has already been created. Thank you @sekyondaMeta!

In this case, I will see if any improvements are needed :)

mre commented 1 week ago

I guess that's a misunderstanding. The referenced pull request targets pytorch/tutorials, not the lychee docs. I went ahead and created a recipe in our documentation here. The original commit is here.

Please double-check if I made a mistake and feel free to update the docs over there. Apart from that, it looks like we're done here. Thanks for the great idea to use --dump to get a diff to the base branch.
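
The --dump diff idea can also be tried outside CI as a plain set difference between two dump files. This is a minimal sketch, assuming each dump holds one link per line. new_links is a hypothetical helper, not a lychee feature.

```shell
# Hypothetical helper: print the links that appear in the PR dump ($2) but not
# in the base-branch dump ($1). Both files hold one link per line.
new_links() {
    base_dump=$1
    pr_dump=$2
    # -F: fixed strings, -x: whole-line match, -v: keep non-matching lines,
    # -f: read the patterns from the base dump.
    # grep exits with 1 when it prints nothing; treat that as success.
    grep -vxFf "$base_dump" "$pr_dump" || [ $? -eq 1 ]
}
```

For example, `new_links links-main.txt links-pr.txt` would list the links a PR added or changed, where links-pr.txt is a hypothetical second dump taken on the feature branch.
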