lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!
https://lychee.cli.rs
Apache License 2.0

Preserve original URLs when `--remap` is used #1493

Open oponomarov-tu opened 2 months ago

oponomarov-tu commented 2 months ago

I'm using the --remap feature to validate links pointing to private GitHub repositories containing documentation.

$ lychee . --remap 'https://github.com/<my org>/(?P<repo>[^/]+)/(?:blob|tree)/(?P<revision>[^/]+)/(?P<path>[^#]+) https://api.github.com/repos/<my org>/$repo/contents/$path?ref=$revision' --header "Authorization=token $GITHUB_TOKEN" -v

Since there's no straightforward way to verify whether a link to a private GitHub repository is broken, I'm working around this by using both the --github-token argument and the --header "Authorization=token $GITHUB_TOKEN" option.

With this setup, the --remap feature substitutes links like:

[`sre-design-docs/use-conventional-commit-messages.md`](https://github.com/<my org>/<my private repo>/blob/master/docs/design/docs/non-existint-document.md)

with their API equivalent, which is what lychee then reports:

✗ [404] https://api.github.com/repos/<my org>/<my private repo>/contents/docs/design/docs/non-existint-document.md?ref=master | Failed: Network error: Not Found
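To make the substitution concrete, here is a minimal Python sketch of the same rewrite. It is not lychee's implementation: it only mirrors the pattern from the command above using Python's `re` module, which spells replacement groups as `\g<name>` rather than `$name`. The org name `my-org`, the repo name `my-private-repo`, and the `remap` helper are hypothetical stand-ins for the placeholders used in this thread. Returning the original URL alongside the remapped one is the piece that would be needed to report the link as it appears in the document.

```python
import re

# Hypothetical stand-in for the `<my org>` placeholder used in this thread.
ORG = "my-org"

# Same shape as the pattern passed to --remap (dots escaped here for strictness).
PATTERN = re.compile(
    rf"https://github\.com/{ORG}/(?P<repo>[^/]+)/(?:blob|tree)/"
    r"(?P<revision>[^/]+)/(?P<path>[^#]+)"
)

# Python writes named-group references as \g<name>; lychee's syntax is $name.
TEMPLATE = (
    rf"https://api.github.com/repos/{ORG}/\g<repo>/contents/\g<path>"
    r"?ref=\g<revision>"
)


def remap(url: str) -> tuple[str, str]:
    """Return (original, remapped). If the pattern does not match,
    the remapped URL is identical to the original."""
    return url, PATTERN.sub(TEMPLATE, url, count=1)


original, remapped = remap(
    "https://github.com/my-org/my-private-repo/blob/master/"
    "docs/design/docs/non-existint-document.md"
)
print(original)
print(remapped)
```

Keeping both values together means the checker can send the request to the remapped URL while still having the original at hand for reporting.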

This approach works as expected, but I'm wondering if there's a way to have the output show the original link as written in the Markdown, rather than the remapped one. The goal is to prevent confusion when users search for the broken link in the document. Ideally, I'd like this validation to occur "in the background" to maintain a seamless user experience.

Thank you for considering this feature request. I hope others find this workaround helpful as well.

mre commented 2 months ago

Thank you for your detailed explanation of how you're using lychee's --remap feature to validate links pointing to private GitHub repositories. It's always great to see users coming up with creative solutions to meet their specific needs!

Your suggestion to show the original URLs in the output is an excellent idea. This would indeed help prevent confusion when users search for broken links in their documents.

I'm wondering if we should show the original URL as well as the remapped URL, though. We could modify lychee to display both URLs using the following format:

[STATUS] ORIGINAL_URL -> REMAPPED_URL | Additional info

For example, the output would look like this:

[404] https://github.com/<my org>/<my private repo>/blob/master/docs/design/docs/non-existint-document.md -> https://api.github.com/repos/<my org>/<my private repo>/contents/docs/design/docs/non-existint-document.md?ref=master | Failed: Network error: Not Found

This format shows the connection between the original link and its remapped version. I wonder if it's too verbose, though? Alternatively, we could show the original URL only in verbose mode (-v).
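As a rough illustration of the format under discussion, here is a small Python sketch. It is purely hypothetical and not lychee code: the `format_result` function and its parameters are invented for this example. It follows one reading of the verbose-mode idea, in which the default output stays as it is today (remapped URL only) and the `ORIGINAL_URL -> REMAPPED_URL` form appears only when verbose output is requested.

```python
def format_result(
    status: int,
    original: str,
    remapped: str,
    info: str,
    verbose: bool = False,
) -> str:
    """Render one result line (hypothetical helper, not part of lychee)."""
    if original == remapped:
        # No remap applied: nothing extra to show.
        return f"[{status}] {original} | {info}"
    if verbose:
        # Proposed format: [STATUS] ORIGINAL_URL -> REMAPPED_URL | Additional info
        return f"[{status}] {original} -> {remapped} | {info}"
    # Assumed default: keep the current behaviour and show the remapped URL.
    return f"[{status}] {remapped} | {info}"


print(format_result(
    404,
    "https://github.com/my-org/my-private-repo/blob/master/docs/missing.md",
    "https://api.github.com/repos/my-org/my-private-repo/contents/docs/missing.md?ref=master",
    "Failed: Network error: Not Found",
    verbose=True,
))
```

The opposite default (original URL only, with the remapped URL added in verbose mode) would be a one-line change in the last branch, so the choice is mostly about which URL users are expected to search for.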

We support multiple output formats (--format), so this would require some planning.