lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!
https://lychee.cli.rs
Apache License 2.0
2.19k stars, 133 forks

Too many Network errors: Looking for suggestions #634

Closed vipulgupta2048 closed 6 months ago

vipulgupta2048 commented 2 years ago

Hey folks, I have been tinkering with lychee lately, running it on static HTML to check 17k+ links. Of those, 400+ links result in a network error. I wanted to open an issue here to ask whether I might be doing something wrong, or whether something can be improved to mitigate these.

I have set verbose: true. I wish the generated report showed the actual HTTP status code.

mre commented 2 years ago

17k+ links are quite astonishing. 🤩

Checking the report, I can see a few things:

There might be more issues, but these are the ones I can see by taking a quick look.

You're right that we should add the status code to the Markdown output. This was an oversight on my end. I can add it but I'd be thankful for a PR which brushes up the Markdown output a little bit.

vipulgupta2048 commented 2 years ago

17k+ links are quite astonishing. 🤩

It's all the power of Rust and a great tool by you, @mre. I experimented a lot with the CPU and concurrency counts to see how hard I could push GitHub. I can do more to respect their rate limiting, since I don't want to DDoS any site. I will change the config to accept 429; I think it's okay to do that.

Thanks for taking a look, I appreciate it. I will look into that Markdown output. Afterwards, I will post an updated report on what improved so that others can benefit.

mre commented 2 years ago

@vipulgupta2048 are you still planning to post the renewed report or can we close this issue? 😅

vipulgupta2048 commented 2 years ago

Thanks for the bump @mre, here you go:

Report: https://github.com/balena-io/docs/issues/2364
Config: https://github.com/balena-io/docs/blob/master/lychee.toml

Following the suggestion to allow 429 has greatly decreased errors for us as you can see above. Hope this helps!
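
For readers landing here, the change described above can be sketched as a `lychee.toml` fragment. This is a minimal sketch under stated assumptions, not a copy of the linked config: the `max_concurrency` value is purely illustrative, and the exact `accept` syntax (plain integers versus strings or ranges) may differ between lychee versions, so check the documentation for the release you run.

```toml
# lychee.toml (illustrative sketch; values are assumptions, not taken from the linked config)

# List 429 (Too Many Requests) among the accepted status codes,
# so rate-limited links are not reported as broken.
accept = [200, 429]

# Assumed value: fewer concurrent requests is gentler on rate-limited hosts.
max_concurrency = 8
```

Accepting 429 hides rate-limit responses rather than verifying the links, so it trades some strictness for a quieter report.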

mre commented 2 years ago

ℹ️ If anyone runs into issues with rate limiting in the future, there is now a troubleshooting guide over at lychee.cli.rs/#/rate-limits.

StevenMaude commented 1 year ago

ℹ️ If anyone runs into issues with rate limiting in the future, there is now a troubleshooting guide over at lychee.cli.rs/#/rate-limits.

~The rate limit troubleshooting guide moved to: https://lychee.cli.rs/#/troubleshooting/rate-limits~

Edit: it moved again; see below.

mre commented 1 year ago

Thanks for mentioning the updated link! ⭐

p2635 commented 9 months ago

ℹ️ If anyone runs into issues with rate limiting in the future, there is now a troubleshooting guide over at lychee.cli.rs/#/rate-limits.

The rate limit troubleshooting guide moved to: https://lychee.cli.rs/#/troubleshooting/rate-limits

Ironically, these links do not take you to the right page. Here is the right link for anyone reading in the future: https://lychee.cli.rs/troubleshooting/rate-limits/#_top (at the time of writing my comment).

mre commented 9 months ago

Haha, that is indeed ironic. We switched the docs backend lately and that changed the links. Thanks for the updated URL. 😆

vipulgupta2048 commented 6 months ago

Hey @mre, I wanted to ping you before reopening this issue. I have set 429 as an accepted code in my config: https://github.com/balena-io/docs/blob/11b7527d663e34709a2a6a9725102ef76a5fe732/lychee.toml#L50

I have also followed the rate-limit troubleshooting guide mentioned above; the only thing I have not applied is retries 0. Yet I am still seeing about 200 links failing with a 429 status code: https://github.com/balena-io/docs/issues (the first 4 issues are reports).

What am I missing here? Maybe I need to update my lychee action to pick up a release that handles the accepted codes array correctly?

mre commented 6 months ago

Reopening as there still seem to be issues. Thanks for the heads up.

vipulgupta2048 commented 6 months ago

It's nice to see I am not alone in this. Do you think setting retries 0 might help here? I have had one report where everything worked and it showed no false positives (I pinged you on that). I have been trying to replicate that ever since, and I have also checked that my GITHUB_TOKEN is being used properly. To clarify, the issue is that 429 codes still show up as errors even though they are on the accepted list.

mre commented 6 months ago

Oh, I just looked at your pipeline and noticed that you're still running on 0.14.0. There was a bug fix for the accept handling in https://github.com/lycheeverse/lychee/pull/1344. Here are the release notes: https://github.com/lycheeverse/lychee/releases/tag/v0.14.1

Can you update to the latest version? (FYI, we released lychee-action@v1.10.0 yesterday, which is based on lychee 0.15.0.)

vipulgupta2048 commented 6 months ago

Awesome find. I think I was looking in the wrong place (lychee-action) for the changelog of this feature. Let me update the action and report back on Monday on how the new report goes. Apologies for the noise.

vipulgupta2048 commented 6 months ago

That did it: https://github.com/balena-io/docs/issues/2968. Only valid errors now.

mre commented 6 months ago

Good times. Thanks for the feedback.