PatrickHeneise closed this issue 2 years ago.
It might be related to redirects, to which we currently don't apply rate limits. Can you send an example URL of the GitHub issue links?
Thanks for having a look at this. They're mostly direct links to GitHub issues, pull requests, and releases:
429 https://github.com/cortexproject/cortex/pull/3897
429 https://github.com/cortexproject/cortex/pull/3904
429 https://github.com/cortexproject/cortex/pull/3905
429 https://github.com/cortexproject/cortex/releases/tag/v1.8.0-rc.1
429 https://github.com/grafana/loki/releases/tag/v2.2.0
429 https://github.com/opstrace/opstrace/issues/412
429 https://github.com/opstrace/opstrace/issues/442
429 https://github.com/opstrace/opstrace/issues/465
429 https://github.com/opstrace/opstrace/issues/483
429 https://github.com/opstrace/opstrace/pull/397
429 https://github.com/opstrace/opstrace/pull/413
429 https://github.com/opstrace/opstrace/pull/441
429 https://github.com/opstrace/opstrace/pull/453
429 https://github.com/opstrace/opstrace/pull/472
429 https://github.com/opstrace/opstrace/pull/482
429 https://github.com/opstrace/opstrace/pull/487
I'm trying --max-connections-per-host=2, but then I get a lot of "no free connections available to host" errors. The most reliable config I've found so far is:
muffet https://... --rate-limit=5 --skip-tls-verification --buffer-size=8192 --exclude="gstatic.com|linkedin.com...
but even then I get 429s from all the GitHub links we have in our docs and articles.
Another option would be to ignore certain HTTP statuses: instead of failing on a 429, muffet could just print a warning and let the link pass?
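To make the "warn instead of fail" idea concrete, here is a minimal Go sketch of how a checker could classify status codes, assuming a configurable set of codes (for example 429) that are reported as warnings but don't fail the run. The names Severity, classifyStatus, and warnOnly are hypothetical; this is not an existing muffet option.

```go
package main

import "fmt"

// Severity is a hypothetical classification of a link-check result.
type Severity int

const (
	OK Severity = iota
	Warning
	Error
)

// classifyStatus maps the final HTTP status code of a link (assumed to be
// taken after redirects were followed) to a severity. Codes present in
// warnOnly, such as 429 Too Many Requests, are reported without failing.
func classifyStatus(code int, warnOnly map[int]bool) Severity {
	switch {
	case code >= 200 && code < 300:
		return OK
	case warnOnly[code]:
		return Warning
	default:
		return Error
	}
}

func main() {
	warnOnly := map[int]bool{429: true} // hypothetical "warn only" list
	failed := false
	for _, code := range []int{200, 429, 404} {
		switch classifyStatus(code, warnOnly) {
		case Warning:
			fmt.Println("warning:", code)
		case Error:
			fmt.Println("error:", code)
			failed = true
		}
	}
	fmt.Println("failed:", failed)
}
```

The trade-off is that a rate-limited link is never actually verified, so a real broken link behind a 429 would slip through.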
@raviqqe, did you get a chance to look into this?
The --max-connections-per-host option should work better on the main branch (not the latest release, v2.4.5). Can you test it?
I'm not too familiar with Go. How do I install from the main branch?
Like this:
git clone https://github.com/raviqqe/muffet
cd muffet
GO111MODULE=on go build
./muffet https://foo.com
Same: I'm either getting 429s or "no free connections available to host" errors (tried --max-connections-per-host with 2 and 5).
Can you give me an example URL to test? I can't reproduce the problem on my own websites that link to GitHub.
muffet https://opstrace.com --max-connections-per-host=10 --rate-limit=5 --exclude="gstatic.com|linkedin.com|googletagmanager.com"
We're using the GitHub Action to check for broken links on a regular basis. Some pages have lots of GitHub links, and that's when GitHub starts returning 429s.
I don't see any 429 errors from GitHub anymore with --rate-limit 1. Can you try it in your environment? --rate-limit 5 still seems to be too high for links to github.com.
> go run . --ignore-fragments --max-connections-per-host 10 --rate-limit 1 --exclude "gstatic.com|linkedin.com|googletagmanager.com" https://opstrace.com
https://opstrace.com/cdn-cgi/l/email-protection
403 https://support.cloudflare.com/hc/en-us/articles/200170016-What-is-Email-Address-Obfuscation-
403 https://www.cloudflare.com/sign-up?utm_source=email_protection
https://opstrace.com/terms-of-service
404 https://go.opstrace.com/dpa
429 https://stripe.com/legal
https://opstrace.com/blog/collecting-metrics-from-cockroachdb
dial tcp4 127.0.0.1:8080: connect: connection refused http://localhost:8080/#/metrics/overview/cluster
https://opstrace.com/docs/quickstart
lookup $opstrace_name.opstrace.io: no such host https://$OPSTRACE_NAME.opstrace.io/login
lookup prod.$opstrace_name.opstrace.io: no such host https://prod.$OPSTRACE_NAME.opstrace.io/grafana/explore?orgId=1&left=%5B%22now-30m%22,%22now%22,%22metrics%22,%7B%7D%5D
lookup staging.$opstrace_name.opstrace.io: no such host https://staging.$OPSTRACE_NAME.opstrace.io/grafana/explore?orgId=1&left=%5B%22now-30m%22,%22now%22,%22metrics%22,%7B%7D%5D
https://opstrace.com/blog/introducing-the-open-source-distribution
404 https://kinvolk.io/flatcar-container-linux
https://opstrace.com/docs/guides/contributor/writing-docs
400 https://www.grammarly.com
https://opstrace.com/media
404 https://dok.community/dokc-day-schedule/
https://opstrace.com/privacy-gdpr-supplement
404 https://go.opstrace.com/dpa
https://opstrace.com/blog/nextjs-on-cloudflare
403 https://developers.cloudflare.com/images/resizing-with-workers
403 https://developers.cloudflare.com/pages/platform/github-integration
403 https://developers.cloudflare.com/pages/platform/known-issues
429 https://stripe.com/
503 https://blog.cloudflare.com/cloudflare-pages-ga/
https://opstrace.com/docs/guides/user/configuring-alerts
404 https://opstrace.com/docs/guides/user/#configure-a-contact-point
404 https://opstrace.com/docs/guides/user/#configure-a-notification-policy
404 https://opstrace.com/docs/guides/user/#configure-an-alerting-rule
404 https://opstrace.com/docs/guides/user/#using-the-http-api-to-configure-alerts
https://opstrace.com/blog/week-12-update
404 https://github.com/opstrace/opstrace/tree/main/test/test-remote/containers/looker
exit status 1
Ah, actually I found a bug where muffet doesn't handle cross-origin redirects properly. I'm going to fix that and come back.
This should be fixed on the main branch now. Let me know if you still see similar errors.
Note that you might still occasionally see "no free connections available to host" errors, because Muffet doesn't have full control over those connections. If you run into these errors often, please open another issue. We should at least be able to find a workaround.
After the fix, I can consistently run Muffet against https://opstrace.com with the following options. The User-Agent header seems to be required for stripe.com pages, which return 429 errors when it's not set.
> go run . --ignore-fragments --max-connections-per-host 10 --rate-limit 1 --exclude "gstatic.com|linkedin.com|googletagmanager.com" --header 'User-Agent: muffet' --buffer-size 10000 https://opstrace.com
https://opstrace.com/docs/quickstart
lookup $opstrace_name.opstrace.io: no such host https://$OPSTRACE_NAME.opstrace.io/login
lookup prod.$opstrace_name.opstrace.io: no such host https://prod.$OPSTRACE_NAME.opstrace.io/grafana/explore?orgId=1&left=%5B%22now-30m%22,%22now%22,%22metrics%22,%7B%7D%5D
lookup staging.$opstrace_name.opstrace.io: no such host https://staging.$OPSTRACE_NAME.opstrace.io/grafana/explore?orgId=1&left=%5B%22now-30m%22,%22now%22,%22metrics%22,%7B%7D%5D
https://opstrace.com/docs/references/configuration
lookup aws.amazon.com: no such host https://aws.amazon.com/blogs/containers/introducing-the-new-amazon-eks-console
https://opstrace.com/blog/collecting-metrics-from-cockroachdb
dial tcp4 127.0.0.1:8080: connect: connection refused http://localhost:8080/#/metrics/overview/cluster
https://opstrace.com/blog/introducing-the-open-source-distribution
404 https://kinvolk.io/flatcar-container-linux
https://opstrace.com/media
404 https://dok.community/dokc-day-schedule/
https://opstrace.com/terms-of-service
404 (following redirect https://opstrace.com/Opstrace%20-%20Data%20Processing%20Addendum%203.30.2021.pdf) https://go.opstrace.com/dpa
https://opstrace.com/blog/nextjs-on-cloudflare
503 https://developers.cloudflare.com/images/resizing-with-workers
503 https://developers.cloudflare.com/pages/platform/github-integration
503 https://developers.cloudflare.com/pages/platform/known-issues
https://opstrace.com/cdn-cgi/l/email-protection
403 https://support.cloudflare.com/hc/en-us/articles/200170016-What-is-Email-Address-Obfuscation-
403 https://support.cloudflare.com/hc/en-us/categories/200275218-Getting-Started
403 (following redirect https://dash.cloudflare.com/sign-up?utm_source=email_protection) https://www.cloudflare.com/sign-up?utm_source=email_protection
https://opstrace.com/docs/guides/user/configuring-alerts
404 https://opstrace.com/docs/guides/user/#configure-a-contact-point
404 https://opstrace.com/docs/guides/user/#configure-a-notification-policy
404 https://opstrace.com/docs/guides/user/#configure-an-alerting-rule
404 https://opstrace.com/docs/guides/user/#using-the-http-api-to-configure-alerts
https://opstrace.com/privacy-gdpr-supplement
404 (following redirect https://opstrace.com/Opstrace%20-%20Data%20Processing%20Addendum%203.30.2021.pdf) https://go.opstrace.com/dpa
https://opstrace.com/blog/week-12-update
404 https://github.com/opstrace/opstrace/tree/main/test/test-remote/containers/looker
exit status 1
Awesome, thanks a lot!
We have a bunch of GitHub issue links on a site, and even with --max-connections=10 --buffer-size=8192 --color=always --rate-limit=2 we're running into a lot of 429 errors. Any suggestions on how to avoid this?