raviqqe / muffet

Fast website link checker in Go
MIT License

Unexpected 503 error #184

Open ForestEckhardt opened 3 years ago

ForestEckhardt commented 3 years ago

I am running muffet version 2.4.4.

Running muffet https://www.php.net fails with the following error: failed to fetch root page: 503

This is unexpected, because running curl -I https://www.php.net against the same URL succeeds with the following output:

HTTP/2 200
server: myracloud
date: Wed, 20 Oct 2021 19:10:49 GMT
content-type: text/html; charset=utf-8
last-modified: Wed, 20 Oct 2021 19:00:14 GMT
content-language: en
permissions-policy: interest-cohort=()
x-frame-options: SAMEORIGIN
set-cookie: COUNTRY=NA%2C35.188.94.147; expires=Wed, 27-Oct-2021 19:10:49 GMT; Max-Age=604800; path=/; domain=.php.net
set-cookie: LAST_NEWS=1634757049; expires=Thu, 20-Oct-2022 19:10:49 GMT; Max-Age=31536000; path=/; domain=.php.net
link: <https://www.php.net/index>; rel=shorturl
expires: Wed, 20 Oct 2021 19:10:49 GMT
cache-control: max-age=0

I also tried fetching this URL with Go's default HTTP client, and it gets a 200 response as well. Any tips on why this happens with muffet, and how to work around it, would be much appreciated!

raviqqe commented 2 years ago

Can you try some of the request concurrency options listed in muffet --help? The web backend seems to be overloaded in this case.

ForestEckhardt commented 2 years ago

I tried every flag that looked like it would reduce concurrency and ended up with the following: muffet https://www.php.net --max-connections=1 --max-connections-per-host=1 --rate-limit=1 I still got the same error: failed to fetch root page: 503

Are there any other flags you would recommend I try? Did I miss one?

raviqqe commented 2 years ago

Hmmm. Actually, it seems to be a bug in Muffet. Even when I set exactly the same request headers as curl's, it fails to get a successful response.

> go run . --header 'user-agent: curl/7.79.1' https://www.php.net
GET / HTTP/1.1
User-Agent: curl/7.79.1
Host: www.php.net
Accept: */*

failed to fetch root page: 503
exit status 1

This is part of the response:

<i class="icon-icon myra-server"></i> <i class="status-icon myra-ok"></i></div><p class="error-info"><span class="what">Host</span> <span class="status status-working">Working</span></p></div></div></div><div class="row clearfix"><div class="error-desc"><div class="col one-half"><div class="error-desc-text what-happened"><h2>What happened?</h2><p>You ran into a security check to verify the validity of your request.</p></div></div><div class="col one-half"><div class="error-desc-text what-do"><h2>What can I do?</h2><h3>If you are a visitor of this website:</h3><p>You must confirm that you are human.</p><h3>If you are the owner of this website:</h3><p>Please check your security settings.</p></div></div></div></div></body></html>

So the request is apparently being blocked by a security check in front of https://www.php.net. It's interesting that the check can tell the two clients apart even when the request headers are the same as the ones curl sends.
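To tell this kind of 503 apart from a genuinely overloaded backend, one option is to look at the response body. The sketch below is only an illustration: the function name looksLikeSecurityChallenge is hypothetical, it is not part of Muffet, and the marker phrases are copied from the response excerpt above, so they may not match other sites or future versions of that page.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// challengeMarkers are phrases taken from the 503 response body quoted in this
// thread. They are an assumption about what the challenge page contains.
var challengeMarkers = []string{
	"security check to verify the validity of your request",
	"You must confirm that you are human",
}

// looksLikeSecurityChallenge is a hypothetical helper. It reports whether a
// 503 response body looks like the bot-protection page rather than a plain
// "service unavailable" error.
func looksLikeSecurityChallenge(status int, body string) bool {
	if status != http.StatusServiceUnavailable {
		return false
	}
	for _, marker := range challengeMarkers {
		if strings.Contains(body, marker) {
			return true
		}
	}
	return false
}

func main() {
	challenge := `<h2>What happened?</h2><p>You ran into a security check to verify the validity of your request.</p>`
	overloaded := `<h1>503 Service Unavailable</h1>`

	fmt.Println(looksLikeSecurityChallenge(503, challenge))  // true
	fmt.Println(looksLikeSecurityChallenge(503, overloaded)) // false
}
```

A check like this would only help with diagnosis. It does not get the request past the security check.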