raviqqe / muffet

Fast website link checker in Go
MIT License

Systematic timeout for microsoft.com #306

Closed aslafy-z closed 1 year ago

aslafy-z commented 1 year ago

I'm unable to scan any microsoft.com pages; however, I can curl them.

$ muffet https://www.microsoft.com --verbose --one-page-only --max-response-body-size=1000000000000000 --timeout 60
failed to fetch root page: timeout

raviqqe commented 1 year ago

You need to masquerade the user-agent header in some way (e.g. --header 'user-agent: Curl').

> muffet --header 'user-agent: Curl' --one-page-only -v https://microsoft.com
https://www.microsoft.com/
        200     http://www.microsoft.com/en/us/default.aspx?redir=true
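
For anyone hitting the same timeout from their own Go code rather than through muffet, the workaround above amounts to overriding the User-Agent request header. Here is a minimal sketch using only the standard net/http package. It is not muffet's implementation; newRequestWithUserAgent is a hypothetical helper name made up for illustration, and the "Curl" value is simply the one from the command above.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRequestWithUserAgent is a hypothetical helper (not part of muffet).
// It builds a GET request whose User-Agent header is set to the given
// value instead of the client's default.
func newRequestWithUserAgent(url, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Header.Set canonicalizes the key and replaces any existing value.
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	req, err := newRequestWithUserAgent("https://www.microsoft.com", "Curl")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Show the header that would be sent; no network request is made here.
	fmt.Println(req.Header.Get("User-Agent"))
}
```

Passing the request to http.DefaultClient.Do(req) would send it with that header. Whether a given site then responds is up to the site.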