wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

BadStatusLine errors #563

Closed mdlinville closed 8 years ago

mdlinville commented 9 years ago

I am occasionally getting BadStatusLine errors, but when I copy and paste the same URL into my browser, the URL works fine. I think it is a "too many connections" sort of thing, and I have tweaked my timeout and thread-count settings but haven't gotten them quite right yet. I also know the URL itself is not the problem, because on each run I get different URLs with this error.

What I really would like is if these errors could put the URL back in the queue and try it again at the end of the run.
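The requeue idea above could look roughly like the following. This is only a sketch of the general technique, not LinkChecker's actual code: `check_with_requeue`, the `check` callable, and `max_attempts` are hypothetical names made up for illustration, and `check` is assumed to raise `ConnectionError` on a transient failure.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def check_with_requeue(
    urls: Iterable[str],
    check: Callable[[str], None],   # hypothetical checker; raises ConnectionError on failure
    max_attempts: int = 2,          # hypothetical limit on tries per URL
) -> Dict[str, str]:
    """Check every URL; push transient failures to the back of the queue."""
    queue = deque((url, 1) for url in urls)
    results: Dict[str, str] = {}
    while queue:
        url, attempt = queue.popleft()
        try:
            check(url)
            results[url] = "ok"
        except ConnectionError as exc:
            if attempt < max_attempts:
                # Retry only after all URLs currently queued have been tried.
                queue.append((url, attempt + 1))
            else:
                results[url] = f"error: {exc}"
    return results
```

Because a failed URL goes to the back of the queue, it is retried only after everything else has had its turn, which gives an overloaded server some time to recover.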

Check time 2.613 seconds
Result     Error: ConnectionError: ('Connection aborted.', BadStatusLine("''",))

Here are all my arguments (with a placeholder URL):

./linkchecker --check-extern -a -t 5 --pause 10 --ignore-url favicon.ico --ignore-url static/images http://example.com

zifeishan commented 9 years ago

+1 for this issue.

jurgenhaas commented 9 years ago

I'm having the same issue with ever-changing URLs; sometimes it's HTML files and sometimes I've seen images. But during one run it seems to be consistent - only if I call linkchecker again do I get different URLs that respond with a BadStatusLine.

When I do a curl -I -i on one of those domains I'm getting the headers and can't see anything being wrong. What else does a BadStatusLine refer to?
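On what BadStatusLine refers to: in Python's `http.client` (`httplib` in Python 2) it is raised when the first line of the response is not a valid HTTP status line. An empty one, as in `BadStatusLine("''",)`, means the connection was closed before any response bytes arrived. The sketch below reproduces both cases against a throwaway local server using only the standard library. `serve_once` and `fetch_error` are hypothetical helper names, and this says nothing certain about why the real servers in this thread behave that way.

```python
import http.client
import socket
import threading


def serve_once(reply: bytes) -> int:
    """Start a one-shot TCP server on localhost and return its port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def handle() -> None:
        conn, _ = listener.accept()
        with conn:
            conn.recv(65536)             # read the request before answering
            if reply:
                conn.sendall(reply)      # otherwise just close, sending nothing
        listener.close()

    threading.Thread(target=handle, daemon=True).start()
    return port


def fetch_error(reply: bytes):
    """Send one GET to the test server; return the BadStatusLine it raises, if any."""
    client = http.client.HTTPConnection("127.0.0.1", serve_once(reply), timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
    return None


if __name__ == "__main__":
    print(repr(fetch_error(b"")))                   # server closed without replying
    print(repr(fetch_error(b"\x16\x03 junk\r\n")))  # server sent non-HTTP bytes
```

In Python 3.5 and later the empty case is reported as `http.client.RemoteDisconnected`, which is a subclass of `BadStatusLine`. A keep-alive connection that the server has already dropped looks the same to the client, which may be why the same URL works fine in curl or a browser.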

strowi commented 9 years ago

just to confirm, got this error too with "LinkChecker 9.3 released 16.7.2014"

jesper commented 9 years ago

Same issue here.

NickWoodhams commented 8 years ago

Seeing the same error as well.

URL        `/static/css/styles.css'
Parent URL https://ecoventsystems.com/press-releases/ecovent-named-automation-device-of-the-year, line 42, col 5
Real URL   https://ecoventsystems.com/static/css/styles.css
Check time 3.573 seconds
Result     Error: ConnectionError: ('Connection aborted.', BadStatusLine('\xf6\xa2|\xd7\xc0\xdf\x15\xe9\x91"\xb7\xfe\xa6a\xbc\x10\x02\xa9\x14\xe7Z\xbay\xf9\xd5\x13\xbc\xfbq3\x83\xc3~\xf4\xe4,u\xf6\x03\xef\x03/\x02\x1f\x02\x81x\xc1\x89j\xc2w9\xd2\x1b\x851\xc2...

ghost commented 8 years ago

Same issue here.

jpriebe commented 8 years ago

Me, too. :-(

shyaminayesh commented 8 years ago

Hi @jpriebe, need to contact you ... please email me shyaminsaf {at} gmail ... thank you very much ... sorry for commenting here all :/

jesper commented 8 years ago

For anyone also encountering this issue: I decided to hack together a (free) service with a working link checker (among other things) here: https://monkeytest.it

You can trigger it via a script in Jenkins/etc, Slack, or the website itself.

(Apologies if this counts as spam - by no means intended.)

dothebart commented 8 years ago

@jesper I'd rather like to find a link to your clone on github with the fixes in the spirit of open source!

karlkfi commented 7 years ago

Why was this closed? Was the issue fixed?

mdlinville commented 7 years ago

I'm not sure who closed it, but I'm no longer in a position to test potential fixes for it.

karlkfi commented 7 years ago

The links I got the reported error for were for external:

It doesn't seem easily reproducible, so I'm not sure how to test fixes either.

jmewes commented 7 years ago

Maybe you can reproduce the issue on this website: http://docs.experimental-software.com/

After installing linkchecker 9.3 via apt-get on Ubuntu 16.04 (x86) I ran this command:

$ linkchecker http://docs.experimental-software.com

The result is that there is a connection error for all the sub-pages:

URL             `SoftwareEngineering/Architecture/ReactiveManifesto.html'
Name            `ReactiveManifesto'
Parent URL      http://docs.experimental-software.com, line 296, col 5
Real URL        http://docs.experimental-software.com/SoftwareEngineering/Architecture/ReactiveManifesto.html
Check time      2.978 seconds
Result          Error: ConnectionError: ('Connection aborted.', BadStatusLine("''",))

oschonrock commented 5 years ago

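This problem apparently goes very deep into the Python "requests" library (a third-party package; the BadStatusLine exception itself comes from the standard library's httplib), see here: https://github.com/requests/requests/issues/2364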

I found that if you control the server you are link-checking (often the case), turning off HTTP/1.1 keepalive massively (!!) reduces the problem.

This was my .htaccess entry:

BrowserMatch "LinkChecker" \
         nokeepalive