Closed mdlinville closed 8 years ago
+1 for this issue.
I'm having the same issue with ever-changing URLs: sometimes it's HTML files, sometimes images. Within a single run it seems consistent; only when I call linkchecker again do I get a different set of URLs that respond with a BadStatusLine.
When I do a curl -I -i on one of those domains, I get the headers back and can't see anything wrong. What else can a BadStatusLine refer to?
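For anyone wondering where the exception comes from: in CPython it is raised by `http.client` when the first line of the response is not a valid `HTTP/x.y <code> <reason>` status line (for example, leftover bytes on a broken keep-alive socket). Here is a minimal, self-contained sketch; the local garbage server is purely illustrative.

```python
# Demonstrates when http.client raises BadStatusLine: the server's first
# response line is not a valid HTTP status line. The local socket server
# below deliberately sends garbage bytes instead of "HTTP/1.1 200 OK".
import http.client
import socket
import threading

def garbage_server(server_sock):
    conn, _ = server_sock.accept()
    conn.recv(1024)  # consume the request
    conn.sendall(b"\xf6\xa2|\xd7 this is not an HTTP status line\r\n\r\n")
    conn.close()

server_sock = socket.socket()
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
threading.Thread(target=garbage_server, args=(server_sock,), daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server_sock.getsockname()[1])
conn.request("GET", "/")
try:
    conn.getresponse()
    caught = None
except http.client.BadStatusLine as exc:
    caught = exc

print(type(caught).__name__)  # BadStatusLine
```

So the error means the client-side HTTP parser choked on the response, not necessarily that the URL itself is broken, which fits the "works fine in curl/browser" observations in this thread.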
Just to confirm, I got this error too with "LinkChecker 9.3 released 16.7.2014".
Same issue here.
Seeing the same error as well.
URL `/static/css/styles.css'
Parent URL https://ecoventsystems.com/press-releases/ecovent-named-automation-device-of-the-year, line 42, col 5
Real URL https://ecoventsystems.com/static/css/styles.css
Check time 3.573 seconds
Result Error: ConnectionError: ('Connection aborted.', BadStatusLine('\xf6\xa2|\xd7\xc0\xdf\x15\xe9\x91"\xb7\xfe\xa6a\xbc\x10\x02\xa9\x14\xe7Z\xbay\xf9\xd5\x13\xbc\xfbq3\x83\xc3~\xf4\xe4,u\xf6\x03\xef\x03/\x02\x1f\x02\x81x\xc1\x89j\xc2w9\xd2\x1b\x851\xc2...
Same issue here.
Me, too. :-(
Hi @jpriebe, need to contact you ... please email me shyaminsaf {at} gmail ... thank you very much ... sorry for commenting here all :/
For anyone also encountering this issue: I decided to hack together a (free) service with a working link checker (among other things) here: https://monkeytest.it
You can trigger it via a script in Jenkins/etc, Slack, or the website itself.
(Apologies if this counts as spam - by no means intended.)
@jesper I'd rather find a link to your clone on GitHub with the fixes, in the spirit of open source!
Why was this closed? Was the issue fixed?
I'm not sure who closed it, but I'm no longer in a position to test potential fixes for it.
The links I got the reported error for were external:
It doesn't seem easily reproducible, so I'm not sure how to test fixes either.
Maybe you can reproduce the issue on this website: http://docs.experimental-software.com/
After installing linkchecker 9.3 via apt-get on Ubuntu 16.04 (x86), I ran this command:
$ linkchecker http://docs.experimental-software.com
The result is that there is a connection error for all the sub-pages:
URL `SoftwareEngineering/Architecture/ReactiveManifesto.html'
Name `ReactiveManifesto'
Parent URL http://docs.experimental-software.com, line 296, col 5
Real URL http://docs.experimental-software.com/SoftwareEngineering/Architecture/ReactiveManifesto.html
Check time 2.978 seconds
Result Error: ConnectionError: ('Connection aborted.', BadStatusLine("''",))
This problem apparently goes deep into the Python `requests` library (not the standard library), see here: https://github.com/requests/requests/issues/2364
I found that if you control the server you are linkchecking (often the case), turning off HTTP/1.1 keep-alive massively (!!) reduces the problem.
This was my .htaccess entry:
BrowserMatch "LinkChecker" \
nokeepalive
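If you can't touch the server, a client-side analogue (a sketch, assuming a plain `requests` session like the one the traceback points at) is to send `Connection: close` so every check uses a fresh TCP connection instead of reusing a keep-alive socket the server may have silently dropped:

```python
# Sketch: disable HTTP/1.1 keep-alive from the client side by sending
# "Connection: close" on every request made through this session.
import requests

session = requests.Session()
session.headers["Connection"] = "close"

# Each request now tells the server to close the socket after responding,
# e.g. (placeholder URL):
# resp = session.get("https://example.com/", timeout=10)
```

This trades connection-reuse performance for robustness, which is usually an acceptable deal for a batch link check.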
I am occasionally getting BadStatusLine errors, but when I copy and paste the same URL into my browser, it works fine. I think it is a "too many connections" sort of thing; I have tweaked my timeout and number-of-threads settings but haven't gotten it quite right yet. I also know it's not actually the URL, because on each run I get different URLs with this error.
What I really would like is if these errors could put the URL back in the queue and try it again at the end of the run.
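LinkChecker doesn't expose that today, but the requeue idea can be sketched as a small wrapper (all names here are hypothetical, not linkchecker's API): URLs that fail with a connection error go to the back of the queue and get retried after the first pass, so transient failures don't end up in the report.

```python
# Hypothetical sketch of "retry at the end of the run": URLs that fail with
# a connection error are pushed to the back of the queue and retried, up to
# max_attempts, before being reported as real failures.
from collections import deque

def check_with_requeue(urls, check, max_attempts=3):
    queue = deque((url, 1) for url in urls)
    failures = {}
    while queue:
        url, attempt = queue.popleft()
        try:
            check(url)
        except ConnectionError as exc:
            if attempt < max_attempts:
                queue.append((url, attempt + 1))  # retry after the rest
            else:
                failures[url] = str(exc)
    return failures

# Demo: a flaky checker that fails once for one URL, then succeeds.
calls = {}
def flaky_check(url):
    calls[url] = calls.get(url, 0) + 1
    if url == "http://b.example" and calls[url] < 2:
        raise ConnectionError("Connection aborted.")

failures = check_with_requeue(["http://a.example", "http://b.example"], flaky_check)
print(failures)  # {} -- the flaky URL succeeded on its second attempt
```

With transient errors, the second pass usually succeeds, which matches the observation above that the failing URLs change from run to run.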
Here are all my arguments (with a placeholder URL):