wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

sslverify=0 does not work with redirects to https #489

Closed polof closed 10 years ago

polof commented 10 years ago

Hi,

I'm trying to disable SSL/TLS certificate validation using sslverify=0 in the config file. It works for links that are https to begin with, but not when an http link redirects to an https address. Here is my test case:

<a href="http://dis.dsv.su.se/Login/"></a>

My linkcheckerrc:

[checking]
sslverify=0

Also, I think I have the same problem when setting sslverify to a CA bundle.
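For readers unfamiliar with the setting, here is a minimal sketch of how an sslverify value could be interpreted as either a boolean or a CA bundle path. It is not LinkChecker's actual parsing code: the helper name read_sslverify, the accepted boolean spellings, and the default of True when the option is missing are all assumptions made for illustration.

```python
import configparser


def read_sslverify(rc_text):
    """Return False, True, or a CA bundle path from a [checking] section.

    Hypothetical helper; LinkChecker's own config handling may differ.
    """
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    # Assumption: verification stays on when the option is absent.
    raw = parser.get("checking", "sslverify", fallback="1").strip()
    if raw.lower() in ("0", "false", "no", "off"):
        return False
    if raw.lower() in ("1", "true", "yes", "on"):
        return True
    # Anything else is treated as a path to a CA bundle file.
    return raw


RC_TEXT = """
[checking]
sslverify=0
"""

print(read_sslverify(RC_TEXT))
```

With the linkcheckerrc above this yields False, which matches the ('sslverify', False) entry in the debug output below.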

Here's the command output:

olpe4718@mimas:~/linkchecker$ linkchecker -f linkcheckerrc --check-extern -Dall ssl_testcase2.html 
DEBUG 2014-03-21 17:36:24,819 MainThread Python 2.7.3 (default, Mar 14 2014, 11:57:14) 
[GCC 4.7.2] on linux2
DEBUG 2014-03-21 17:36:24,832 MainThread reading configuration from ['linkcheckerrc']
DEBUG 2014-03-21 17:36:24,874 MainThread configuration: [('aborttimeout', 300),
 ('allowedschemes', []),
 ('authentication', []),
 ('blacklist', {}),
 ('checkextern', True),
 ('cookiefile', None),
 ('csv', {}),
 ('debugmemory', False),
 ('dot', {}),
 ('enabledplugins', []),
 ('externlinks', []),
 ('fileoutput', []),
 ('gml', {}),
 ('gxml', {}),
 ('html', {}),
 ('ignorewarnings', []),
 ('internlinks', []),
 ('localwebroot', None),
 ('logger', 'TextLogger'),
 ('loginextrafields', {}),
 ('loginpasswordfield', 'password'),
 ('loginurl', None),
 ('loginuserfield', 'login'),
 ('maxfilesizedownload', 5242880),
 ('maxfilesizeparse', 1048576),
 ('maxhttpredirects', 10),
 ('maxnumurls', None),
 ('maxrequestspersecond', 10),
 ('maxrunseconds', None),
 ('nntpserver', None),
 ('none', {}),
 ('output', 'text'),
 ('pluginfolders', []),
 ('proxy', {}),
 ('quiet', False),
 ('recursionlevel', -1),
 ('sitemap', {}),
 ('sql', {}),
 ('sslverify', False),
 ('status', True),
 ('status_wait_seconds', 5),
 ('text', {}),
 ('threads', 10),
 ('timeout', 60),
 ('trace', False),
 ('useragent',
  u'Mozilla/5.0 (compatible; LinkChecker/9.1; +http://wummel.github.io/linkchecker/)'),
 ('verbose', False),
 ('warnings', True),
 ('xml', {})]
DEBUG 2014-03-21 17:36:24,882 MainThread FileUrl handles url ssl_testcase2.html
DEBUG 2014-03-21 17:36:24,882 MainThread checking syntax
DEBUG 2014-03-21 17:36:24,888 MainThread Add intern pattern u'file\\:\\/\\/\\/a\\/oberon\\-home0\\/dsv\\/pelle\\/linkchecker\\/'
DEBUG 2014-03-21 17:36:24,888 MainThread Link pattern u'file\\:\\/\\/\\/a\\/oberon\\-home0\\/dsv\\/pelle\\/linkchecker\\/' strict=False
DEBUG 2014-03-21 17:36:24,889 MainThread queueing file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html
LinkChecker 9.1              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2014-03-21 17:36:24+002
DEBUG 2014-03-21 17:36:25,017 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html Checking file link
base_url=u'file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html'
parent_url=None
base_ref=None
recursion_level=0
url_connection=None
line=0
column=0
name=u'ssl_testcase2.html'
anchor=u''
cache_url=file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html
DEBUG 2014-03-21 17:36:25,017 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html checking connection
DEBUG 2014-03-21 17:36:25,035 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html checking recursion of u'file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html' ...
DEBUG 2014-03-21 17:36:25,035 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html ... yes, recursion.
DEBUG 2014-03-21 17:36:25,035 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html link finder
DEBUG 2014-03-21 17:36:25,036 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html Get content of u'file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html'
DEBUG 2014-03-21 17:36:25,036 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html LinkFinder tag a attrs {u'href': u'http://dis.dsv.su.se/Login/'}
DEBUG 2014-03-21 17:36:25,037 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html line 1 col 39 old line 1 old col 1
DEBUG 2014-03-21 17:36:25,037 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html LinkParser found link u'a' u'href' u'http://dis.dsv.su.se/Login/' u'' u''
DEBUG 2014-03-21 17:36:25,039 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html HttpUrl handles url http://dis.dsv.su.se/Login/
DEBUG 2014-03-21 17:36:25,039 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html checking syntax
DEBUG 2014-03-21 17:36:25,044 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html queueing http://dis.dsv.su.se/Login/
DEBUG 2014-03-21 17:36:25,044 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html LinkFinder finished tag a
DEBUG 2014-03-21 17:36:25,045 CheckThread-file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html task_done file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html
DEBUG 2014-03-21 17:36:25,045 CheckThread-http://dis.dsv.su.se/Login/ Checking http link
base_url=u'http://dis.dsv.su.se/Login/'
parent_url=u'file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html'
base_ref=None
recursion_level=1
url_connection=None
line=1
column=1
name=u''
anchor=u''
cache_url=http://dis.dsv.su.se/Login/
DEBUG 2014-03-21 17:36:25,046 CheckThread-http://dis.dsv.su.se/Login/ checking connection
DEBUG 2014-03-21 17:36:25,402 CheckThread-http://dis.dsv.su.se/Login/ u'http://dis.dsv.su.se/robots.txt' allow all (HTTP error)
DEBUG 2014-03-21 17:36:25,402 CheckThread-http://dis.dsv.su.se/Login/ u'http://dis.dsv.su.se/robots.txt' check allowance for:
  user agent: 'Mozilla/5.0 (compatible; LinkChecker/9.1; +http://wummel.github.io/linkchecker/)'
  url: u'http://dis.dsv.su.se/Login/' ...
DEBUG 2014-03-21 17:36:25,402 CheckThread-http://dis.dsv.su.se/Login/  ... allow all.
DEBUG 2014-03-21 17:36:25,402 CheckThread-http://dis.dsv.su.se/Login/ Prepare request with {'url': u'http://dis.dsv.su.se/Login/', 'headers': {'DNT': '1', 'User-Agent': u'Mozilla/5.0 (compatible; LinkChecker/9.1; +http://wummel.github.io/linkchecker/)'}, 'method': 'GET'}
DEBUG 2014-03-21 17:36:25,403 CheckThread-http://dis.dsv.su.se/Login/ Send request with {'verify': False, 'allow_redirects': False, 'stream': True, 'timeout': 60}
DEBUG 2014-03-21 17:36:25,409 CheckThread-http://dis.dsv.su.se/Login/ follow all redirections
DEBUG 2014-03-21 17:36:25,490 CheckThread-http://dis.dsv.su.se/Login/ Error in http://dis.dsv.su.se/Login/: <class 'requests.exceptions.SSLError'> [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
DEBUG 2014-03-21 17:36:25,501 CheckThread-http://dis.dsv.su.se/Login/ Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/LinkChecker-9.1-py2.7-linux-i686.egg/linkcheck/checker/urlbase.py", line 439, in local_check
    self.check_connection()
  File "/usr/local/lib/python2.7/dist-packages/LinkChecker-9.1-py2.7-linux-i686.egg/linkcheck/checker/httpurl.py", line 133, in check_connection
    self.follow_redirections(request)
  File "/usr/local/lib/python2.7/dist-packages/LinkChecker-9.1-py2.7-linux-i686.egg/linkcheck/checker/httpurl.py", line 218, in follow_redirections
    for response in self.session.resolve_redirects(self.url_connection, request, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 168, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 385, in send
    raise SSLError(e)
SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

URL        `http://dis.dsv.su.se/Login/'
Parent URL file:///a/oberon-home0/dsv/pelle/linkchecker/ssl_testcase2.html, line 1, col 1
Real URL   http://dis.dsv.su.se/Login/
Check time 0.463 seconds
Result     Error: SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
DEBUG 2014-03-21 17:36:25,509 CheckThread-http://dis.dsv.su.se/Login/ task_done http://dis.dsv.su.se/Login/

Statistics:
Downloaded: 43B.
Content types: 0 image, 1 text, 0 video, 0 audio, 0 application, 0 mail and 1 other.
URL lengths: min=27, max=63, avg=45.

That's it. 2 links in 2 URLs checked. 0 warnings found. 1 error found.
Stopped checking at 2014-03-21 17:36:25+002 (0.66 seconds)
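
The log above shows the first request being sent with 'verify': False, while the failure happens later, inside the redirect handling. One plausible reading is that the verification setting was not forwarded to the follow-up requests. The sketch below illustrates that idea only; it is not LinkChecker's or requests' code. follow_redirects, fake_send, and the example.invalid URLs are hypothetical stand-ins, and no real network traffic is involved.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(send, url, verify, max_redirects=10):
    """Follow redirects hop by hop, forwarding `verify` to every request.

    `send` is a hypothetical callable: send(url, verify=...) -> (status, location).
    """
    visited = []
    # One initial request plus at most max_redirects follow-ups.
    for _ in range(max_redirects + 1):
        status, location = send(url, verify=verify)
        visited.append(url)
        if status not in REDIRECT_CODES or not location:
            return status, visited
        url = location
    raise RuntimeError("too many redirects")


def fake_send(url, verify=True):
    """Stand-in transport: the https host presents an untrusted certificate."""
    if url.startswith("https://") and verify is not False:
        raise ConnectionError("certificate verify failed")
    if url == "http://example.invalid/Login/":
        return 302, "https://example.invalid/Login/"
    return 200, None


status, visited = follow_redirects(
    fake_send, "http://example.invalid/Login/", verify=False
)
print(status, visited)
```

If a hop is sent without the caller's verify value, fake_send falls back to its default of True and the https hop fails, which is the same shape of failure as in the traceback above.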

wummel commented 10 years ago

We believe that the issue you reported is fixed in the source repository of linkchecker which can be found under: https://github.com/wummel/linkchecker

Changelog entry:

Thank you for reporting the issue. It is now marked as fixed. If you believe that the issue is not fixed appropriately just add a comment to this issue.

wummel commented 10 years ago

A new version 9.1 of linkchecker has been released on 30.3.2014. Therefore this bug will be closed. If you think this issue is not solved, please open a new issue.

amplexus commented 8 years ago

Looks like this is still happening. I'm using LinkChecker 9.3 on Debian 8.5.

My config file contains:

[checking]
sslverify=0

Here's the output:

craig@jumphost:~$ linkchecker http://staging/ --config=./linkchecker.staging.rc 
INFO 2016-08-07 11:04:53,736 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam

Start checking at 2016-08-07 11:04:58+011
10 threads active,   105 links queued,    5 links in   3 URLs checked, runtime 1 seconds

URL        `https://staging/index.php?route=account/account'
Name       ` My Account '
Parent URL http://staging/, line 78, col 30
Base       http://staging/
Real URL   https://staging/index.php?route=account/account
Check time 3.578 seconds
Result     Error: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)
10 threads active,   271 links queued,   55 links in  17 URLs checked, runtime 6 seconds
/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:732: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html (This warning will only appear once by default.)
InsecureRequestWarning)
URL        `https://staging/index.php?route=account/account'
Name       `My Account'
Parent URL https://staging/index.php?route=account/register, line 803, col 15
Base       https://staging/
Real URL   https://staging/index.php?route=account/account
Check time 3.578 seconds
Result     Error: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)

URL        `https://staging/index.php?route=account/account'
Name       `My Account'
Parent URL https://staging/index.php?route=account/register, line 592, col 5
Base       https://staging/
Real URL   https://staging/index.php?route=account/account
Check time 3.578 seconds
Result     Error: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)

Cheers, Craig

amplexus commented 8 years ago

Hmmm. Seems the above problem occurs with LinkChecker 9.3 on my Debian 8 VM, but it behaves correctly with LinkChecker 9.3 on my Ubuntu 16.04 VM...

amplexus commented 8 years ago

Aaaand apt-get update + upgrade fixed the problem. The python-urllib3 and python-requests packages - which are dependencies according to apt-cache showpkg linkchecker - were upgraded and the problem went away...
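
Since the behaviour depended on the installed dependency versions, it can help to record them when comparing two machines. A minimal sketch, assuming Python 3.8 or later for importlib.metadata; the helper name installed_versions is hypothetical. It has to be run with the same interpreter that LinkChecker uses, and on the Python 2.7 setups in this thread pkg_resources.get_distribution(name).version would be the equivalent call.

```python
from importlib import metadata


def installed_versions(names=("requests", "urllib3")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```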