wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

AttributeError: 'HttpUrl' object has no attribute 'proxy_type' #555

Open gr1ev0us opened 9 years ago

gr1ev0us commented 9 years ago

The problem appears when I check a local web site:

    linkchecker --ignore-url=^mailto: www.sintez.dev

I've tried changing 'url_data.proxy_type' to 'url_data.proxytype', but that raises a new error in /usr/local/lib/python2.7/dist-packages/requests/sessions.py: AttributeError: 'set' object has no attribute 'setdefault'
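To see why that follow-up error appears: the failing line in robots_txt.py builds a Python set literal (`{a, b}`), while requests expects a dict it can call `setdefault` on. A minimal illustration, with the proxy value taken from the report:

```python
# The buggy line builds a set literal, not a dict -- in Python,
# {a, b} is a set, while {a: b} is a dict.
proxies = {"http", "http://192.168.1.254:3128"}
print(type(proxies))  # <class 'set'>

# requests merges proxy settings using dict methods such as setdefault,
# which a set does not have -- hence the second AttributeError:
try:
    proxies.setdefault("http", None)
except AttributeError as err:
    print(err)  # 'set' object has no attribute 'setdefault'
```

So renaming the attribute only moves the crash one step later; the set literal itself also has to become a mapping.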

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <Checker(CheckThread-http://www.sintezr.dev, started 139842591651584)>
      self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-http://www.sintezr.dev, started 139842591651584)>>
      url_data = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <function check_url at 0x7f2fabdbbc08>
      url_data = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      self = <Checker(CheckThread-http://www.sintezr.dev, started 139842591651584)>
      self.logger = <linkcheck.director.logger.Logger object at 0x7f2fa8f18410>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 52, in check_url
    line: url_data.check()
    locals:
      url_data = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      url_data.check = <bound method HttpUrl.check of <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 424, in check
    line: self.local_check()
    locals:
      self = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      self.local_check = <bound method HttpUrl.local_check of <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    line: self.check_connection()
    locals:
      self = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      self.check_connection = <bound method HttpUrl.check_connection of <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 128, in check_connection
    line: if not self.allows_robots(self.url):
    locals:
      self = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      self.allows_robots = <bound method HttpUrl.allows_robots of <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>>
      self.url = u'http://www.sintezr.dev', len = 22
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 66, in allows_robots
    line: return self.aggregate.robots_txt.allows_url(self)
    locals:
      self = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      self.aggregate = <linkcheck.director.aggregator.Aggregate object at 0x7f2fa8f18390>
      self.aggregate.robots_txt = <linkcheck.cache.robots_txt.RobotsTxt object at 0x7f2fa8f18310>
      self.aggregate.robots_txt.allows_url = <bound method RobotsTxt.allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x7f2fa8f18310>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/cache/robots_txt.py", line 49, in allows_url
    line: return self._allows_url(url_data, roboturl)
    locals:
      self = <linkcheck.cache.robots_txt.RobotsTxt object at 0x7f2fa8f18310>
      self._allows_url = <bound method RobotsTxt._allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x7f2fa8f18310>>
      url_data = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      roboturl = u'http://www.sintezr.dev/robots.txt', len = 33
  File "/usr/lib/python2.7/dist-packages/linkcheck/cache/robots_txt.py", line 62, in _allows_url
    line: kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}
    locals:
      kwargs = {'session': <requests.sessions.Session object at 0x7f2fa8f18690>, 'auth': None}
      url_data = <http link, base_url=u'http://www.sintezr.dev', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://www.sintezr.dev>
      url_data.proxy_type = !AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
      url_data.proxy = '192.168.1.254:3128', len = 18
AttributeError: 'HttpUrl' object has no attribute 'proxy_type'

System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Requests: 2.4.1
Qt: 4.8.6 / PyQt: 4.10.4
Modules: Sqlite, Gconf
Local time: 2014-10-09 14:16:08+004
sys.argv: ['/usr/bin/linkchecker', 'www.sintezr.dev']
http_proxy = 'http://192.168.1.254:3128/'
noproxy = 'localhost,127.0.0.0/8,::1,.dev,192.168._,10.10.20.67'
LANGUAGE = 'ru:en'
LANG = 'ru_RU.UTF-8'
Default locale: ('ru', 'UTF-8')

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-10-09 14:16:08+004 (0.06 seconds)

nbigaouette commented 9 years ago

Same issue here:

$ linkchecker --ignore-url=^mailto:  http://github.com

INFO 2014-11-03 11:01:58,561 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2014-11-03 11:01:58-004

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below

When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <local> <Checker(CheckThread-http://github.com, started 140448599136000)>
      self.check_url_data = <local> <bound method Checker.check_url_data of <Checker(CheckThread-http://github.com, started 140448599136000)>>
      url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
  File "/usr/lib/python2.7/site-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <global> <function check_url at 0x7fbcbdd3fed8>
      url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      self = <local> <Checker(CheckThread-http://github.com, started 140448599136000)>
      self.logger = <local> <linkcheck.director.logger.Logger object at 0x7fbcbd5e1310>
  File "/usr/lib/python2.7/site-packages/linkcheck/director/checker.py", line 52, in check_url
    line: url_data.check()
    locals:
      url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      url_data.check = <local> <bound method HttpUrl.check of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
  File "/usr/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 424, in check
    line: self.local_check()
    locals:
      self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      self.local_check = <local> <bound method HttpUrl.local_check of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
  File "/usr/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    line: self.check_connection()
    locals:
      self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      self.check_connection = <local> <bound method HttpUrl.check_connection of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
  File "/usr/lib/python2.7/site-packages/linkcheck/checker/httpurl.py", line 128, in check_connection
    line: if not self.allows_robots(self.url):
    locals:
      self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      self.allows_robots = <local> <bound method HttpUrl.allows_robots of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
      self.url = <local> u'http://github.com', len = 17
  File "/usr/lib/python2.7/site-packages/linkcheck/checker/httpurl.py", line 66, in allows_robots
    line: return self.aggregate.robots_txt.allows_url(self)
    locals:
      self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      self.aggregate = <local> <linkcheck.director.aggregator.Aggregate object at 0x7fbcbd5e1350>
      self.aggregate.robots_txt = <local> <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>
      self.aggregate.robots_txt.allows_url = <local> <bound method RobotsTxt.allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>>
  File "/usr/lib/python2.7/site-packages/linkcheck/cache/robots_txt.py", line 49, in allows_url
    line: return self._allows_url(url_data, roboturl)
    locals:
      self = <local> <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>
      self._allows_url = <local> <bound method RobotsTxt._allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>>
      url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      roboturl = <local> u'http://github.com/robots.txt', len = 28
  File "/usr/lib/python2.7/site-packages/linkcheck/cache/robots_txt.py", line 71, in _allows_url
    line: kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}
    locals:
      kwargs = <local> {'auth': None, 'session': <requests.sessions.Session object at 0x7fbcbd5e1510>}
      url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
      url_data.proxy_type = <local> !AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
      url_data.proxy = <local> '192.1.1.250:8080', len = 16
AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.8 (default, Sep 24 2014, 18:26:21) 
[GCC 4.9.1 20140903 (prerelease)] on linux2
Requests: 2.4.3
Qt: 4.8.6 / PyQt: 4.11.2
Modules: Sqlite
Local time: 2014-11-03 11:01:58-004
sys.argv: ['/usr/bin/linkchecker', '--ignore-url=^mailto:', 'http://github.com']
http_proxy = 'http://192.1.1.250:8080'
LANGUAGE = ''
LANG = 'en_CA.UTF-8'
Default locale: ('en', 'UTF-8')

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-11-03 11:01:58-004 (0.05 seconds)

 ******** LinkChecker internal error, over and out ********
WARNING 2014-11-03 11:01:58,617 CheckThread-http://github.com internal error occurred

Running under ArchLinux x86_64.

colwilson commented 9 years ago

Yes, same issue here, I believe:

(linkchecker)[col@wave-service-checker linkchecker]$ linkchecker --check-extern http://httpbin.org/get
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2014-11-12 12:33:24+000

****** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below

When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <Checker(CheckThread-http://httpbin.org/get, started 139639117575936)>
      self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-http://httpbin.org/get, started 139639117575936)>>
      url_data = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <function check_url at 0x15abaa0>
      url_data = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      self = <Checker(CheckThread-http://httpbin.org/get, started 139639117575936)>
      self.logger = <linkcheck.director.logger.Logger object at 0x16f2f90>
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/director/checker.py", line 52, in check_url
    line: url_data.check()
    locals:
      url_data = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      url_data.check = <bound method HttpUrl.check of <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>>
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/checker/urlbase.py", line 424, in check
    line: self.local_check()
    locals:
      self = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      self.local_check = <bound method HttpUrl.local_check of <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>>
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/checker/urlbase.py", line 442, in local_check
    line: self.check_connection()
    locals:
      self = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      self.check_connection = <bound method HttpUrl.check_connection of <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>>
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/checker/httpurl.py", line 128, in check_connection
    line: if not self.allows_robots(self.url):
    locals:
      self = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      self.allows_robots = <bound method HttpUrl.allows_robots of <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>>
      self.url = u'http://httpbin.org/get', len = 22
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/checker/httpurl.py", line 66, in allows_robots
    line: return self.aggregate.robots_txt.allows_url(self)
    locals:
      self = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      self.aggregate = <linkcheck.director.aggregator.Aggregate object at 0x16f2f10>
      self.aggregate.robots_txt = <linkcheck.cache.robots_txt.RobotsTxt object at 0x16f2e50>
      self.aggregate.robots_txt.allows_url = <bound method RobotsTxt.allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x16f2e50>>
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/cache/robots_txt.py", line 49, in allows_url
    line: return self._allows_url(url_data, roboturl)
    locals:
      self = <linkcheck.cache.robots_txt.RobotsTxt object at 0x16f2e50>
      self._allows_url = <bound method RobotsTxt._allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x16f2e50>>
      url_data = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      roboturl = u'http://httpbin.org/robots.txt', len = 29
  File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/cache/robots_txt.py", line 62, in _allows_url
    line: kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}
    locals:
      kwargs = {'session': <requests.sessions.Session object at 0x1701b50>, 'auth': None}
      url_data = <http link, base_url=u'http://httpbin.org/get', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://httpbin.org/get>
      url_data.proxy_type = !AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
      url_data.proxy = 'xxx.xxx.xxx.xxx:8080', len = 20
AttributeError: 'HttpUrl' object has no attribute 'proxy_type'

System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.6 (default, Oct 31 2014, 13:47:43)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Requests: 2.4.3
Modules: Sqlite
Local time: 2014-11-12 12:33:24+000
sys.argv: ['/var/www/linkchecker/bin/linkchecker', '--check-extern', 'http://httpbin.org/get']
http_proxy = 'http://xxx.xxx.xxx.xxx:8080'
ftp_proxy = 'http://xxx.xxx.xxx.xxx:8080'
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-11-12 12:33:24+000 (0.04 seconds)

 ******** LinkChecker internal error, over and out ********
WARNING 2014-11-12 12:33:24,458 CheckThread-http://httpbin.org/get internal error occurred

colwilson commented 9 years ago

I don't have time to merge this at the minute, sorry, but change line 61 of robots_txt.py to:

    if hasattr(url_data, "proxy") and hasattr(url_data, "proxy_type"):
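For anyone patching locally in the meantime, here is a minimal, hypothetical sketch of what the corrected logic could look like. The `build_request_kwargs` helper and `FakeUrlData` stand-in are illustrative names, not LinkChecker code; the point is that requests expects `proxies` as a `{scheme: proxy_url}` mapping, not the set literal that 9.3 ships with, and that the attribute access needs a guard:

```python
# Hypothetical sketch of the fix (names illustrative, not LinkChecker's API).
def build_request_kwargs(url_data):
    kwargs = {}
    proxy = getattr(url_data, "proxy", None)
    if proxy:
        # Guard the attribute access instead of assuming proxy_type exists,
        # and build a proper {scheme: proxy_url} dict for requests.
        scheme = getattr(url_data, "proxytype", "http")
        kwargs["proxies"] = {scheme: proxy}
    return kwargs


class FakeUrlData(object):
    """Stand-in for HttpUrl: has proxy but no proxy_type, as in the reports."""
    proxy = "192.168.1.254:3128"


print(build_request_kwargs(FakeUrlData()))
# {'proxies': {'http': '192.168.1.254:3128'}}
```

With the guard in place, an object that has no proxy configured simply produces no `proxies` entry instead of crashing.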
mgedmin commented 6 years ago

See also commits https://github.com/wummel/linkchecker/commit/52337f82cbc89c93929c16a8dd3eb0df60150300 and https://github.com/wummel/linkchecker/commit/4e56eceb358ae9e9c25833adbc44b761d321b586.