pelican-plugins / deadlinks

Pelican plugin to validate availability of referenced external links
MIT License

[Bug] UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128) #3

Closed Kristinita closed 7 years ago

Kristinita commented 7 years ago

1. Summary

I get a stack trace when I use the deadlinks plugin.

2. Settings

My project: https://github.com/Kristinita/KristinitaPelican. The file that fails is Nas-Izu.md (https://github.com/Kristinita/KristinitaPelican/blob/master/content/Giologica/Nas-Izu.md), which contains links with Cyrillic characters.

Part of my pelicanconf.py file:


PLUGIN_PATHS = ['pelican-plugins']
PLUGINS = [
    'pagefixer',
    'pelican_javascript',
    'section_number', 'interlinks', 'deadlinks'
]

DEADLINK_VALIDATION = True

DEADLINK_OPTS = {
    'archive': True,
    'classes': ['custom-class1', 'disabled'],
    'labels': True
}

3. Steps to reproduce

I run this command in the terminal:

pelican content --debug > DeadlinkDebug.txt 2>&1

See the full output on Gist: https://gist.github.com/Kristinita/63c81829c196afd7dc68cbe5e3dba12a

4. Expected behavior

No stack trace.

5. Actual behavior

ERROR: Could not process Giologica\Nas-Izu.md
  | UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)
  |___
  | Traceback (most recent call last):
  |   File "c:\python36\lib\site-packages\pelican\generators.py", line 629, in generate_context
  |     context_sender=self)
  |   File "c:\python36\lib\site-packages\pelican\readers.py", line 572, in read_file
  |     context=context)
  |   File "c:\python36\lib\site-packages\pelican\contents.py", line 153, in __init__
  |     signals.content_object_init.send(self)
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in send
  |     for receiver in self.receivers_for(sender)]
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in <listcomp>
  |     for receiver in self.receivers_for(sender)]
  |   File "D:\Kristinita\pelican-plugins\deadlinks\deadlinks.py", line 163, in content_object_init
  |     avail, success, code = get_status_code(url)
  |   File "D:\Kristinita\pelican-plugins\deadlinks\deadlinks.py", line 32, in get_status_code
  |     urlopen(url)
  |   File "c:\python36\lib\urllib\request.py", line 223, in urlopen
  |     return opener.open(url, data, timeout)
  |   File "c:\python36\lib\urllib\request.py", line 526, in open
  |     response = self._open(req, data)
  |   File "c:\python36\lib\urllib\request.py", line 544, in _open
  |     '_open', req)
  |   File "c:\python36\lib\urllib\request.py", line 504, in _call_chain
  |     result = func(*args)
  |   File "c:\python36\lib\urllib\request.py", line 1346, in http_open
  |     return self.do_open(http.client.HTTPConnection, req)
  |   File "c:\python36\lib\urllib\request.py", line 1318, in do_open
  |     encode_chunked=req.has_header('Transfer-encoding'))
  |   File "c:\python36\lib\http\client.py", line 1239, in request
  |     self._send_request(method, url, body, headers, encode_chunked)
  |   File "c:\python36\lib\http\client.py", line 1250, in _send_request
  |     self.putrequest(method, url, **skips)
  |   File "c:\python36\lib\http\client.py", line 1117, in putrequest
  |     self._output(request.encode('ascii'))
  | UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)

6. Environment

Operating system and version: Windows 10 Enterprise LTSB 64-bit EN
Python: 3.6.1
Pelican: 3.7.1
BeautifulSoup4: 4.5.3

Thanks.

silentlamb commented 7 years ago

Thank you for your bug reports, I love how detailed they are.

The stack trace was caused by this link: http://www.wikireality.ru/wiki/Оффтопик. It was not percent-encoded before being passed to urlopen, so http.client failed when it tried to encode the request line as ASCII.
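For illustration, here is a minimal sketch of percent-encoding such a link before it reaches urlopen. It is not the plugin's actual fix; the helper name percent_encode_url is hypothetical, and non-ASCII host names (which need IDNA rather than percent-encoding) are left out of scope.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url):
    """Return url with non-ASCII path, query and fragment percent-encoded.

    Hypothetical helper, not part of the deadlinks plugin.
    """
    parts = urlsplit(url)
    # quote() encodes as UTF-8 by default. "%" is kept safe so that a URL
    # which is already percent-encoded is not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    # The network location is passed through unchanged (assumed ASCII here).
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


encoded = percent_encode_url("http://www.wikireality.ru/wiki/Оффтопик")
print(encoded)
# The result is pure ASCII, so http.client can build the request line.
encoded.encode("ascii")
```

The eight Cyrillic letters of the page name are the characters at positions 10-17 of the request line ("GET /wiki/..."), which matches the positions reported in the error message.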

Kristinita commented 7 years ago

@silentlamb, maybe this Stack Overflow answer will help you.

Thanks.

silentlamb commented 7 years ago

@Kristinita I've tested the fix and everything seemed to work properly, but I have two requests, if you don't mind.

  1. Check from your side whether the current master branch resolves the stack trace bug (just a sanity check; if it doesn't work, please reopen the issue).
  2. Check whether all or some of the issues from #4 are gone (I've changed request handling a bit: timeouts, SSL errors, etc.).

Kristinita commented 7 years ago

@silentlamb, after updating I get a similar stack trace for all my articles and pages. Example:

ERROR: Could not process Sublime-Text\ValeriyaSpeller.md
  | TypeError: '>=' not supported between instances of 'NoneType' and 'int'
  |___
  | Traceback (most recent call last):
  |   File "c:\python36\lib\site-packages\pelican\generators.py", line 523, in generate_context
  |     context_sender=self)
  |   File "c:\python36\lib\site-packages\pelican\readers.py", line 572, in read_file
  |     context=context)
  |   File "c:\python36\lib\site-packages\pelican\contents.py", line 153, in __init__
  |     signals.content_object_init.send(self)
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in send
  |     for receiver in self.receivers_for(sender)]
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in <listcomp>
  |     for receiver in self.receivers_for(sender)]
  |   File "D:\Kristinita\pelican-plugins\deadlinks\deadlinks.py", line 183, in content_object_init
  |     if code >= 400 and code < 500:
  | TypeError: '>=' not supported between instances of 'NoneType' and 'int'

See the full deadlinks output on Gist: https://gist.github.com/Kristinita/a2be9ec597752c9934f4e68cbd67908d

Thanks.

silentlamb commented 7 years ago

OK, it seems I hadn't tested the timeout path properly (which is strange, because I thought I did). The bug was caused by a stupid typo:

availibility = False, instead of availability = False

(hopefully...) fixed.
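To illustrate why the timeout path crashed: a request that times out has no HTTP status code, so the code stays None, and on Python 3 comparing None with an int raises the TypeError shown above. The sketch below is not the plugin's code; is_client_error is a hypothetical helper showing one way to guard the comparison.

```python
def is_client_error(code):
    """Return True only for HTTP 4xx codes.

    A code of None means no response was received (for example, a timeout).
    Hypothetical helper, not part of the deadlinks plugin.
    """
    return code is not None and 400 <= code < 500


# The unguarded comparison fails on Python 3 when code is None.
message = ""
try:
    None >= 400
except TypeError as exc:
    message = str(exc)
print(message)

print(is_client_error(None))  # no response: not treated as a 4xx error
print(is_client_error(404))
```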

One more thing: I've exposed two previously hidden tuning parameters, since the right values may vary from person to person: the timeout duration, and a flag indicating whether a timeout should count as a failure (dead link) or just be logged as unavailable (not a dead link). By default, timeouts are not treated as dead links and the duration is set to 1000 ms.
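A rough sketch of how these two parameters could interact is shown below. The option names timeout_ms and timeout_is_error and the function check_link are assumptions made for illustration; the thread does not give the real option keys, so check the plugin's README before copying them. The opener is injected so the logic can be tried without network access; urllib.request.urlopen fits the same call shape.

```python
import socket
from urllib.error import URLError

# Hypothetical option names; defaults follow the comment above
# (1000 ms, timeouts not counted as dead links).
DEFAULT_OPTS = {"timeout_ms": 1000, "timeout_is_error": False}


def check_link(url, opener, opts=None):
    """Return (is_dead, code) for url.

    opener(url, timeout=seconds) must return an object with a .status
    attribute, as urllib.request.urlopen does.
    """
    settings = dict(DEFAULT_OPTS, **(opts or {}))
    try:
        response = opener(url, timeout=settings["timeout_ms"] / 1000.0)
    except socket.timeout:
        # A timeout has no status code, so None is reported.
        return settings["timeout_is_error"], None
    except URLError as exc:
        # urlopen may wrap a connect timeout inside URLError.reason.
        if isinstance(exc.reason, socket.timeout):
            return settings["timeout_is_error"], None
        # HTTPError (a URLError subclass) carries the HTTP status code.
        return True, getattr(exc, "code", None)
    return False, response.status
```

With real requests this would be called as check_link(url, urllib.request.urlopen), optionally passing something like {"timeout_is_error": True} to make timeouts count as dead links.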

Kristinita commented 7 years ago

This problem is fixed for me, so I'm closing the issue.

Thanks for a responsible approach to development!