rennat / pynliner

Python CSS-to-inline-styles conversion tool for HTML using BeautifulSoup and cssutils
http://pythonhosted.org/pynliner/

fix error with non ascii #11

Closed (sserrano44 closed 11 years ago)

sserrano44 commented 13 years ago

This patch worked for me:

--- a/pynliner/__init__.py
+++ b/pynliner/__init__.py
@@ -208,7 +208,7 @@ class Pynliner(object):

         Returns self.output
         """
-        self.output = unicode(str(self.soup))
+        self.output = unicode(self.soup.renderContents(), 'utf-8')
         return self.output

def fromURL(url, log=None):
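
One way to see why the original line fails on non-ASCII input: in Python 2, `str(self.soup)` yields an encoded byte string (UTF-8 by default in BeautifulSoup 3, as far as I know), and `unicode(...)` without an encoding argument decodes it with the ASCII codec. The sketch below is written for Python 3, where that implicit step has to be spelled out. The sample markup and both function names are made up for illustration and are not part of pynliner.

```python
# Hypothetical stand-in for the UTF-8 bytes produced by rendering the soup.
rendered = '<p style="color: red">\u7528</p>'.encode('utf-8')


def decode_with_default_codec(data):
    """Mimic Python 2's unicode(data): decode bytes with the ASCII codec."""
    return data.decode('ascii')


def decode_as_utf8(data):
    """Mimic the patched line, unicode(data, 'utf-8')."""
    return data.decode('utf-8')


# The ASCII decode fails on the first non-ASCII byte.
try:
    decode_with_default_codec(rendered)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding explicitly recovers the original text.
output = decode_as_utf8(rendered)
print(ascii_failed, output)
```

Naming the encoding at the point of decoding is the whole fix here; it only works if the bytes really are UTF-8, which is an assumption about how the soup was rendered.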

rennat commented 12 years ago

This is a minor change but please fork and submit a pull request so that you are properly attributed for your commits :)

peteroconnor commented 12 years ago

While the above fix did get rid of the error at that particular location, I got another error later when processing the HTML.

(The New Error)

File "/home/www-data/web2py/gluon/tools.py", line 363, in send
  html = html.decode(encoding).encode('utf-8')
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
  return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u7528' in position 293: ordinal not in range(128)

I used the following patch, which fixed both errors. I am not an expert in Unicode; I arrived at it by trial and error.

211c211
<         self.output = str(self.soup.renderContents())
---
>         self.output = unicode(str(self.soup))
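
A possible reading of the second traceback: `html` was already a `unicode` object when `html.decode(encoding)` ran. In Python 2, calling `.decode()` on a unicode object first encodes it implicitly with the ASCII codec, which would explain why a decode call raises `UnicodeEncodeError`. The Python 3 sketch below spells out that implicit step and shows a guard that decodes only byte strings. Both helper names are hypothetical and are not taken from web2py or pynliner.

```python
text = '\u7528'  # the character from the traceback, already decoded text


def py2_style_decode(value, encoding):
    """Approximate Python 2's unicode.decode(): an implicit ASCII encode
    happens first, and only then the requested decode."""
    return value.encode('ascii').decode(encoding)


def to_utf8_bytes(value, encoding='utf-8'):
    """Decode only when given bytes, then encode the text as UTF-8."""
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return value.encode('utf-8')


# The implicit ASCII encode is the step that fails on non-ASCII text.
try:
    py2_style_decode(text, 'utf-8')
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

print(error_name)
print(to_utf8_bytes(text) == to_utf8_bytes(text.encode('utf-8')))
```

Under this reading, the error goes away whenever the caller receives the type it expects: either a byte string it can decode, or text that is never passed through `.decode()` at all.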

rennat commented 11 years ago

This was related to some odd behavior in the then-current version of BeautifulSoup; newer versions allow us to go directly to unicode.

(The BeautifulSoup version requirements for pynliner were modified to reflect this in tag 0.5.0.)