vmanisha / py-webkit-html-manipulator

Automatically exported from code.google.com/p/py-webkit-html-manipulator

Crashes on redirection #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
--------------------------------------
Issue the following command in a terminal:

  ./whm.py -u http://www.wavemaker.com >wm.out

Oddly, running the command without the redirect works fine, as does piping
through 'more'. The problem occurs only when stdout is redirected to a file.

What is the expected output? What do you see instead?
-----------------------------------------------------
Instead of producing an output file, the page loads correctly in the
application window, then the application freezes with this traceback:

Traceback (most recent call last):
  File "./whm.py", line 38, in parsePage
    print unicode(self.frame.toHtml())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 49203: ordinal not in range(128)

What version of the product are you using? On what operating system?
--------------------------------------------------------------------
Rev r2

Ubuntu 9.10 amd64 en_GB Europe/London

python 2.6.4

Please provide any additional information below.
-----------------------------------------------
Changing line 38 to use the page encoding:

    print unicode(self.frame.toHtml(), 'utf_8')

produced a different traceback:

Traceback (most recent call last):
  File "./whm.py", line 38, in parsePage
    print unicode(self.frame.toHtml(), 'utf_8')
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 49214: unexpected code byte

Changing the line to ignore errors:

  print unicode(self.frame.toHtml(), 'utf_8', 'ignore')

avoids the issue, and produces an output file as expected.
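
A likely explanation: when stdout is redirected, Python 2 has no terminal encoding to use and falls back to ASCII, which cannot represent u'\xa9' (the copyright sign). The 'ignore' workaround gets past the exception but silently drops whatever it cannot decode. The sketch below illustrates all three behaviours in isolation. The `html` and `raw` values are made-up stand-ins for the output of self.frame.toHtml(), not data from the actual page.

```python
# -*- coding: utf-8 -*-
# Sketch only: `html` and `raw` are hypothetical stand-ins for the page text.
html = u"<p>Copyright \xa9 2009</p>"

# 1. The ASCII fallback cannot represent u'\xa9', as in the first traceback.
try:
    html.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# 2. A lone 0xa9 byte is not valid UTF-8; 'ignore' drops it without warning.
raw = b"<p>Copyright \xa9 2009</p>"
lossy = raw.decode("utf-8", "ignore")

# 3. Encoding the text explicitly to UTF-8 keeps the character intact.
utf8_bytes = html.encode("utf-8")
```

If the same holds in whm.py, encoding explicitly before printing, for example `print unicode(self.frame.toHtml()).encode('utf-8')`, might keep the copyright sign in the output file. That line is an untested assumption, not a confirmed fix.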

Original issue reported on code.google.com by relative...@gmail.com on 29 Nov 2009 at 12:39