soumalyaon6 / spynner

Automatically exported from code.google.com/p/spynner
GNU General Public License v3.0

Unicode output #21

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
browser.html returns unicode. A website I was using triggered this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 107471: ordinal not in range(128)

Using Django's smart_str fixes the problem:

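    import spynner
    from django.utils.encoding import smart_str

    br = spynner.Browser()
    br.load(url)  # url: address of the page being scraped
    # Outer double quotes keep the single-quoted JS arguments intact.
    br.runjs("javascript:__doPostBack('ctl00$ContentPlaceHolder1$sdEntityDisplay$mygrid','Page$2')")
    # smart_str encodes the unicode HTML to a UTF-8 bytestring before writing.
    with open('htmlout.txt', 'w') as out_file:
        out_file.write(smart_str(br.html))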
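
For anyone without Django installed, the same idea can probably be sketched with the standard library alone. The helper name `save_html` and its parameters are made up for illustration and are not part of spynner; the assumption is only that `br.html` is a unicode string.

```python
import io


def save_html(html, path, encoding="utf-8"):
    """Write a unicode HTML string to `path` using an explicit encoding.

    Hypothetical helper, not part of spynner.
    """
    # io.open encodes the text on write, so the implicit 'ascii' codec
    # that raised the UnicodeEncodeError is never involved.
    with io.open(path, "w", encoding=encoding) as out_file:
        out_file.write(html)
```

Usage would look like `save_html(br.html, 'htmlout.txt')`. UTF-8 is chosen here because it can represent every character the page might contain.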

I suggest adding a try/except to the browser's _get_html method, so that the 
string returned depends on whether the page's unicode can be encoded cleanly, 
and/or adding a new method which returns ascii by default.
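
A rough sketch of what that try/except could look like, written as a standalone function rather than a patch to spynner. The name `html_as_ascii` is hypothetical, and the fallback to numeric character references is one possible choice, not something spynner does.

```python
def html_as_ascii(html):
    """Return unicode `html` as an ASCII bytestring without raising.

    Hypothetical helper, not part of spynner.
    """
    try:
        # Fast path: the page contains only ASCII characters.
        return html.encode("ascii")
    except UnicodeEncodeError:
        # Fall back to numeric character references (for example the
        # copyright sign becomes &#169;), which browsers still render.
        return html.encode("ascii", "xmlcharrefreplace")
```

With this approach no characters are dropped, since every non-ASCII character is rewritten as an HTML entity instead.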

Original issue reported on code.google.com by BoyWonde...@gmail.com on 24 Nov 2010 at 8:25

GoogleCodeExporter commented 8 years ago
Yes, it probably makes sense to return ascii. Try:

    def _get_html(self):
        return str(self.webframe.toHtml().toAscii())

Original comment by tokland on 24 Nov 2010 at 8:39