Closed by hvd.
@hvd Thanks for reporting this issue. Are you running this example in IPython Notebook or through a Python or IPython interpreter session?
You're welcome, @ptwobrussell. I encountered this in a Python interpreter session.
Thanks! I thought this might be the case.
I wonder whether the problem here is actually how an ordinary Python interpreter session handles writing Unicode text to standard output. I've seen errors like this before, and changing how Python handles sys.stdout helped:

```python
import sys
import codecs
# Wrap stdout so unicode strings are encoded as UTF-8 on write instead of ASCII (Python 2)
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print extractor.getText()  # works as "expected" now?
```
(I may be mistaken, but I think this setting is preconfigured with IPython/IPython Notebook.)
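To see what that wrapper does, here is a minimal sketch that applies the same codecs.getwriter('utf-8') wrapping to an in-memory io.BytesIO buffer. The buffer is only an assumed stand-in for the byte-oriented stdout of a Python 2 session, and the sample text is made up; the snippet is written to run under both Python 2.7 and Python 3.

```python
import codecs
import io

# io.BytesIO stands in (as an assumption) for the byte-oriented sys.stdout of Python 2
raw = io.BytesIO()

# Same wrapping as the sys.stdout fix: unicode written here is encoded as UTF-8
out = codecs.getwriter('utf-8')(raw)
out.write(u'It\u2019s working')

# The underlying stream now holds UTF-8 bytes; U+2019 becomes the three bytes E2 80 99
encoded = raw.getvalue()
```

Because the encoding happens inside the wrapper, every later print goes through it without changing the calling code.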
Given that the exception is a UnicodeEncodeError referencing the ascii codec, I think there is an implicit encoding to ASCII that happens in some circumstances when you print Unicode to standard output (for example, when sys.stdout.encoding is None because output is piped, or when the terminal's locale is ASCII).
I'm curious whether the suggestion I've made here resolves the error you are seeing.
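As a rough illustration of that implicit ASCII coercion, this sketch writes a curly apostrophe through an ASCII-encoding writer. The writer is a hypothetical stand-in for a stdout whose encoding resolves to ASCII, and the sample text is made up. The resulting UnicodeEncodeError carries the same 'ascii' codec and 'ordinal not in range(128)' details as the traceback reported in this issue. It runs under both Python 2.7 and Python 3.

```python
import codecs
import io

# Hypothetical stand-in for a stdout whose encoding resolves to ASCII
ascii_out = codecs.getwriter('ascii')(io.BytesIO())

failure = None
try:
    ascii_out.write(u'Don\u2019t')
except UnicodeEncodeError as err:
    # The exception records the codec, the offending position and the reason
    failure = (err.encoding, err.start, err.reason)
```

The apostrophe sits at index 3, so the error reports position 3, just as the original report points at position 470 of the extracted text.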
Wanted to check back with you one last time before closing this issue...
Hey, sorry Matthew, let me get back to you in a day or two. Best, Hersh
On Fri, Feb 14, 2014 at 7:00 AM, Matthew A. Russell <notifications@github.com> wrote:
> Wanted to check back with you one last time before closing this issue...
Harshvardhan Kelkar
@ptwobrussell Just checked your fix; it resolves the issue too. Thanks, Hersh
Excellent! Thanks so much for confirming.
Invoking Extractor.getText() on Python 2.7 raises a UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 470: ordinal not in range(128)
This is easily fixed by encoding the text explicitly before printing it: print extractor.getText().encode('utf-8')
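The explicit-encode fix can be checked in isolation. This sketch builds a hypothetical string of 470 ASCII characters followed by u'\u2019', so an ASCII encode fails at the same position 470 as the reported error, while .encode('utf-8') succeeds. The sample text is invented for illustration and is not the real extracted page; the snippet runs under both Python 2.7 and Python 3.

```python
# Hypothetical text: 470 ASCII characters followed by a right single quotation mark
text = u'x' * 470 + u'\u2019'

position = None
try:
    # Roughly what the implicit coercion attempts when stdout's encoding is ASCII
    text.encode('ascii')
except UnicodeEncodeError as err:
    position = err.start

# Explicit UTF-8 encoding handles any unicode text; U+2019 adds three bytes
utf8_bytes = text.encode('utf-8')
```

Compared with wrapping sys.stdout, this approach leaves the global stream untouched, at the cost of repeating .encode('utf-8') at every print call.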