rhiever / reddit-analysis

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.

Encoding error #40

Closed by rhiever 11 years ago

rhiever commented 11 years ago

It seems that some characters in /r/Sweden crash the script.

It was reported here: http://www.reddit.com/r/MUWs/comments/19xurd/requestrsweden/c8scgyt

bboe commented 11 years ago

URLs can also contain Unicode characters, so those need to be handled too.

rhiever commented 11 years ago

Here's the offending line: https://github.com/rhiever/reddit-analysis/blob/master/redditanalysis/__init__.py#L265

Is it just as simple as this?

.format(submission.permalink.encode("UTF-8")))

bboe commented 11 years ago

Yes, it should be that simple.
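
For reference, here is a minimal sketch of the idea discussed above, written so it runs on Python 3. The function name `format_status`, the message text, and the permalink value are all hypothetical and not taken from the repository; that the original script targeted Python 2 is also an assumption based on the thread's age. One caveat: on Python 3, passing a `bytes` object to `str.format` inserts its `b'...'` repr, so this sketch formats the text first and encodes the finished line once, rather than encoding the permalink before formatting.

```python
def format_status(permalink):
    """Build a status line for a permalink and return it as UTF-8 bytes."""
    # Format as text first, then encode once at the output boundary.
    # "Parsing" is a placeholder message, not the script's actual wording.
    line = u"Parsing {0}".format(permalink)
    return line.encode("UTF-8")


# Made-up permalink containing non-ASCII characters (o-umlaut and a-ring).
permalink = u"/r/sweden/comments/abc123/sm\u00f6rg\u00e5sbord/"

# An ASCII-only encode cannot represent these characters and raises,
# which is the general kind of failure the issue describes.
try:
    permalink.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding as UTF-8 succeeds for any Unicode text.
encoded = format_status(permalink)
```

The resulting bytes can be written to a binary stream or log file without depending on the terminal's default encoding.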