Always return unicode strings from pdftoxml.

sensiblecodeio / scraperwiki-python

ScraperWiki Python library for scraping and saving data

https://scraperwiki.com

BSD 2-Clause "Simplified" License


Always return unicode strings from pdftoxml. #78

Closed. petterreinholdtsen closed this 9 years ago

petterreinholdtsen commented 9 years ago

The pdftoxml method on the old ScraperWiki site returned unicode strings. Change this version to do the same by decoding the byte stream from pdftohtml as UTF-8.

Fixes issue #38
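
A minimal sketch of the idea described above, not the actual patch: read the bytes that pdftohtml writes and decode them as UTF-8 so the caller gets a text string rather than a byte string. The names `decode_pdftohtml_output` and `pdftoxml_sketch` are hypothetical, the temporary-file layout is an assumption, and the sketch assumes Python 3 (where `str` is the unicode type) with the `pdftohtml` command-line tool on the PATH.

```python
import os
import subprocess
import tempfile


def decode_pdftohtml_output(raw):
    """Interpret the raw byte stream from pdftohtml as UTF-8 text.

    Hypothetical helper for illustration; raises UnicodeDecodeError
    if the bytes are not valid UTF-8.
    """
    return raw.decode("utf-8")


def pdftoxml_sketch(pdfdata):
    """Convert PDF bytes to an XML string (illustrative only).

    Assumes pdftohtml is installed and writes '<output base>.xml'
    when called with -xml.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        pdf_path = os.path.join(tmpdir, "input.pdf")
        out_base = os.path.join(tmpdir, "output")

        # Write the PDF bytes to disk so pdftohtml can read them.
        with open(pdf_path, "wb") as pdf_file:
            pdf_file.write(pdfdata)

        # Ask pdftohtml for a single UTF-8 encoded XML file.
        subprocess.check_call(
            ["pdftohtml", "-xml", "-enc", "UTF-8", "-noframes",
             pdf_path, out_base]
        )

        # Read the result as bytes, then decode, so the return value
        # is always a text string.
        with open(out_base + ".xml", "rb") as xml_file:
            return decode_pdftohtml_output(xml_file.read())
```

With this shape, a caller can pass the returned value straight to an XML parser or a string comparison without having to decode it first.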