Open laupt82 opened 9 years ago
Could you provide the site and the version of Python that you are using (the output of python --version
would do)?
Hi, thanks for your fast answer. This is not the same page that I tried before, but I get the same error when using your library: http://www.ilsole24ore.com/art/mondo/2015-05-11/bombe-nave-turca-largo-libia-131554.shtml?uuid=ABtub1dD The test was run under Windows 7 with Python 2.7.3.
Could you provide a stack trace as well? At the moment, the best you can do is switch to libextract (same algorithm as eatiht) combined with requests. My guess is that this encoding issue is mostly due to the (hacky) handling of HTTP requests in eatiht.
Ok, thanks, I will try libextract. The error traceback:
Traceback (most recent call last):
File "
Hi, I found your library really interesting. I need to obtain the article content from web pages that may be written in different languages, mostly English and Italian. Unfortunately, when I try to analyze Italian pages, I get encoding errors: "UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 4: character maps to <undefined>"
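For readers hitting the same message: a 'charmap' UnicodeEncodeError like the one quoted above typically appears when Unicode text is encoded with a legacy single-byte code page that has no slot for the character, for example when printing to a Windows console. The following is only a minimal sketch under assumptions: the code page cp850, the sample string, and the helper name safe_for_console are illustrative and do not come from eatiht or from the truncated traceback, so the actual failure point may differ.

```python
# -*- coding: utf-8 -*-
# Sketch only: cp850 is an ASSUMED console code page; the sample text and
# the helper name safe_for_console are hypothetical, not part of eatiht.

# U+2019 (right single quotation mark) is common in Italian text.
text = u"l\u2019Italia"

# Encoding with a code page that lacks U+2019 raises UnicodeEncodeError.
try:
    text.encode("cp850")
    failed = False
except UnicodeEncodeError:
    failed = True


def safe_for_console(s, encoding="cp850"):
    """Replace characters the given codec cannot represent with '?'."""
    return s.encode(encoding, "replace").decode(encoding)


# Lossy, but printable on a console using the assumed code page.
safe = safe_for_console(text)

# Lossless alternative: keep the text as UTF-8 bytes (e.g. to write to a file).
utf8_bytes = text.encode("utf-8")
```

The replacement approach loses the original character, so it is only suitable for display. To keep the extracted text intact, encoding it as UTF-8 and writing it to a file avoids the console code page entirely.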