Closed freetom closed 6 years ago
ok, maybe we can fix it
Furthermore, running the crawler with non-ASCII characters (such as 'à') in the search term also provokes a crash:
[17:48:26] INFO::email_crawler - ----------------------------------------
[17:48:26] ERROR::email_crawler - EXCEPTION: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)
Traceback (most recent call last):
  File "email_crawler.py", line 217, in <module>
    crawl(arg)
  File "email_crawler.py", line 57, in crawl
    logger.info("Keywords to Google for: %s" % keywords)
  File "/usr/lib/python2.7/logging/__init__.py", line 1167, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python2.7/logging/__init__.py", line 1286, in _log
    self.handle(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 1296, in handle
    self.callHandlers(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 759, in handle
    self.emit(record)
  File "/home/tomas/python-email-crawler2/ColorStreamHandler.py", line 38, in emit
    record.msg = record.msg.encode('utf-8', 'ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)
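For what it's worth, the traceback looks consistent with `record.msg` already being a UTF-8 byte string when `emit` runs. In Python 2, calling `.encode()` on a byte string first decodes it implicitly as ASCII, and that decode is what fails on byte 0xc3 (the first byte of 'à' in UTF-8). Position 24 is the first character after "Keywords to Google for: ", so the failing byte would be the start of the search term. Below is a minimal sketch of a possible workaround, not the actual fix that was committed; `to_utf8_bytes` is a hypothetical helper name, and the assumption about the type of `record.msg` is mine.

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(msg):
    """Return msg as UTF-8 bytes without triggering an implicit ASCII decode.

    Hypothetical helper: in Python 2, `bytes` is an alias for `str`, so byte
    strings pass through untouched and only unicode text gets encoded.
    """
    if isinstance(msg, bytes):
        return msg
    return msg.encode('utf-8', 'ignore')


# Assumed shape of the log message: the format string from the traceback
# followed by a search term that starts with 'à' (U+00E0).
raw = u"Keywords to Google for: \xe0".encode('utf-8')

# Reproduce explicitly the decode step that Python 2 performs implicitly
# when .encode() is called on a byte string.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# The helper leaves byte strings alone and encodes unicode text.
print(repr(to_utf8_bytes(raw)))
print(repr(to_utf8_bytes(u"\xe0")))
```

With a guard like this in `emit`, a message that is already encoded would be passed through as-is instead of being decoded as ASCII first.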
Fixed
The crawler crashes when a URL with a non-ASCII character (e.g. 'ß') is encountered.
Crash log:
However, the issue may be located in the sqlalchemy lib, but I don't know for sure.