my8100 / logparser

A tool for parsing Scrapy log files periodically and incrementally, extending the HTTP JSON API of Scrapyd.
GNU General Public License v3.0

`parse_crawler_stats` parsing error when stats dict string contains unicode keys #2

Closed rodricios closed 5 years ago

rodricios commented 5 years ago

Hi, as the subject line states, `parse_crawler_stats` throws an error when calling `json.loads` on a scraping job's stats dict whenever a key is printed with the Python 2 unicode string prefix (`u"..."`).

Stacktrace:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/logparser/common.py", line 127, in parse_crawler_stats
    return json.loads(text)
  File "/usr/local/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name enclosed in double quotes: line 5 column 2 (char 119)
 {"crawlera/request": 100,
 "crawlera/request/method/GET": 100,
 "crawlera/response": 93,
 "crawlera/response/error": 5,
 u"crawlera/response/error/msgtimeout": 4,
 u"crawlera/response/error/timeout": 1}

(stats dictionary was truncated)
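
For readers who want to reproduce the failure without a full crawl, here is a minimal sketch. It assumes the stats text reaches `json.loads` exactly as shown above; the helper name `try_parse` is hypothetical and is not part of logparser.

```python
import json

# Shortened copy of the stats text from the report above.
text = '''{"crawlera/request": 100,
 "crawlera/response/error": 5,
 u"crawlera/response/error/msgtimeout": 4,
 u"crawlera/response/error/timeout": 1}'''


def try_parse(stats_text):
    """Hypothetical helper: return (parsed, None) or (None, error message)."""
    try:
        return json.loads(stats_text), None
    except ValueError as err:
        # On Python 3, json.JSONDecodeError is a subclass of ValueError,
        # so this handler covers both Python 2 and Python 3.
        return None, str(err)


parsed, error = try_parse(text)
print(parsed, error)
```

JSON has no notion of a `u` string prefix, so the decoder stops at the first prefixed key and reports that it expected a double-quoted property name.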

rodricios commented 5 years ago

I addressed the issue here: https://github.com/my8100/logparser/pull/3

my8100 commented 5 years ago

It's weird that the Unicode prefixes appear here. Could you share a screenshot of the complete log, including the output of print(text)? https://github.com/my8100/logparser/blob/5e531c25a6258d5ee3eb80e37a13b5482da6482d/logparser/common.py#L126-L131

rodricios commented 5 years ago

Sure! Here's a screenshot of the log and print(text):

[Screenshot: log]

[Screenshot: output of print(text)]

my8100 commented 5 years ago

Thanks for your feedback. The problem has been fixed in https://github.com/my8100/logparser/commit/b79a283ee7135527221e02dba5a90bb2982bee8b
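
For anyone stuck on an older release, one possible workaround is to strip the prefix before decoding. The sketch below is an assumption about how this could be done and is not necessarily what the linked commit does; `U_PREFIX` and `loads_stats` are hypothetical names.

```python
import json
import re

# Matches a `u` that sits directly before a double quote and directly after
# the start of the text, whitespace, or one of `{`, `[`, `,`, `:`.
# That is where a Python 2 unicode prefix shows up in a printed dict.
U_PREFIX = re.compile(r'(^|[\s{\[,:])u(?=")')


def loads_stats(text):
    """Hypothetical helper: parse a stats dict string that may contain u"..."."""
    cleaned = U_PREFIX.sub(r'\1', text)  # keep the leading character, drop the `u`
    return json.loads(cleaned)


stats = loads_stats('{"crawlera/response/error": 5,\n u"crawlera/response/error/timeout": 1}')
print(stats["crawlera/response/error/timeout"])
```

The pattern is deliberately narrow: a key such as `"menu"` is left alone because its `u` follows a letter. A string value that ends in whitespace followed by `u` would still be mangled, so treat this as a stopgap only.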

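BTW, the Unicode prefixes appear in your log because scrapy-crawlera builds the stats key from a decoded header value, which is a unicode string on Python 2: https://github.com/scrapy-plugins/scrapy-crawlera/blob/87287b6a8e1b1069f782c722e803ce950946c22d/scrapy_crawlera/middleware.py#L183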

        crawlera_error = response.headers.get('X-Crawlera-Error')
        if crawlera_error:
            self.crawler.stats.inc_value('crawlera/response/error')
            self.crawler.stats.inc_value(
                'crawlera/response/error/%s' % crawlera_error.decode('utf8'))

rodricios commented 5 years ago

Great, thanks!