john-hu / untitled


schema org parser error #87

Closed john-hu closed 2 years ago

john-hu commented 2 years ago

ERROR: Parse url, https://www.101cookbooks.com/archives/blackberry-saffron-honey-recipe.html
ERROR: Spider error processing <GET https://www.101cookbooks.com/using-your-underutilized-steamer/> (referer: None)

  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/pi/git/untitled/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/pi/git/untitled/peeler/scrapy_utils/spiders/generator_base.py", line 65, in parse
    raise ex
  File "/home/pi/git/untitled/peeler/scrapy_utils/spiders/generator_base.py", line 35, in parse
    for item in self.yield_results(response):
  File "/home/pi/git/untitled/peeler/general/spiders/general_result.py", line 77, in yield_results
    yield from self.parse_recipe(response, recipe_language, site_name)
  File "/home/pi/git/untitled/peeler/general/spiders/general_result.py", line 41, in parse_recipe
    recipe = find_json_by_schema_org_type(response.css(self.json_css_path).getall(), 'Recipe')
  File "/home/pi/git/untitled/peeler/utils/schema_org.py", line 164, in find_json_by_schema_org_type
    data = json.loads(json_text.replace('\n', ' '))
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 411 (char 410)
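
The trace ends in `json.loads(json_text.replace('\n', ' '))`, which only neutralizes newlines. Any other raw control character inside a JSON string (a tab or carriage return, for example) still makes the default strict decoder raise `Invalid control character`. One possible workaround, sketched below, is to pass `strict=False`, which the standard `json` module documents as allowing control characters inside strings. The helper name `loads_lenient` and the sample payload are hypothetical and only illustrate the failure mode; this is not necessarily the fix that closed the issue.

```python
import json


def loads_lenient(json_text):
    """Parse JSON text, tolerating raw control characters inside strings.

    Hypothetical helper: strict=False is forwarded to json.JSONDecoder,
    which then accepts characters in the 0-31 range within string values.
    """
    return json.loads(json_text, strict=False)


# Made-up sample: the \t in this Python literal is a real tab character,
# so the JSON string value contains a raw control character.
sample = '{"@type": "Recipe", "name": "Blackberry\tSaffron Honey"}'

try:
    # Same call shape as in the traceback: only newlines are replaced,
    # so the tab survives and the strict decoder rejects it.
    json.loads(sample.replace('\n', ' '))
except json.JSONDecodeError as err:
    print('strict parse failed:', err.msg)

recipe = loads_lenient(sample)
print(recipe['@type'])
```

With `strict=False` the control characters are kept in the parsed values, so downstream code may still want to normalize whitespace in fields such as the recipe name.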