john-hu / untitled

Handle "Invalid IPv6 URL" error #104

Closed john-hu closed 2 years ago

john-hu commented 2 years ago
2021-12-29 06:41:44 [peeler.scrapy_utils.spiders.generator_base] ERROR: Parse url, https://www.101cookbooks.com/archives/soup-au-pistou-recipe.html,  3 / 60, error: ValueError('Invalid IPv6 URL')
Traceback (most recent call last):
  File "/home/pi/git/untitled/peeler/scrapy_utils/spiders/generator_base.py", line 42, in parse
    for item in self.yield_results(response):
  File "/home/pi/git/untitled/peeler/general/spiders/general_result.py", line 79, in yield_results
    yield from self.parse_anchor(response)
  File "/home/pi/git/untitled/peeler/general/spiders/general_result.py", line 67, in parse_anchor
    resolved_url = urljoin(response.url, href).strip()
  File "/usr/lib/python3.7/urllib/parse.py", line 511, in urljoin
    urlparse(url, bscheme, allow_fragments)
  File "/usr/lib/python3.7/urllib/parse.py", line 368, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python3.7/urllib/parse.py", line 435, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
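
One possible way to keep a single malformed `href` from aborting the whole page is to catch the `ValueError` around `urljoin`. The sketch below is an assumption, not the fix that was actually applied in this repository: `safe_urljoin` is a hypothetical helper name, and the offending `href` is not shown in the log, so `http://[broken` is only an illustrative input. `urlsplit` raises `ValueError("Invalid IPv6 URL")` when the host part contains an unbalanced `[` or `]`.

```python
from typing import Optional
from urllib.parse import urljoin


def safe_urljoin(base_url: str, href: str) -> Optional[str]:
    """Join href onto base_url; return None if href cannot be parsed.

    Hypothetical helper for illustration, not part of peeler or Scrapy.
    """
    try:
        return urljoin(base_url, href.strip()).strip()
    except ValueError:
        # urlsplit raises ValueError("Invalid IPv6 URL") when the host
        # part has an unbalanced square bracket, e.g. "http://[broken".
        return None


if __name__ == "__main__":
    page = "https://www.101cookbooks.com/archives/soup-au-pistou-recipe.html"
    for href in ["/about/", "http://[broken"]:
        print(repr(href), "->", safe_urljoin(page, href))
```

A caller such as `parse_anchor` could then skip any anchor for which the helper returns `None` and optionally log the raw `href` for later inspection.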
2021-12-29 06:42:03 [scrapy.exte