scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License

ValueError: XPath error: Unknown return type: re.Pattern in //tr[starts-with(td[1]/text(), "Цена:")]/td[2]/text() #56

Closed: tonal closed this issue 2 years ago

tonal commented 2 years ago

Calling add_xpath or add_css with a compiled re.Pattern as the re argument raises an error.

itemloaders==1.0.5

Code:

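import re
import itemloaders

re_price = re.compile(r'([\d\s]*\d)')  # digits and spaces ending in a digit, e.g. "12 500"
# itemloaders.ItemLoader takes a selector; response is the Scrapy response in the spider callback
loader = itemloaders.ItemLoader(item={}, selector=response.selector)
loader.add_xpath(
    'price',
    '//tr[starts-with(td[1]/text(), "Цена:")]/td[2]/text()',  # "Цена:" means "Price:"
    re=re_price)  # the re argument is a compiled re.Pattern, not a str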

Traceback

ERROR: Spider error processing <GET http://www.avangard-voda.ru/catalog/osveschenie/galogenovye/78/1006030/> (referer: http://www.avangard-voda.ru/catalog/osveschenie/galogenovye/78/)
Traceback (most recent call last):
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/parsel/selector.py", line 254, in xpath
    result = xpathev(query, namespaces=nsp,
  File "src/lxml/etree.pyx", line 1599, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 300, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 93, in lxml.etree._XPathContext.registerVariables
  File "src/lxml/extensions.pxi", line 612, in lxml.etree._wrapXPathObject
lxml.etree.XPathResultError: Unknown return type: re.Pattern

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/utils/defer.py", line 132, in iter_errback
    yield next(it)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/utils/python.py", line 354, in __next__
    return next(self.data)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/utils/python.py", line 354, in __next__
    return next(self.data)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/spidermiddlewares/urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/scrapy/spiders/crawl.py", line 116, in _parse_response
    for request_or_item in iterate_spider_output(cb_res):
  File "/home/user/projects/amon/amon_long/amon_spider_muxin.py", line 245, in parse_item
    for i, item in iter_goods(response, url):
  File "/home/user/projects/amon/amon_long/spiders/avangard_voda_ru.py", line 53, in _parse_responce
    loader.add_xpath(
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/itemloaders/__init__.py", line 349, in add_xpath
    values = self._get_xpathvalues(xpath, **kw)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/itemloaders/__init__.py", line 386, in _get_xpathvalues
    return flatten(self.selector.xpath(xpath, **kw).getall() for xpath in xpaths)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/parsel/utils.py", line 21, in flatten
    return list(iflatten(x))
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/parsel/utils.py", line 27, in iflatten
    for el in x:
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/itemloaders/__init__.py", line 386, in <genexpr>
    return flatten(self.selector.xpath(xpath, **kw).getall() for xpath in xpaths)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/parsel/selector.py", line 260, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/six.py", line 718, in reraise
    raise value.with_traceback(tb)
  File "/home/user/projects/amon/venv/lib/python3.10/site-packages/parsel/selector.py", line 254, in xpath
    result = xpathev(query, namespaces=nsp,
  File "src/lxml/etree.pyx", line 1599, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 300, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 93, in lxml.etree._XPathContext.registerVariables
  File "src/lxml/extensions.pxi", line 612, in lxml.etree._wrapXPathObject
ValueError: XPath error: Unknown return type: re.Pattern in //tr[starts-with(td[1]/text(), "Цена:")]/td[2]/text()
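The traceback shows the mechanism: _get_xpathvalues passes its keyword arguments (**kw, which include re) straight to self.selector.xpath, and lxml then tries to register each one as an XPath variable in registerVariables, which rejects a re.Pattern. The stdlib-only model below imitates that flow. All function names and the set of accepted types are hypothetical simplifications, not the real itemloaders, parsel, or lxml code, and the "fixed" variant is only one plausible shape of a fix; the thread does not say how 1.0.6 changed it.

```python
import re

# Hypothetical stand-in for lxml's XPath variable registration:
# only a few plain value types are accepted as $variables.
ACCEPTED_VARIABLE_TYPES = (str, int, float, bool)


def register_xpath_variables(**variables):
    for name, value in variables.items():
        if not isinstance(value, ACCEPTED_VARIABLE_TYPES):
            kind = f"{type(value).__module__}.{type(value).__name__}"
            raise ValueError(f"Unknown return type: {kind}")
    return variables


def forward_kwargs_buggy(**kw):
    # Models the 1.0.5 behaviour seen in the traceback: every loader
    # keyword argument, re included, reaches the XPath call
    return register_xpath_variables(**kw)


def forward_kwargs_fixed(re=None, **kw):
    # Hypothetical fix: keep re out of the XPath call so it can be
    # applied to the extracted strings afterwards instead
    return register_xpath_variables(**kw)


pattern = re.compile(r'([\d\s]*\d)')
try:
    forward_kwargs_buggy(re=pattern)
except ValueError as exc:
    print(exc)  # Unknown return type: re.Pattern
print(forward_kwargs_fixed(re=pattern))  # {}
```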

Gallaecio commented 2 years ago

Thanks for reporting this! Fixed in 1.0.6.
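Upgrading to 1.0.6 is the fix stated here. For code that cannot upgrade yet, one possible workaround (an assumption, not suggested in the thread) is to drop the re= keyword and apply the compiled pattern in a processor, since processors passed positionally to add_xpath are plain callables. The helper name extract_with is hypothetical:

```python
import re
from typing import Callable, Iterable, List


def extract_with(pattern: re.Pattern) -> Callable[[Iterable[str]], List[str]]:
    """Build a processor that applies a compiled pattern to every extracted string."""
    def processor(values: Iterable[str]) -> List[str]:
        matches: List[str] = []
        for value in values:
            # findall returns the capture group's text when the pattern has one group
            matches.extend(pattern.findall(value))
        return matches
    return processor


re_price = re.compile(r'([\d\s]*\d)')
price_processor = extract_with(re_price)

# Hypothetical use with the loader from the report (no re= keyword):
# loader.add_xpath('price', '//tr[starts-with(td[1]/text(), "Цена:")]/td[2]/text()', price_processor)

print(price_processor(["12 500 руб."]))  # ['12 500']
```

Because the pattern never travels through the XPath call's keyword arguments, the XPath variable registration shown in the traceback is not involved.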