scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License

Response and other context is not passed to nested loader #34

Open migr1 opened 6 years ago

migr1 commented 6 years ago

When passing the response to an item loader:

loader = ItemLoader(item=Product(), response=response)

The response can then be accessed in an input processor through the loader_context parameter:

def make_absolute_url(url, loader_context):
    return loader_context['response'].urljoin(url)
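As a rough, self-contained model of how a loader_context parameter can reach a processor, the sketch below inspects the processor's signature and binds the context with functools.partial only when a loader_context parameter is present. StubResponse and bind_context are hypothetical names invented for illustration; the library's real internals may be organized differently.

```python
import inspect
from functools import partial
from urllib.parse import urljoin


class StubResponse:
    """Hypothetical stand-in for a Scrapy Response; only urljoin() is modelled."""

    def __init__(self, url):
        self.url = url

    def urljoin(self, link):
        # Resolve a possibly relative link against the page URL.
        return urljoin(self.url, link)


def make_absolute_url(url, loader_context):
    return loader_context['response'].urljoin(url)


def bind_context(processor, context):
    """Hypothetical helper: pass the context only if the processor asks for it."""
    if 'loader_context' in inspect.signature(processor).parameters:
        return partial(processor, loader_context=context)
    # Processors without a loader_context parameter are returned unchanged.
    return processor


context = {'response': StubResponse('https://example.com/shop/')}
proc = bind_context(make_absolute_url, context)
print(proc('item/42'))  # https://example.com/shop/item/42
```

Checking the signature keeps plain one-argument processors working unchanged while letting context-aware ones opt in.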

However, when using a nested loader:

loader = ItemLoader(item=Product(), response=response)
nested_loader = loader.nested_xpath('...')

The input processor fails with the exception:

  File "C:\migr\crawlers\crawlers\items.py", line 16, in make_absolute_url
    return loader_context['response'].urljoin(url)
AttributeError: 'NoneType' object has no attribute 'urljoin'

The response is not passed to the input processor. I believe the reason is https://github.com/scrapy/scrapy/blob/acd2b8d43b5ebec7ffd364b6f335427041a0b98d/scrapy/loader/__init__.py#L55, where the nested loader gets a brand-new context that reuses nothing from the current one.
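A minimal, hypothetical model of the behavior described above: MiniLoader and its nested() method are made-up stand-ins for ItemLoader and nested_xpath, not the library's code. The sketch assumes, as the AttributeError in the traceback suggests, that the nested loader's context contains response=None rather than lacking the key altogether. Because the child is built from only its own selector and explicit keyword arguments, the processor receives None.

```python
class MiniLoader:
    """Hypothetical, heavily simplified loader used only to illustrate the issue."""

    def __init__(self, selector=None, response=None, parent=None, **context):
        # Every loader builds its context from its own constructor arguments;
        # response defaults to None when the caller does not pass it.
        context.update(selector=selector, response=response)
        self.context = context
        self.parent = parent

    def nested(self, sub_selector, **context):
        # Mirrors the behaviour described in the issue: a new context is built
        # from only the new selector and explicit keyword arguments, so the
        # parent's response is not carried over.
        return MiniLoader(selector=sub_selector, parent=self, **context)


def make_absolute_url(url, loader_context):
    return loader_context['response'].urljoin(url)


parent = MiniLoader(selector='<root>', response=object())  # placeholder response
child = parent.nested('<sub-node>')
print(child.context['response'])  # None
try:
    make_absolute_url('item/42', child.context)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'urljoin'
```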

To me, this is unexpected behavior. Since the loader is nested, I assumed that all context except the selector would be inherited unless explicitly overridden in the call to nested_xpath. There may be a reason for not passing the response and other context on to the nested loader, but if you agree that the behavior should change, I can file a pull request.
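The behavior proposed here could be sketched as follows, again with a hypothetical MiniLoader rather than the real ItemLoader: the nested loader starts from a copy of the parent's context, explicit keyword arguments to nested() override inherited values, and the selector is always replaced. This is one possible design under those assumptions, not the library's implementation.

```python
class MiniLoader:
    """Hypothetical, simplified loader showing the behaviour proposed in the issue."""

    def __init__(self, parent=None, **context):
        self.parent = parent
        self.context = context

    def nested(self, sub_selector, **overrides):
        # Later keys win in dict unpacking: inherited values first, then
        # explicit overrides, then the new selector, which always replaces
        # the parent's selector.
        context = {**self.context, **overrides, 'selector': sub_selector}
        return MiniLoader(parent=self, **context)


response = object()  # placeholder for a real Response
root = MiniLoader(selector='<root>', response=response, lang='en')
child = root.nested('<sub-node>')
grandchild = child.nested('<leaf>', lang='de')
print(child.context['response'] is response)  # True
```

Building a new dict instead of sharing the parent's means an override in a nested loader does not change the parent's context; whether the context should be copied or shared is a design decision for the maintainers.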