scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License

Add fallback selectors to ItemLoader #30

Open ejulio opened 5 years ago

ejulio commented 5 years ago

It is common to need fallback selectors for certain fields. As a result, we end up writing code like:

loader = MyLoader(response=response)
loader.add_css('my_field', 'selector1')
loader.add_css('my_field', 'selector2') # fallback 1
loader.add_css('my_field', 'selector3') # fallback 2

However, a possibly better way would be:

loader = MyLoader(response=response)
loader.add_css('my_field', [
    'selector1',
    'selector2', # fallback 1
    'selector3', # fallback 2
])
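The following minimal, stdlib-only sketch makes the semantics of both snippets concrete. `ToyLoader`, `FAKE_PAGE` and `as_list` are hypothetical names, not the itemloaders API. The page is modeled as a dict from selector strings to the values each selector would extract. The sketch shows that one call with a list behaves like three separate calls, and that in both cases every selector contributes values. In other words, the "fallbacks" are accumulated, not consulted only when earlier selectors fail.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Union

# Hypothetical stand-in for a parsed page: each "selector" maps to the
# values it would extract. A real loader would query the response instead.
FAKE_PAGE: Dict[str, List[str]] = {
    "selector1": [],
    "selector2": ["from-2"],
    "selector3": ["from-3"],
}


def as_list(value: Union[str, Iterable[str]]) -> List[str]:
    """Wrap a single selector string in a list; pass other iterables through."""
    if isinstance(value, str):
        return [value]
    return list(value)


class ToyLoader:
    """Hypothetical, minimal model of how a loader accumulates field values."""

    def __init__(self, page: Dict[str, List[str]]) -> None:
        self.page = page
        self.values: Dict[str, List[str]] = defaultdict(list)

    def add_css(self, field: str, css: Union[str, Iterable[str]]) -> None:
        # Every selector is evaluated and all of its matches are appended.
        for selector in as_list(css):
            self.values[field].extend(self.page.get(selector, []))


# Three separate calls...
a = ToyLoader(FAKE_PAGE)
a.add_css("my_field", "selector1")
a.add_css("my_field", "selector2")  # fallback 1
a.add_css("my_field", "selector3")  # fallback 2

# ...collect the same values as one call with a list of selectors.
b = ToyLoader(FAKE_PAGE)
b.add_css("my_field", ["selector1", "selector2", "selector3"])

print(a.values["my_field"])  # ['from-2', 'from-3']
print(a.values["my_field"] == b.values["my_field"])  # True
```

Note that `selector3` still contributes a value even though `selector2` already matched, which is what the preference mode below is meant to avoid.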

The API above would be equivalent to the first example. However, @cathalgarvey also shared a nice idea: stop at the first matching selector.

loader = MyLoader(response=response)
loader.add_css('my_field', [
    'selector1',
    'selector2', # fallback 1
    'selector3', # fallback 2
], selectors_as_preferences=True)

Then, if selector1 yields a result, the others are not attempted; otherwise, we fall back to selector2, and so on.
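A sketch of the proposed `selectors_as_preferences=True` behavior, assuming a hypothetical helper `first_matching` that is not part of itemloaders. It takes an extraction callable; in a real loader that callable might wrap something like `selector.css(query).getall()`. Here the page is again modeled as a dict, and the callable records which selectors were actually tried.

```python
from typing import Callable, Iterable, List


def first_matching(
    extract: Callable[[str], List[str]], selectors: Iterable[str]
) -> List[str]:
    """Return the values of the first selector that yields anything.

    Later selectors are not evaluated once one matches; an empty list is
    returned if none of them match.
    """
    for selector in selectors:
        values = extract(selector)
        if values:
            return values
    return []


# Hypothetical page model: selector string -> extracted values.
page = {"selector1": [], "selector2": ["from-2"], "selector3": ["from-3"]}
calls: List[str] = []


def extract(selector: str) -> List[str]:
    calls.append(selector)  # record which selectors were actually tried
    return page.get(selector, [])


print(first_matching(extract, ["selector1", "selector2", "selector3"]))
# ['from-2']
print(calls)  # ['selector1', 'selector2'] (selector3 is never attempted)
```

Taking a callable rather than a concrete selector object keeps the preference logic independent of whether the queries are CSS or XPath.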

The same API should be applied to loader.add_xpath.
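Because the preference logic only needs to run one query at a time, it carries over to XPath unchanged. The stdlib-only sketch below uses `xml.etree.ElementTree`, which supports only a limited XPath subset, as a stand-in for a real XPath engine. `DOC` and `xpath_first_matching` are hypothetical names for illustration, not part of itemloaders.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Small sample document; ElementTree's findall() understands a subset of XPath.
DOC = ET.fromstring(
    "<product>"
    "<title>Fallback title</title>"
    "<meta name='title'>Meta title</meta>"
    "</product>"
)


def xpath_first_matching(root: ET.Element, paths: Iterable[str]) -> List[str]:
    """Return the text of nodes matched by the first path that matches anything."""
    for path in paths:
        texts = [el.text or "" for el in root.findall(path)]
        if texts:
            return texts  # stop here: later paths are never evaluated
    return []


print(xpath_first_matching(DOC, ["./h1", "./meta[@name='title']", "./title"]))
# ['Meta title']
```

The `./h1` path matches nothing, so the sketch falls back to the `meta` path and never evaluates `./title`.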

BurnzZ commented 5 years ago

Hi @ejulio @Gallaecio! I'd like to hear your thoughts on scrapy/scrapy#3795, as it's closely related to this. :)

stav commented 5 years ago

Then, if selector1 yields a result, the other ones are attempted,...

Then, if selector1 yields a result, the other ones are NOT attempted,...