scrapy / parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
BSD 3-Clause "New" or "Revised" License

What do you think about Selector(response).xpath().map() ? #92

Open pawelmhm opened 7 years ago

pawelmhm commented 7 years ago

Sometimes I'd like to apply some function after extracting something, and I do something like this:

In[32]: map(json.loads, sel.xpath("//@data-p13n-asin-metadata").extract())

Out[32]: [{u'asin': u'B00Y2863GQ', u'price': 221.99, u'ref': u'pd_bxgy_75_1_1'},
 {u'asin': u'B008J3UD2U', u'price': 9.22, u'ref': u'pd_bxgy_75_2_2'},
 {u'asin': u'B008J3UD2U', u'ref': u'pd_sim_75_1'}]

What do you think about adding support for map at the selector result level? Then I could do

 sel.xpath("//@data-p13n-asin-metadata").map(json.loads)

or even allow passing a list of functions

 sel.xpath("//@data-p13n-asin-metadata").map([json.loads, lambda d: d.get('asin')])

?
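
For illustration, here is a minimal sketch of the behaviour being proposed, written as a standalone function rather than a method on the selector result. The name `map_extracted` is hypothetical and is not part of parsel; the `extracted` list is a stand-in for what `sel.xpath("//@data-p13n-asin-metadata").extract()` returns in the session above.

```python
import json
from functools import reduce


def map_extracted(values, funcs):
    """Apply one callable, or a list of callables in order, to each value.

    Hypothetical helper: parsel does not provide this.
    """
    if callable(funcs):
        funcs = [funcs]
    # Thread each value through the functions from left to right.
    return [reduce(lambda acc, f: f(acc), funcs, value) for value in values]


# Stand-in for the strings returned by .extract()
extracted = [
    '{"asin": "B00Y2863GQ", "price": 221.99, "ref": "pd_bxgy_75_1_1"}',
    '{"asin": "B008J3UD2U", "price": 9.22, "ref": "pd_bxgy_75_2_2"}',
    '{"asin": "B008J3UD2U", "ref": "pd_sim_75_1"}',
]

parsed = map_extracted(extracted, json.loads)
asins = map_extracted(extracted, [json.loads, lambda d: d.get("asin")])
```

With a single function this is just `map`; with a list, the functions are applied in order, so `json.loads` runs before the lambda.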

Granitosaurus commented 7 years ago

I kinda like it, but it doesn't seem that useful, and map is a bit unpythonic anyway (list comprehensions are better! :))

sel.xpath("//@data-p13n-asin-metadata").map([json.loads, lambda d: d.get('asin')])
# vs
[json.loads(v).get('asin') for v in sel.xpath("//@data-p13n-asin-metadata").extract()]

As the comparison shows, map doesn't really add much. Sure, it might look a bit tidier, but list comprehensions are more straightforward and, most importantly, more explicit.
Also, you can use dict, set and generator comprehensions!

So I'm on the fence: it would be nice, but it seems awfully unnecessary and inferior to comprehensions.
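
To make the point about the other comprehension forms concrete, here is a small sketch over the same kind of data. The `extracted` list is a stand-in for the `.extract()` output shown earlier in the thread; the variable names are only illustrative.

```python
import json

# Stand-in for sel.xpath("//@data-p13n-asin-metadata").extract()
extracted = [
    '{"asin": "B00Y2863GQ", "price": 221.99, "ref": "pd_bxgy_75_1_1"}',
    '{"asin": "B008J3UD2U", "price": 9.22, "ref": "pd_bxgy_75_2_2"}',
    '{"asin": "B008J3UD2U", "ref": "pd_sim_75_1"}',
]

records = [json.loads(v) for v in extracted]

# Set comprehension: unique ASINs
unique_asins = {r["asin"] for r in records}

# Dict comprehension: ASIN -> price, skipping records without a price
price_by_asin = {r["asin"]: r["price"] for r in records if "price" in r}

# Generator expression: sum prices without building an intermediate list
total_price = sum(r["price"] for r in records if "price" in r)
```

A single `.map()` shortcut would always give back a list, whereas each of these picks the output shape and the filtering in one expression.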

kmike commented 7 years ago

I'm also on the fence; let's say -0 on adding such a shortcut, as it doesn't add much and it is not composable.

To make processing code shorter and more composable, I'd try to explore something like pytoolz currying instead.
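
As a rough idea of what the composable style could look like, here is a sketch that uses only the standard library (`functools.partial` standing in for currying). The `pipeline` helper is hypothetical and written here for illustration; toolz ships its own `pipe`, `compose` and curried `map` that fill the same role. The `extracted` list is again a stand-in for `.extract()` output.

```python
import json
from functools import partial, reduce
from operator import methodcaller


def pipeline(*funcs):
    """Hypothetical helper: compose funcs left to right into one callable."""
    def run(value):
        return reduce(lambda acc, f: f(acc), funcs, value)
    return run


# "Curried" steps: each one is map with its function already filled in.
parse_all = partial(map, json.loads)
get_asins = partial(map, methodcaller("get", "asin"))

# Steps combine into a reusable function, which can be extended in turn.
extract_asins = pipeline(parse_all, get_asins, list)
extract_unique_asins = pipeline(extract_asins, set)

# Stand-in for sel.xpath("//@data-p13n-asin-metadata").extract()
extracted = [
    '{"asin": "B00Y2863GQ", "price": 221.99, "ref": "pd_bxgy_75_1_1"}',
    '{"asin": "B008J3UD2U", "price": 9.22, "ref": "pd_bxgy_75_2_2"}',
    '{"asin": "B008J3UD2U", "ref": "pd_sim_75_1"}',
]

asins = extract_asins(extracted)
unique_asins = extract_unique_asins(extracted)
```

The difference from a `.map()` method is that the processing steps exist independently of any selector, so they can be named, reused and combined.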

redapple commented 7 years ago

I'm also -0 on this one. I prefer comprehensions myself.