scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License
45 stars 16 forks source link

Add n argument to TakeFirst processor #38

Open kmike opened 10 years ago

kmike commented 10 years ago

What do you think about adding "n" argument to TakeFirst processor?

Digenis commented 10 years ago

It lets the return type depend on an argument, TakeFirst(None)(List) would return just an element of the list while TakeFirst(1)(List) an element of the list wrapped in a list. Personally I don't like making the types less predictable but python does this already:

On the one hand TakeFirst(3) looks like natural language but on the other it implies a different return type.

I would rather see a function, like an argument to filter, passed to TakeFirst() allowing for customization of the default "value is not None and value != '' " (I am just saying this here/now because it conflicts with the above proposal)

Digenis commented 10 years ago

From my perspective, the loader.processor lacks two loaders.

I 'd go a bit further, since your TakeFirst(-1) made me think of: lambda n: lambda l: filter(None, l)[:n] In the snippet notice that the first argument to filter() could also be customized, (without making the return type depend on the argument to the factory): lambda f=None: lambda l: filter(f, l) (Of course I oversimplify it here with None just to keep the snippet simple) Let us call it "Filter" since "TakeFirst" return a single element.

Now "Slice": Slice = lambda a, b: lambda l: l[a, b] the operator module lacks a slicegetter anyway Now the example with TakeFirst(3) would be written: Compose(Filter(), Slice(None, 3)) and why not name the composition as processor.TakeSlice or TakeFirstN, a shortcut.

Most processors offer practical shortcuts for long compositions of lambdas, "functools" and "operator". I see TakeFirst as a shortcut of shorcuts.

To say this in plain code https://github.com/Digenis/scrapy/commit/1f088d46438a12123b3b32e6a284889ef6588676 and doctests to demonstrate their behaviour https://github.com/Digenis/scrapy/blob/0e9c95dfcccb1628d51bf500eca60b683ab95fb5/scrapy/contrib/loader/processor.py#L52-L100

redapple commented 8 years ago

(just digging out old issues) If we really want that (disclaimer: I don't use loaders myself), I'd favor a seperate processor TakeFirstN(n), always returning a list of at most n elements

ghost commented 6 years ago

@kmike - Is this something you'd still like to see? We might resolve some of @Digenis' concerns with a new processor, TakeFirstN, which would simplify typing and clarity issues a lot?

kmike commented 6 years ago

TakeFirstN sounds fine to me, though I haven't used item loaders for a long time.