Open kmike opened 10 years ago
It lets the return type depend on an argument, TakeFirst(None)(List) would return just an element of the list while TakeFirst(1)(List) an element of the list wrapped in a list. Personally I don't like making the types less predictable but python does this already:
On the one hand TakeFirst(3) looks like natural language but on the other it implies a different return type.
I would rather see a function, like an argument to filter, passed to TakeFirst() allowing for customization of the default "value is not None and value != '' " (I am just saying this here/now because it conflicts with the above proposal)
From my perspective, the loader.processor lacks two loaders.
I 'd go a bit further, since your TakeFirst(-1) made me think of: lambda n: lambda l: filter(None, l)[:n] In the snippet notice that the first argument to filter() could also be customized, (without making the return type depend on the argument to the factory): lambda f=None: lambda l: filter(f, l) (Of course I oversimplify it here with None just to keep the snippet simple) Let us call it "Filter" since "TakeFirst" return a single element.
Now "Slice": Slice = lambda a, b: lambda l: l[a, b] the operator module lacks a slicegetter anyway Now the example with TakeFirst(3) would be written: Compose(Filter(), Slice(None, 3)) and why not name the composition as processor.TakeSlice or TakeFirstN, a shortcut.
Most processors offer practical shortcuts for long compositions of lambdas, "functools" and "operator". I see TakeFirst as a shortcut of shorcuts.
To say this in plain code https://github.com/Digenis/scrapy/commit/1f088d46438a12123b3b32e6a284889ef6588676 and doctests to demonstrate their behaviour https://github.com/Digenis/scrapy/blob/0e9c95dfcccb1628d51bf500eca60b683ab95fb5/scrapy/contrib/loader/processor.py#L52-L100
(just digging out old issues)
If we really want that (disclaimer: I don't use loaders myself), I'd favor a seperate processor TakeFirstN(n)
, always returning a list of at most n elements
@kmike - Is this something you'd still like to see? We might resolve some of @Digenis' concerns with a new processor, TakeFirstN
, which would simplify typing and clarity issues a lot?
TakeFirstN sounds fine to me, though I haven't used item loaders for a long time.
What do you think about adding "n" argument to TakeFirst processor?
TakeFirst(3)
would mean "return a list with at most 3 non-empty values".