scrapy / parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
BSD 3-Clause "New" or "Revised" License

Adding a `strip` kwarg to `get()` and `getall()` #249

Open bblanchon opened 2 years ago

bblanchon commented 2 years ago

Hi,

Thank you very much for this excellent library ❤️

I've been using Parsel for a while, and I constantly find myself calling `.strip()` after `.get()` or `.getall()`. I think it would be very helpful if Parsel provided a built-in mechanism for that.

I suggest adding a `strip` kwarg to `get()` and `getall()`. It would be a boolean value, and when it's true, Parsel would call `strip()` on every match.

Example with get():

# Before
author = selector.css("[itemprop=author] [itemprop=name]::text").get()
if author:
    author = author.strip()

# After
author = selector.css("[itemprop=author] [itemprop=name]::text").get(strip=True)

Example with getall():

# Before
authors = [author.strip() for author in selector.css("[itemprop=author] [itemprop=name]::text").getall()]

# After
authors = selector.css("[itemprop=author] [itemprop=name]::text").getall(strip=True)
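Until something like this lands, the behaviour can be approximated with two small helpers. This is only a sketch of what I have in mind, not Parsel code: `get_stripped`, `getall_stripped`, and the `SupportsGet` protocol are hypothetical names, and the helpers assume nothing beyond an object exposing Parsel-style `get()` and `getall()` methods.

```python
from typing import List, Optional, Protocol


class SupportsGet(Protocol):
    """Hypothetical protocol: any object with Parsel-style get()/getall()."""

    def get(self) -> Optional[str]: ...

    def getall(self) -> List[str]: ...


def get_stripped(matches: SupportsGet, default: Optional[str] = None) -> Optional[str]:
    """Return the first match with surrounding whitespace removed.

    Falls back to `default` when there is no match, mirroring the
    `if author:` guard from the "Before" example.
    """
    value = matches.get()
    return default if value is None else value.strip()


def getall_stripped(matches: SupportsGet) -> List[str]:
    """Return every match with surrounding whitespace removed."""
    return [value.strip() for value in matches.getall()]
```

With these, the "Before" snippets collapse to `get_stripped(selector.css("...::text"))` and `getall_stripped(selector.css("...::text"))`, which is roughly what a built-in `strip=True` would do internally.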

Alternatively, we could change the `::text` pseudo-element to support an argument, like `::text(strip=1)`. That would be extremely handy too and probably more flexible than my original suggestion, but also more difficult to implement.

I know I could strip whitespace with `re()` and `re_first()`, but that's overkill and hides the intent.
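To illustrate why the regex route obscures the intent, here is a sketch using only the standard `re` module. The pattern and the `strip_with_regex` name are my own illustration; with Parsel, an equivalent pattern would presumably be passed to `re_first()` instead.

```python
import re

# Capture everything between optional leading and trailing whitespace.
# DOTALL lets the lazy group span line breaks inside the text.
STRIP_PATTERN = re.compile(r"^\s*(.*?)\s*$", re.DOTALL)


def strip_with_regex(text: str) -> str:
    """Regex equivalent of text.strip(); hypothetical helper for comparison."""
    match = STRIP_PATTERN.search(text)
    # The pattern matches any string, including empty or all-whitespace
    # input, so `match` is never None here.
    return match.group(1)
```

A reader has to decode the pattern to see that it only trims whitespace, whereas `.strip()` or a `strip=True` kwarg says so directly.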

Best regards, Benoit

bblanchon commented 1 year ago

PRs #260 and #127 have gone stale. Will either of them ever get merged? I can't imagine I'm the only person calling `.strip()` on scraped strings.