tidyverse / stringr

A fresh approach to string manipulation in R
https://stringr.tidyverse.org

New features to easily capture text before or after the nth instance of a delimiter without regex #525

Closed jhtrico1850 closed 11 months ago

jhtrico1850 commented 1 year ago

Microsoft just released the TEXTAFTER and TEXTBEFORE functions in Excel to easily extract text before or after the Nth space, colon, etc. It's similar to what Power Query has had for a while with Text.BeforeDelimiter, but exposed in the main Excel formula interface rather than buried within Power Query.

Of course, it's possible today to build a regex with the existing stringr functions. Something I deal with often is importing PDFs and then parsing out the relevant portions. It's quite tedious and error-prone to get what I want with regex (say, the 3rd pair within 5 pairs of numbers like 10 3 4 5 4). With the old Power Query formula, and now the regular Excel formula, I can easily describe exactly what I want: just the text before or after the Nth occurrence of the specified pattern. I hope this comes to stringr, or let me know if I'm missing something that already exists and is similar to TEXTAFTER/TEXTBEFORE or Text.BeforeDelimiter/Text.AfterDelimiter.
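For illustration, here is a rough sketch of the behaviour I have in mind, built on existing stringr functions with literal (non-regex) matching. The helper names `text_before()` and `text_after()` are hypothetical and not part of stringr; they only approximate TEXTBEFORE/TEXTAFTER, under the assumption that a missing Nth delimiter should give `NA`.

```r
library(stringr)

# Hypothetical helpers (NOT part of stringr): return the text before/after
# the n-th literal occurrence of `delim` in each element of `x`.
# Assumption: NA is returned when `x` is NA or has fewer than n delimiters.
text_before <- function(x, delim, n = 1L) {
  vapply(x, function(s) {
    # One row per match, with "start" and "end" columns
    loc <- str_locate_all(s, fixed(delim))[[1]]
    if (is.na(s) || nrow(loc) < n) return(NA_character_)
    str_sub(s, 1L, loc[[n, "start"]] - 1L)
  }, character(1), USE.NAMES = FALSE)
}

text_after <- function(x, delim, n = 1L) {
  vapply(x, function(s) {
    loc <- str_locate_all(s, fixed(delim))[[1]]
    if (is.na(s) || nrow(loc) < n) return(NA_character_)
    str_sub(s, loc[[n, "end"]] + 1L)
  }, character(1), USE.NAMES = FALSE)
}

x <- "10 3 4 5 4"
text_before(x, " ", 2)  # "10 3"
text_after(x, " ", 2)   # "4 5 4"
```

For the narrower case of picking out a single delimited field, the existing `word()` function may already be enough: `word(x, 3)` returns `"4"` for the string above, since its default separator is a literal space.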

hadley commented 11 months ago

Unfortunately that's out of scope for stringr, because we use stringi, which in turn uses the ICU regular expression engine, which doesn't support this feature.