mllg / checkmate

Fast and versatile argument checks
https://mllg.github.io/checkmate/

Enabling PCRE RegEx patterns #262

Open Michiel91 opened 4 months ago

Michiel91 commented 4 months ago

I often use checkmate to verify character vectors against certain criteria, combined with a RegEx pattern match. One limitation that comes up frequently in the current version is that not all RegEx patterns are supported, for example lookaround expressions such as lookahead and lookbehind. This is because checkmate calls grepl without perl = TRUE in checkCharacterPattern. With perl = FALSE, patterns are handled by the TRE engine; with perl = TRUE, they are handled by the PCRE engine, which supports these constructs.
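To illustrate the engine difference, the base-R sketch below compiles a lookbehind pattern under both engines and catches the TRE compilation error instead of stopping. The inputs, the `id_` pattern and the `try_engine()` helper are hypothetical, chosen only for this illustration.

```r
# Hypothetical inputs: match trailing digits only when they follow "id_"
values <- c("id_42", "name_42")
lookbehind <- "(?<=id_)\\d+$"

# Hypothetical helper: run grepl with the chosen engine and return the
# error message as a string if the pattern fails to compile
try_engine <- function(pattern, x, perl) {
  tryCatch(
    suppressWarnings(grepl(pattern, x, perl = perl)),
    error = function(e) conditionMessage(e)
  )
}

tre_result <- try_engine(lookbehind, values, perl = FALSE)  # TRE rejects "(?<="
pcre_result <- try_engine(lookbehind, values, perl = TRUE)  # PCRE supports lookbehind

print(tre_result)   # the "invalid regular expression" message
print(pcre_result)  # TRUE FALSE
```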

It would be great if users could choose whether to use "perl" (i.e., PCRE instead of TRE) in all checkmate functions that have a pattern argument. Passing this argument through to the underlying grepl call would then allow a much wider range of (often more advanced) RegEx patterns.
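A minimal sketch of what such a passthrough could look like, written in plain base R rather than against checkmate's internals. The functions `check_pattern_sketch()` and `test_pattern_sketch()` and their `perl` argument are hypothetical; they only mimic the check_*/test_* convention (a check returns TRUE or a failure message, a test returns a logical) and are not part of checkmate's API.

```r
# Hypothetical checkmate-style check with a `perl` passthrough to grepl.
# Returns TRUE on success, otherwise a string describing the failure.
check_pattern_sketch <- function(x, pattern, ignore.case = FALSE, perl = FALSE) {
  if (!is.character(x)) {
    return("Must be of type 'character'")
  }
  values <- x[!is.na(x)]  # only non-missing elements are checked
  ok <- grepl(pattern, values, ignore.case = ignore.case, perl = perl)
  if (all(ok)) {
    return(TRUE)
  }
  sprintf("Must comply to pattern '%s'", pattern)
}

# Hypothetical test_* counterpart: collapses the check result to TRUE/FALSE
test_pattern_sketch <- function(x, pattern, ...) {
  isTRUE(check_pattern_sketch(x, pattern, ...))
}

# Negative lookahead (PCRE only): reject values that start with a digit
test_pattern_sketch(c("a1", "b2"), "^(?![0-9]).+$", perl = TRUE)
test_pattern_sketch(c("a1", "9b"), "^(?![0-9]).+$", perl = TRUE)
```

Keeping `perl = FALSE` as the default would leave existing TRE-based patterns unaffected, so the option would only change behavior for callers who opt in.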

Below is an example that reproduces the checkmate and grepl behavior I am referring to:

```r
# Define example values and RegEx
example_values <- c("value1", "some_string", "Empty", "482733")
example_regex <- "^(?!Empty$).+$"

# Check with checkmate: fails due to "Invalid regexp"
checkmate::test_character(x = example_values, pattern = example_regex)

# Check with grepl perl = FALSE: fails due to "Invalid regexp"
base::grepl(x = example_values, pattern = example_regex)

# Check with grepl perl = TRUE: works!
base::grepl(x = example_values, pattern = example_regex, perl = TRUE)
```
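Note that "works" here means the pattern compiles, not that every value passes. Assuming a checkmate-style pattern check requires every non-missing element to match, a PCRE-enabled check would still return FALSE for these example values, because the negative lookahead deliberately rejects "Empty". The base-R sketch below makes that aggregation explicit; it does not call checkmate itself.

```r
example_values <- c("value1", "some_string", "Empty", "482733")
example_regex <- "^(?!Empty$).+$"

# Per-element PCRE matches: only "Empty" is rejected by the negative lookahead
matches <- grepl(example_regex, example_values, perl = TRUE)
print(matches)

# Assumed all-elements semantics: every non-missing element must match
all_match <- all(matches[!is.na(example_values)])
print(all_match)  # FALSE, because "Empty" does not match
```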

Many thanks in advance for taking this proposal into consideration! Let me know if I can help by providing additional input.