lu-zero opened this issue 1 year ago
Yeah, compatibility is generally why we use onig instead of, say, the standard regex crate. I think we could probably make it optional, as the additional features aren't used that much (in coreutils as well). It'd be great if there were some regex API that let us switch with just a feature flag, without having to write both versions everywhere.
ripgrep abstracted quite a bit so that it can use either pcre2 or regex.
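As a rough illustration of that kind of switch (ripgrep does it with the Matcher trait in its grep-matcher crate, with separate crates for the regex and PCRE2 backends), here is a minimal sketch assuming hypothetical Cargo features named "onig" and "regex". The trait name Matcher, the backend module and the compile function are made up for this example; only onig::Regex::new, regex::Regex::new and their is_match methods are real APIs. The last backend is a dependency-free placeholder that only does substring search, so the sketch builds with neither feature enabled.

```rust
/// Engine-agnostic interface the rest of the code would program against.
/// (Hypothetical name, loosely modelled on ripgrep's grep_matcher::Matcher.)
pub trait Matcher {
    fn is_match(&self, haystack: &str) -> bool;
}

// Hypothetical "onig" feature: wrap Oniguruma. Wins if both features are enabled.
#[cfg(feature = "onig")]
pub mod backend {
    pub struct Compiled(onig::Regex);

    pub fn compile(pattern: &str) -> Result<Compiled, String> {
        onig::Regex::new(pattern).map(Compiled).map_err(|e| e.to_string())
    }

    impl super::Matcher for Compiled {
        fn is_match(&self, haystack: &str) -> bool {
            self.0.is_match(haystack)
        }
    }
}

// Hypothetical "regex" feature: wrap the pure-Rust regex crate.
#[cfg(all(feature = "regex", not(feature = "onig")))]
pub mod backend {
    pub struct Compiled(regex::Regex);

    pub fn compile(pattern: &str) -> Result<Compiled, String> {
        regex::Regex::new(pattern).map(Compiled).map_err(|e| e.to_string())
    }

    impl super::Matcher for Compiled {
        fn is_match(&self, haystack: &str) -> bool {
            self.0.is_match(haystack)
        }
    }
}

// Placeholder so the sketch compiles without any optional dependency:
// it treats the pattern as a literal substring, NOT as a regex.
#[cfg(not(any(feature = "onig", feature = "regex")))]
pub mod backend {
    pub struct Compiled(String);

    pub fn compile(pattern: &str) -> Result<Compiled, String> {
        Ok(Compiled(pattern.to_owned()))
    }

    impl super::Matcher for Compiled {
        fn is_match(&self, haystack: &str) -> bool {
            haystack.contains(self.0.as_str())
        }
    }
}

pub use backend::compile;
```

Callers only ever write compile(pattern)?.is_match(text), so switching engines would be a build-time choice (e.g. cargo build --features onig, with the feature name assumed). The catch remains that the engines do not accept the same pattern syntax, so a shared API alone does not give identical behaviour.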
https://gitlab.redox-os.org/redox-os/posix-regex might be a good option
I was under the impression that regex supports a superset of the POSIX syntax. It would probably be good to make a table of what is and isn't supported, and to do the same for the glob crates...
regex is not a superset of either POSIX BREs or EREs, since they both support back-references (e.g. (f)ire\1ox) while regex does not. I agree a comparison table would be nice.
Our glob implementation converts globs to Posix BREs, so any POSIX-compatible regex implementation gets us globs for free: https://github.com/uutils/findutils/blob/main/src/find/matchers/glob.rs
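The glob-to-BRE idea can be sketched roughly as follows. This is not the findutils code linked above, just a minimal illustration assuming the usual glob rules: * becomes .*, ? becomes ., bracket expressions are copied through with a leading ! turned into ^, a backslash escapes the next character, and characters that are special in a BRE (. [ \ * ^ $) are escaped. The result is anchored with ^...$ because a glob has to match the whole name. The function names are hypothetical.

```rust
/// Hypothetical sketch: translate a shell glob into an anchored POSIX BRE.
pub fn glob_to_bre(glob: &str) -> String {
    let chars: Vec<char> = glob.chars().collect();
    let mut bre = String::from("^");
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '*' => bre.push_str(".*"),
            '?' => bre.push('.'),
            '[' => {
                if let Some(end) = bracket_end(&chars, i) {
                    // Copy the bracket expression, turning a glob `[!` into a BRE `[^`.
                    bre.push('[');
                    let mut j = i + 1;
                    if chars[j] == '!' {
                        bre.push('^');
                        j += 1;
                    }
                    bre.extend(&chars[j..end]);
                    bre.push(']');
                    i = end + 1;
                    continue;
                }
                bre.push_str("\\["); // no closing `]`: treat `[` literally
            }
            '\\' if i + 1 < chars.len() => {
                i += 1;
                push_literal(&mut bre, chars[i]);
            }
            c => push_literal(&mut bre, c),
        }
        i += 1;
    }
    bre.push('$');
    bre
}

/// Push `c` so the BRE matches it literally.
fn push_literal(bre: &mut String, c: char) {
    if matches!(c, '.' | '[' | '\\' | '*' | '^' | '$') {
        bre.push('\\');
    }
    bre.push(c);
}

/// Index of the `]` closing the bracket expression opened at `start`, if any.
fn bracket_end(chars: &[char], start: usize) -> Option<usize> {
    let mut j = start + 1;
    if chars.get(j) == Some(&'!') {
        j += 1;
    }
    if chars.get(j) == Some(&']') {
        j += 1; // a leading `]` is a member, not the end
    }
    while j < chars.len() {
        match chars[j] {
            ']' => return Some(j),
            // Skip [:class:], [.coll.] and [=equiv=] so their `]` isn't taken as the end.
            '[' if matches!(chars.get(j + 1), Some(':' | '.' | '=')) => {
                let delim = chars[j + 1];
                j += 2;
                while j + 1 < chars.len() && !(chars[j] == delim && chars[j + 1] == ']') {
                    j += 1;
                }
                j += 2;
            }
            _ => j += 1,
        }
    }
    None
}
```

For example, glob_to_bre("*.rs") yields ^.*\.rs$. Characters like + and { need no escaping because they are ordinary characters in a BRE, which is one reason BREs are a convenient target for globs.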
onig is still not updated, and clang-16 is going to hit more distributions. Given that upstream seems unresponsive, should we start looking for alternatives more actively?
Am I missing some part of the conversation? What's going on with clang-16?
clang-16 makes onig unbuildable; I sent a patch to fix it at roughly the same time I opened this issue.
Oh, that's unfortunate. I wonder if https://crates.io/crates/fancy-regex would be a good alternative?
That explains the error I'm getting when building this using MSYS2 / UCRT64, as that project recently updated to Clang 16.
There are a number of pure Rust glob and regex crates; it might be better not to have to deal with a C dependency if we can avoid it.
From what I can see, regex may lack a means to select a specific flavour of regex. I'm not sure whether anybody has already found a way to restrict the engine so that it rejects extensions beyond POSIX/Emacs syntax.