ngs-lang / ngs

Next Generation Shell (NGS)
https://ngs-lang.org/
GNU General Public License v3.0

[FR] PCRE callouts #551

Closed rdje closed 2 years ago

rdje commented 2 years ago

NGS has support for regexes via PCRE.

But it currently lacks support for a powerful feature named callouts.

This feature enables the PCRE engine to execute arbitrary code defined in the caller's scope, using the following syntax:

(?Carg)

where arg may be a number[^1] less than 256 or a string[^2].

That arg would be passed as argument to the attached code if/when it is called for a particular instance of (?C).

The callout feature was inspired by Perl's code capsule (?{ code }).

The way I used to play with this Perl feature is the following.

Let's assume I have N regexes

RE1, RE2, .., REN

and that I want to build an alternative regex based on these N REs, but I also want to be able to know exactly which alternative matched, if any.

I would write[^3] something like

/RE1 (?{ $id = 1}) | RE2 (?{ $id = 2}) | ... | REN (?{ $id = N})/

After attempting a match I would simply check for $id to know exactly which one of the REn matched, provided one of them actually matched[^4].
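To make the "which alternative matched" idea concrete, here is a minimal sketch in Python. Python's `re` module has no callouts, so this swaps in a different technique: each alternative is wrapped in a named group and `Match.lastgroup` is read after the match. The patterns in `ALTERNATIVES` and the helper names `build_tagged_alternation` and `matched_id` are hypothetical, chosen only for illustration; this is not how NGS or PCRE would do it.

```python
import re

# Hypothetical stand-ins for RE1..REN; any patterns would do.
ALTERNATIVES = [r"\d+", r"[A-Za-z_]\w*", r"\s+"]


def build_tagged_alternation(patterns):
    """Wrap each pattern in a named group alt1, alt2, ... and join with '|'."""
    return re.compile(
        "|".join(f"(?P<alt{i}>{p})" for i, p in enumerate(patterns, start=1))
    )


def matched_id(compiled, subject):
    """Return the 1-based id of the alternative that matched, or 0 if none did."""
    m = compiled.match(subject)
    if m is None:
        return 0  # mirrors the "$id == 0 means nothing matched" convention
    # lastgroup is the name of the outermost group that closed last,
    # i.e. the wrapper around the alternative that matched.
    return int(m.lastgroup[len("alt"):])


if __name__ == "__main__":
    rx = build_tagged_alternation(ALTERNATIVES)
    for s in ["42", "foo", "  ", "!"]:
        print(repr(s), matched_id(rx, s))
```

The engine still makes a single pass over the subject, which is the property callouts would preserve; the cost of this workaround is one extra capture group per alternative.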

One remark though: Philip Hazel, the creator of PCRE, recommends switching to PCRE2 (the new, revised API), or using it when starting a brand-new project.

Here are various links with additional information about the PCRE callout feature

What do you think?

[^1]: For both versions of PCRE: the one NGS currently uses, which corresponds to the original API (man pcre 3), and the new one, PCRE2.
[^2]: Only supported by PCRE2.
[^3]: I didn't write these composed REs manually; I used to create abstractions to automate this sort of thing for me.
[^4]: $id == 0 would indicate that none matched.

ilyash-b commented 2 years ago

I don't have a strong opinion either way at this point. It seems like a pretty rare use case though. How much worse would it be to try to match the subject against the alternatives one by one?
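For comparison, the one-by-one approach could look roughly like the Python sketch below. The patterns and the function name `first_matching_id` are hypothetical placeholders, not anything from NGS.

```python
import re

# Hypothetical stand-ins for RE1..REN.
PATTERNS = [r"\d+", r"[A-Za-z_]\w*", r"\s+"]
COMPILED = [re.compile(p) for p in PATTERNS]


def first_matching_id(subject):
    """Try each pattern in order; return its 1-based id, or 0 if none match."""
    for i, rx in enumerate(COMPILED, start=1):
        if rx.match(subject):
            return i
    return 0


if __name__ == "__main__":
    for s in ["42", "foo", "  ", "!"]:
        print(repr(s), first_matching_id(s))
```

This needs up to N separate match attempts per subject rather than one. It also only agrees with a combined alternation when matching is anchored at the start; with an unanchored search, the alternation picks whichever alternative matches at the earliest position, not the earliest pattern in the list.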

rdje commented 2 years ago

It is about speed actually.

It is better to let the regex engine cooperate with the user than to leave the user on his/her own trying to figure out which branch matched, at least for my particular use case.

But I am sure there are other people with their own use cases.

But you're probably right though, it is probably not NGS's goal to support such a feature, and I can understand that.

So if you think it would be too much of an effort or doesn't align with what you think NGS's features should be, then just close this FR.

ilyash-b commented 2 years ago

I perceive this as low priority. Let's reopen when/if there is more demand. I would gladly accept a contribution that implements this properly though.