riff-lang / riff

The Riff programming language
https://riff.cx
BSD Zero Clause License

Store named capture groups in field table #43

Open darrylabbate opened 1 year ago

darrylabbate commented 1 year ago

Numbered groups are already stored; forgot to implement named capture groups.

Also, audit the behavior of $abc. Currently, abc would be treated as an expression (variable). To dereference the field table with a named group, you'd need to use a string literal (e.g. $'abc').

If the field table were named/aliased (like arg), you could cleanly dereference using match.group or match[n].

darrylabbate commented 1 year ago

> Also, audit the behavior of $abc. Currently, abc would be treated as an expression (variable). To dereference the field table with a named group, you'd need to use a string literal (e.g. $'abc').

This would be a breaking change, but logically it makes sense for $foo to correspond to the capture group foo.

darrylabbate commented 8 months ago

The named capture groups can be extracted from a compiled pattern (pcre2_code *) via pcre2_pattern_info().

Example pattern and corresponding name table layout:

  (?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) )
  00 01 d  a  t  e  00 ??
  00 05 d  a  y  00 ?? ??
  00 04 m  o  n  t  h  00
  00 02 y  e  a  r  00 ??

Obvious approach:
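A minimal sketch of walking a name table with the layout shown above, in C. The table is hand-written to mirror the example (padding bytes are set to zero here). In real code the table pointer, entry count, and entry size would come from pcre2_pattern_info() with PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT, and PCRE2_INFO_NAMEENTRYSIZE. The helper names are hypothetical and not taken from the Riff source, and the 8-bit library is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the name table of the example pattern.
   Each entry: 2-byte group number (most significant byte first),
   then the NUL-terminated name, then padding up to ENTRY_SIZE. */
enum { ENTRY_SIZE = 8, NAME_COUNT = 4 };

static const uint8_t name_table[NAME_COUNT * ENTRY_SIZE] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0,
};

/* Group number stored in the first two bytes of entry i. */
static unsigned entry_number(const uint8_t *table, size_t entry_size, size_t i) {
    assert(entry_size >= 3); /* 2 number bytes + at least a NUL */
    const uint8_t *e = table + i * entry_size;
    return ((unsigned) e[0] << 8) | e[1];
}

/* Name stored after the two number bytes of entry i. */
static const char *entry_name(const uint8_t *table, size_t entry_size, size_t i) {
    return (const char *) (table + i * entry_size + 2);
}

/* Linear scan: first group number registered under name, or 0 if none.
   Group 0 is the whole match, so it never appears in the name table. */
static unsigned name_to_number(const uint8_t *table, size_t count,
                               size_t entry_size, const char *name) {
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(entry_name(table, entry_size, i), name) == 0)
            return entry_number(table, entry_size, i);
    }
    return 0;
}
```

Storing named groups would then amount to looping over the entries and copying each numbered capture into the field table under its name, e.g. the capture for group `entry_number(name_table, ENTRY_SIZE, i)` under the key `entry_name(name_table, ENTRY_SIZE, i)`.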

Should look closely at the PCRE2 spec for duplicated group names before doing any optimizations with the number <-> name mapping.

darrylabbate commented 3 weeks ago

> Should look closely at the PCRE2 spec for duplicated group names before doing any optimizations with the number <-> name mapping.


> In an attempt to reduce confusion, PCRE2 does not allow the same group number to be associated with more than one name. [...] However, there is still scope for confusion. Consider this pattern:
>
>   (?|(?<AA>aa)|(bb))
>
> Although the second group number 1 is not explicitly named, the name AA is still an alias for any group 1. Whether the pattern matches "aa" or "bb", a reference by name to group AA yields the matched string.

(source)


I.e., a number -> name mapping should be safe if needed, even with PCRE2_DUPNAMES. A name -> number mapping isn't safe, since a name can correspond to multiple numbered groups.
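To illustrate the asymmetry, here is a small C sketch that builds a number -> name array from a name table containing a duplicated name. The table is hand-written and assumes what PCRE2 would produce for a hypothetical pattern like `(?<n>a)|(?<n>b)` compiled with PCRE2_DUPNAMES (8-bit library); the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed name table: the name "n" appears twice, for groups 1 and 2.
   Entry layout: 2-byte group number (MSB first), NUL-terminated name. */
enum { DUP_ENTRY_SIZE = 4, DUP_NAME_COUNT = 2, DUP_MAX_GROUP = 2 };

static const uint8_t dup_table[DUP_NAME_COUNT * DUP_ENTRY_SIZE] = {
    0, 1, 'n', 0,
    0, 2, 'n', 0,
};

/* Fill names[0..max_group]: names[g] points at the name of group g,
   or is NULL if group g is unnamed. Each group number has at most one
   name, so every slot is written at most once. */
static void build_number_to_name(const uint8_t *table, size_t count,
                                 size_t entry_size, const char **names,
                                 size_t max_group) {
    for (size_t g = 0; g <= max_group; ++g)
        names[g] = NULL;
    for (size_t i = 0; i < count; ++i) {
        const uint8_t *e = table + i * entry_size;
        size_t n = ((size_t) e[0] << 8) | e[1];
        if (n <= max_group)
            names[n] = (const char *) (e + 2);
    }
}

/* Number of groups registered under name. A result above 1 means a
   single name -> number slot could not represent the mapping. */
static size_t count_groups_named(const uint8_t *table, size_t count,
                                 size_t entry_size, const char *name) {
    size_t hits = 0;
    for (size_t i = 0; i < count; ++i) {
        if (strcmp((const char *) (table + i * entry_size + 2), name) == 0)
            ++hits;
    }
    return hits;
}
```

With this table, both `names[1]` and `names[2]` end up pointing at "n", while `count_groups_named(dup_table, DUP_NAME_COUNT, DUP_ENTRY_SIZE, "n")` reports two groups, so a lookup by name would have to pick among them (for instance, whichever one actually participated in the match).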