Open darrylabbate opened 1 year ago
Also, audit the behavior of `$abc`. Currently, `abc` would be treated as an expression (variable). To dereference the field table with a named group, you'd need to use a string literal (e.g. `$'abc'`).
This would be a breaking change, but logically it makes sense for `$foo` to correspond to the capture group `foo`.
The named capture groups can be extracted from a compiled pattern (`pcre2_code *`) via `pcre2_pattern_info()`:
- `PCRE2_INFO_NAMETABLE` returns a pointer to the first entry of the "name table" (`PCRE2_SPTR`)
- `PCRE2_INFO_NAMECOUNT` returns the number of named capture groups (`uint32_t`)
- `PCRE2_INFO_NAMEENTRYSIZE` returns the size of each entry in the name table (`uint32_t`), which is essentially the length of the longest capture group name + 3 in the 8-bit library (2 bytes for the group number plus the terminating NUL)
Example pattern (compiled with `PCRE2_EXTENDED`, so whitespace is ignored) and corresponding name table layout (`??` marks undefined padding bytes):
(?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) )
00 01 d a t e 00 ??
00 05 d a y 00 ?? ??
00 04 m o n t h 00
00 02 y e a r 00 ??
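To make the layout above concrete, here is a minimal sketch that decodes such a table by hand. It assumes the 8-bit library, where the first two bytes of each entry hold the group number (most significant byte first) and the NUL-terminated name follows. The helper names (`entry_group_number`, `entry_group_name`, `name_for_group`) and the hard-coded `example_table` are hypothetical and only illustrate the format; real code would get the table, count, and entry size from `pcre2_pattern_info()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h> /* strcmp, for callers comparing the returned names */

/* Hypothetical helper: bytes 0-1 of an entry hold the group number,
 * most significant byte first (8-bit library layout). */
static uint32_t entry_group_number(const uint8_t *entry) {
    return ((uint32_t)entry[0] << 8) | entry[1];
}

/* Hypothetical helper: the NUL-terminated name starts at byte 2. */
static const char *entry_group_name(const uint8_t *entry) {
    return (const char *)(entry + 2);
}

/* Hypothetical helper: linear scan of the table for a group number.
 * Returns the group's name, or NULL if that group is unnamed. */
static const char *name_for_group(const uint8_t *table, uint32_t count,
                                  uint32_t entry_size, uint32_t number) {
    assert(entry_size >= 3); /* 2 number bytes + at least the NUL */
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *entry = table + (size_t)i * entry_size;
        if (entry_group_number(entry) == number)
            return entry_group_name(entry);
    }
    return NULL;
}

/* Hand-written copy of the table above: 4 entries of 8 bytes each.
 * The undefined padding bytes (??) are simply zeroed here. */
static const uint8_t example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0,
};
```

With this table, `name_for_group(example_table, 4, 8, 4)` returns `"month"`, and group 3 (the unnamed `(\d\d)`) returns `NULL`.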
Obvious approach: `pcre2_substring_copy_byname()` upon pattern matching.
Should look closely at the PCRE2 spec for duplicated group names before doing any optimizations with the number <-> name mapping.
In an attempt to reduce confusion, PCRE2 does not allow the same group number to be associated with more than one name. [...] However, there is still scope for confusion. Consider this pattern:
(?|(?<AA>aa)|(bb))
Although the second group number 1 is not explicitly named, the name `AA` is still an alias for any group 1. Whether the pattern matches "aa" or "bb", a reference by name to group `AA` yields the matched string.
(source)
I.e. number -> name mapping should be safe if needed, even with `PCRE2_DUPNAMES`. Name -> number mapping isn't safe since a name can correspond to multiple numbered groups.
Numbered groups are already stored; forgot to implement named capture groups.
Also, audit the behavior of `$abc`. Currently, `abc` would be treated as an expression (variable). To dereference the field table with a named group, you'd need to use a string literal (e.g. `$'abc'`).
If the field table were named/aliased (like `arg`), you could cleanly dereference using `match.group` or `match[n]`.