Closed mauke closed 1 month ago
I think it would be worth keeping parity with URI::Escape, which allows character classes like `[:alpha:]` to be used. That does not currently work, but it did in an earlier form of the code.
I fixed some similar issues in URI::Escape in libwww-perl/URI#112. It also eliminated the unneeded subref and string eval. I think I'd prefer to also fix that here as long as we're making changes. But the tests in this PR demonstrate a mistake in that PR. It doesn't properly account for all double `\\` sequences.
I've implemented support for POSIX-style character classes now, along with additional tests.
(The URI::Escape code doesn't seem to handle negated classes like `[[:^alpha:]]`.)
It would be good to also test escaping character classes like `\w`. Otherwise, lgtm.
@haarg this just needs additional tests?
I added some tests after merging.
Thanks, @mauke and @haarg!
Version 3.83 just released.
encode_entities() generates and evals custom code at runtime (as a performance optimization). However, its code generation was too naive and certain characters could be used to break out of the character class regex:

- `$` (dollar sign) would trigger perl's variable interpolation
- `]` and `/` (the character class and regex delimiters, respectively)

The latter two were usually escaped, but not if they were preceded by `\` in the input, even if that `\` was itself escaped by another `\` (NB: this is why it is generally a mistake to handle escaping logic with look-behinds: you need arbitrary-width look-behind to figure out whether the current chain of backslashes is even or odd in length in order to know what is being escaped or not).

The latter issue was fixed by doing a single pass over the input string with no look-behind; the former by switching the delimiter to `'` (which inhibits interpolation).

Fixes #44.
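
The backslash-parity problem described in this PR can be illustrated with a small sketch. It is written in Python rather than Perl, purely for illustration, and it is not the HTML::Entities code. Both helper names are hypothetical. The first helper mimics a one-character look-behind; the second does a single left-to-right pass that consumes each backslash together with the character it escapes.

```python
import re

# Hypothetical helpers for illustration only; neither is the HTML::Entities code.

def escape_with_lookbehind(spec: str) -> str:
    """Naive approach: escape ']' and '/' unless the previous char is a backslash."""
    return re.sub(r"(?<!\\)([\]/])", r"\\\1", spec)


def escape_single_pass(spec: str) -> str:
    """Single pass: keep existing escape pairs intact, escape bare delimiters."""
    def fix(match: re.Match) -> str:
        if match.group(1) is not None:
            # A backslash plus the character it escapes: already safe, keep as-is.
            return match.group(1)
        # A bare ']' or '/', or a lone trailing backslash: prefix a backslash.
        return "\\" + match.group(2)

    return re.sub(r"(\\.)|([\]/\\])", fix, spec, flags=re.DOTALL)


# Input characters: a, backslash, backslash, ']'.
# The two backslashes form one escaped backslash, so the ']' is NOT escaped.
tricky = r"a\\]"
print(escape_with_lookbehind(tricky))  # unchanged: the ']' still ends the class
print(escape_single_pass(tricky))      # the ']' gains its own backslash
```

The look-behind version only sees that `]` follows a backslash and leaves it alone, while the single-pass version pairs the two backslashes first and then escapes the `]`.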