Open annettejanewilson opened 1 year ago
It might be helpful to begin by pointing out that all three of the major implementations of jq exhibit the same behavior, even though they use different RE engines. Some other regex processors behave in the same way too, e.g. awk:
$ jq -nM '"\nx"|[match("^x")]'
[]
$ gojq -nM '"\nx"|[match("^x")]'
[]
$ jaq -n '"\nx"|[match("^x")]'
[]
$ awk 'BEGIN { if ("\nx" ~ /^x/) {print "match"}}'
$
Furthermore, the three implementations of jq support the (?m) convention for achieving the alternative interpretation of "^":
$ jq -nM '"\nx"|test("(?m)^x")'
true
$ gojq -nM '"\nx"|test("(?m)^x")'
true
$ jaq -n '"\nx"|test("(?m)^x")'
true
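For comparison, the same default and the same (?m) switch can be observed in Python's re module, one of the engines offered at regex101. This is only an illustrative sketch in a different engine, not jq itself, and the variable names are made up for the example:

```python
import re

text = "\nx"  # same input as the jq examples: a newline followed by "x"

# Without the m option, ^ only matches at the very start of the string,
# so "x" on the second line is not found.
default_match = re.search(r"^x", text)

# With the inline (?m) option, ^ also matches right after a newline.
multiline_match = re.search(r"(?m)^x", text)

print(default_match)            # None
print(multiline_match.start())  # 1
```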
To understand what's going on here in the case of the C implementation of jq, the Oniguruma manual can be consulted. Since jq uses the Perl_NG flavor of regex, the ONIG_SYNTAX_PERL section of Appendix A-1 applies (see https://github.com/kkos/oniguruma/blob/master/doc/RE):
(?s): dot (.) also matches newline
(?m): ^ matches after newline, $ matches before newline
That is, these modifiers are required to achieve the alternative behavior.
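The two modifiers quoted above can be tried out separately. The sketch below uses Python's re module, which follows the same Perl-style meanings for (?s) and (?m); it illustrates the convention rather than Oniguruma itself, and the variable names are hypothetical:

```python
import re

text = "a\nb"

# (?s): dot also matches newline.
dot_default = re.search(r"a.b", text)     # no match: "." stops at "\n"
dot_with_s = re.search(r"(?s)a.b", text)  # matches across the newline

# (?m): $ matches before a newline, not only at the end of the string.
end_default = re.search(r"a$", text)      # no match: "a" is not at the end
end_with_m = re.search(r"(?m)a$", text)   # matches "a" on the first line

print(dot_default, end_default)           # None None
print(repr(dot_with_s.group()))           # 'a\nb'
print(end_with_m.span())                  # (0, 1)
```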
Postscript: All 8 of the regex engines available at https://regex101.com/ require the "m" option in accordance with the above.
@itchyny - I think it's safe both to remove the bug label and to close the issue.
Thanks for the investigation.
At the very least, this is a documentation issue. The "s" flag doesn't do anything! It's misleading to document it without noting that it's useless. It's also quite a tripping hazard that the "s" and "m" flags (as passed as the second argument to regex functions) have swapped meanings from the options passed inside the regex:
(default)      ^ matches only at start of string, . does not match newlines
s flag         (no effect)
m flag         . to match newlines
(?s) option    . to match newlines
(?m) option    ^ to match after newlines
I think the source of the confusion is that Oniguruma uses the terms differently from Perl and the other regex engines I'm familiar with. We're telling Oniguruma to interpret the regex (including options in the regex) using Perl's meanings (single-line-mode means dot-matches-all, multi-line-mode means anchors-match-at-newlines), but we're using Oniguruma's interpretations (multi-line-mode is dot-matches-all and single-line-mode is anchors-DON'T-match-at-newlines) in the flags and documentation. I think it's a bit of a compatibility/consistency nightmare. Ideally only one interpretation would be exposed to the user, but if you want to preserve compatibility it's a bit late for that.
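The naming clash described above can be written down as a small lookup table. This is a sketch for illustration only: the dictionary and function names are hypothetical, and the entries simply restate the meanings given in this comment rather than quoting either engine's documentation:

```python
# Hypothetical lookup table restating the two naming conventions
# described in the comment above.
CONVENTIONS = {
    "perl": {
        "single-line": "dot matches newlines",
        "multi-line": "anchors match at newlines",
    },
    "oniguruma": {
        "single-line": "anchors do not match at newlines",
        "multi-line": "dot matches newlines",
    },
}

def names_for(behaviour: str) -> dict:
    """Return, per convention, the mode name that stands for a behaviour."""
    return {
        convention: mode
        for convention, modes in CONVENTIONS.items()
        for mode, meaning in modes.items()
        if meaning == behaviour
    }

# The same behaviour goes by opposite names in the two conventions.
print(names_for("dot matches newlines"))
# {'perl': 'single-line', 'oniguruma': 'multi-line'}
```

Seen this way, a user who learned the Perl names and then reads flag documentation written in the Oniguruma names will pick the wrong letter for dot-matches-all.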
@annettejanewilson - I think the problem is just that the jq documentation is wrong, apparently because of a failure to distinguish properly between the single-letter options allowed in "extended groups" and the single letters allowed in FLAGS. Anyway, I'm working on it. Thanks for your help.
Describe the bug
jq regular expressions are always handled as if "single line" mode is enabled. The "single line" flag has no effect.
To Reproduce
Expected output:
Actual output:
Expected behavior
If the "s" flag and the "p" flag are not passed, then ^ should match at the start of every line, not just the first, and $ should match at the end of every line, not just the last.
Environment (please complete the following information):
Additional context
This appears to be because Oniguruma in PERL_NG syntax defaults to single line mode and must be passed a flag to negate it.
I will provide a PR.