Open hirooih opened 3 years ago
Sorry for the delay.
First I don't understand what non-matching bracket expressions means.
I guess this refers to [^...].
I think it is more portable to use ^ or $ than using \n because there are variations of line-break characters.
Are you talking about CR and LF? The buffer used in regex matching is filled by functions defined in main/read.c. Those functions normalize the line-break characters to '\n', so we can use '\n'.
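To illustrate what normalizing line breaks to '\n' means, here is a minimal sketch. It is not the code in main/read.c: the function name normalize_newlines is hypothetical, and treating a lone CR as a line break is my assumption.

```c
#include <stddef.h>

/* Hypothetical helper (NOT taken from main/read.c): rewrite "\r\n" and a
 * lone "\r" as "\n" in place and return the new length. Handling a lone
 * CR as a line break is an assumption made for this sketch. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t r = 0; /* read index */
    size_t w = 0; /* write index, never ahead of r */

    while (r < len) {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            r++;
            if (r < len && buf[r] == '\n')
                r++; /* swallow the LF of a CRLF pair */
        } else {
            buf[w++] = buf[r++];
        }
    }
    if (w < len)
        buf[w] = '\0'; /* re-terminate when the text got shorter */
    return w;
}
```

After a pass like this, a pattern only ever has to deal with '\n', whatever line-break convention the input file used.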
We can also say we have to use \n because that regex is not compiled with REG_NEWLINE.
YES.
If I understand correctly, it is better to set REG_NEWLINE for --_mtable-regex-<LANG>, too.
I'm not sure which one, setting or not setting, is better. Anyway, too many parsers in optlib/ assume that REG_NEWLINE is not set.
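For reference, the difference REG_NEWLINE makes can be checked directly with POSIX regcomp()/regexec(). This is a minimal sketch, not ctags code; the helper name first_match_len is made up for illustration.

```c
#include <regex.h>

/* Length of the first match of `pattern` in `text`, or -1 when nothing
 * matches or the pattern does not compile. `extra_cflags` is OR-ed into
 * REG_EXTENDED, so pass 0 or REG_NEWLINE. Illustrative helper only. */
long first_match_len(const char *pattern, const char *text, int extra_cflags)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED | extra_cflags) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

With the text "abc\ndef", the pattern "a.*" matches all 7 characters without REG_NEWLINE and only "abc" with it. The pattern "^def" matches only when REG_NEWLINE is set, because without the flag ^ and $ anchor at the start and end of the whole buffer. That is why a multi-line pattern compiled without REG_NEWLINE has to spell out \n.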
@masatake san,
I've found that <newline>, <period>, and so on were not displayed in my original post. I've fixed them.
I guess this refers to [^...].
I see.
Are you talking about CR and LF?
Yes.
The buffer used in regex matching is filled by functions defined in main/read.c. The functions normalize the line-break characters to '\n'. So we can use it '\n'.
Good news. It would be better to document this. I will take this.
While studying Perl regular expressions for #3036, I found that I had not understood the treatment of newlines correctly. I did not distinguish the /s modifier from the /m modifier.
And I need some more time to remember this issue :-) Give me some time.
@masatake san, I remembered the points of this issue.
First, I withdraw the following.
If I understand correctly, it is better to set REG_NEWLINE for --_mtable-regex-<LANG>, too.
The point is that the following statement is wrong.
What that means is using a regex pattern with [^\n]+ is invalid.
As I cited:
“the use of literal <newline>s or any escape sequence equivalent produces undefined results”.
This applies to "the individual descriptions of those standard utilities", and only under the condition "if not stated otherwise".
A regex pattern with a "non-matching bracket expression", such as [^\n]+, is valid in general.
But you have seen it produce very odd results in glibc. In that case we should keep the notice not to use it. I am curious how oddly it behaves. Is there a test case for it?
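A small probe along these lines could serve as a starting point for such a test case. It is only a sketch with a made-up helper name, bracket_match_len, and it does not reproduce the odd glibc behavior mentioned above. It only shows what POSIX specifies for non-matching bracket expressions with and without REG_NEWLINE.

```c
#include <regex.h>

/* Length of the first match of `pattern` in `text`, or -1 on no match or
 * compile error. `use_reg_newline` selects whether REG_NEWLINE is set.
 * Illustrative helper only, not part of ctags. */
long bracket_match_len(const char *pattern, const char *text, int use_reg_newline)
{
    regex_t re;
    regmatch_t m;
    long len = -1;
    int cflags = REG_EXTENDED | (use_reg_newline ? REG_NEWLINE : 0);

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

Note that in C source the string "[^\n]+" hands regcomp() a literal newline inside the brackets, since the compiler resolves the escape before the regex engine sees it. Without REG_NEWLINE, "[^x]+" runs across the newline in "abc\ndef", while "[^\n]+" stops in front of it. With REG_NEWLINE, "[^x]+" also stops at the newline.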
If we agree, let me send a PR for above.
BTW, you have already merged #3036, but Read the Docs has not been updated yet.
While working on PR #3109, I found that the description of the treatment of newlines might be wrong. But I might be wrong myself. Let me know what I am missing.
From Regular expression (regex) engine:
The description of the specification including before and after the quoted sentence is as follows.
It does not say "What that means is using a regex pattern with [^\n]+ is invalid". I can find a description of special treatment of <newline> in the spec.
Does this describe an issue specific to the glibc implementation?
And then the next sentence follows:
In the Universal Ctags case this is similar to should be eliminated.
--regex-<LANG> processes input line by line, so it does not have to care about the setting of REG_NEWLINE, if I understand correctly.
This is OK. But I don't understand the following sentence:
First, I don't understand what "non-matching bracket expressions" means. Of course brackets ([ and ]) should be paired. But I guess the sentence above means something different.
I think it is more portable to use ^ or $ than to use \n, because there are variations of line-break characters.
We can also say we have to use \n because that regex is not compiled with REG_NEWLINE. If I understand correctly, it is better to set REG_NEWLINE for --_mtable-regex-<LANG>, too.