Closed HolyBlackCat closed 3 years ago
Temporary, it's working with the following rule by excluding "
in the group name:
{
"beginRegex": "\"([^\\(\"]{0,16})\\(",
"endRegex": "\\)\\1\"",
"foldLastLine": false,
}
I need to find a way so to fix the core of the issue. (without the need of "
)
Not sure how to fix the issue.
The regex is build by replacing the \1
reference in endRegex
with its source in beginRegex
. It's building the regex (?<_0_0>"([^\(]{0,16})\()|(?<_2_0>\)[^\(]{0,16}")
.
The issue is that [^\(]
isn't excluding "
so )" "(
will be matched as )" "
.
The best would be to update the documentation so others users are aware of this limitation. What do you think? Do you have another solution?
@daiyam I could definitely live with this workaround, but I agree that if you don't fix it, it should be documented.
I think this has worked in previous versions though, so there was some solution to this?
C++ in its "raw string" syntax (that I tried to explain in #14) does allow using "
s in this place. While it's unlikely to happen in real code, this means that we're one step back from being able to perfectly fold C++.
I do have an idea for a solution.
I didn't look at your source (because I don't know Typescript), but from what you said, it seems that you're using one large regex in every context. But what if you generated the regex on the fly, depending on context?
E.g. when you start parsing a document, you compose a regex from all 'begin' regexes from the config: (?<_0>BEGIN0)|(?<_1>BEGIN1)|(?<_2>BEGIN2)
.
Then, when you get a match, you check if the range has "nested":false
or not.
(?<_e0>ENDi)|(?<_0>BEGIN0)|(?<_1>BEGIN1)|(?<_2>BEGIN2)
. Again, if 'begin' had capture groups, the added 'end' regex should have it verbatim.
When you go deeper into folding ranges, you only need to keep the single last 'end' prepended. So the regex doesn't grow with the amount of folding ranges.I will go for the documentation.
I've checked, the way how the capture groups are working hasn't been changed since they have been introduced. Maybe it's because you just found some edge cases???
Ok. Yeah, I might be wrong on this being a regression.
I guess I'll have to learn TS sooner or later if I want a perfect C++ folding. :P
Any insights on why it's not implemented like I described? (I mean copying the text matching capture group into the end regex.) That would look more straightforward to me, but since I didn't look into the code, I might be missing why doing it the other way is necessary.
Performance wise, in javascript, a big regex is better than several small regexes. Also a regex is costly to build. So to dynamically build regexes when parsing big files, it can be bad for the performance.
The document is parsed line by line to keep track of the line number (used by the folding range). Then the line is parsed by a regex which is like BEGIN1|MIDDLE1|END1|SEPARATOR2|WHILE3|DOCSTRING4|BEGIN5
so it can match for nested folding regions.
When "nested": false
, only BEGIN
is present in the regex. And when matched, it will look directly for the END
since nothing can be nested.
I've updated the doc https://github.com/zokugun/vscode-explicit-folding/blob/master/docs/rules/begin-end.md. What do you think?
Got it, thanks. The updated doc looks ok to me.
Could be interesting to see how VSC handles it in TM grammars. They have this exact same feature, but without the limitation.
From https://github.com/microsoft/vscode/blob/main/extensions/cpp/syntaxes/cpp.tmLanguage.json#L15418, here the syntax they are using for raw strings:
{
"begin": "((?:u|u8|U|L)?R)\"(?:([^ ()\\\\\\t]{0,16})|([^ ()\\\\\\t]*))\\(",
"beginCaptures": {
"0": {
"name": "punctuation.definition.string.begin"
},
"1": {
"name": "meta.encoding"
},
"3": {
"name": "invalid.illegal.delimiter-too-long"
}
},
"end": "\\)\\2(\\3)\"",
"endCaptures": {
"0": {
"name": "punctuation.definition.string.end"
},
"1": {
"name": "invalid.illegal.delimiter-too-long"
}
},
"name": "string.quoted.double.raw"
}
Excluding space would also fix your issue.
@daiyam
Excluding space would also fix your issue.
Not really, because this code is valid:
R"(
)",1,R"(
)"
The TM grammar parses it correctly somehow, so I'm thinking they handle the capture groups the "honest" way, by substituting the matching text into the end regex.
@HolyBlackCat You were right. It might have been a regression since the fix for ^
. It's fixed.
Also, since v0.18.0
, the endRegex
are generated on the run, like you wanted ;)
@daiyam Thank you!
Sorry to bother you again, but I found more regressions. :P
Describe the issue
Folding ranges are determined incorrectly, when capture groups are present in
beginRegex
and are referenced inendRegex
.To reproduce
cpp
Code Example
Settings
Expected behavior
In both examples, I expected to get two folding ranges: lines 1..3 and 3..5.
Instead, in the first example there's one large range, with lines 1..5. And the second example doesn't fold at all.