Open WORMSS opened 5 years ago
Yeah, currently multiline mode is only turned on when a literal newline is in the regex, for perf.
I think the inability to force multiline string or not accounting for for all the possibilities for needing it turned on should be reviewed.
Newline is indeed not a literal bracket, so should be matched. And multiline needs to be on for that. Even if it's just showing an extra button on the right, nest to the regex search" when someone does a not match [^
only.
You could even just silently bolt on \n{0}
at the end internally after the input value is read. You wouldn't need to do any additional search code alterations. All edits would be at the UI component level, and multiline can be turned on by the user only if needed to keep perf by default.
Though, searching for only Not Matching doesn't help those who want .
to match newline also.
The main reason we haven't done this is that we want to keep the number of possible switches under control, but it's a fair request.
Wanted to create a new issue that \s
does not match newlines, but found this.
@roblourens Yeah, currently multiline mode is only turned on when a literal newline is in the regex, for perf.
How about better newline-regex detection? Something like this:
const { assert } = require("chai");
// Just came up with this, thus requires more tests. Doesn't match some edge cases.
// Requires a variable-length negative-lookbehind support, which is supported by latest ES
const IS_MULTILINE = /(?<!(?<!\\)\[[^\]]*)(?<!\\)\\[ns]|(?<!\\)\[\^(.(?<!(?<!\\)\\[ns]))*(?<!\\)\]|(?<!\\)\[(?!\^)[^\]]*(?<!\\)\\[ns]/;
assert.notOk(IS_MULTILINE.test(/[^\n]/));
assert.notOk(IS_MULTILINE.test(/[^\s]/));
assert.notOk(IS_MULTILINE.test(/\[^a]/));
assert.notOk(IS_MULTILINE.test(/[\\s]/));
assert.ok(IS_MULTILINE.test(/[^\\s]/));
assert.ok(IS_MULTILINE.test(/[^\\n]/));
assert.ok(IS_MULTILINE.test(/[^s]/));
assert.ok(IS_MULTILINE.test(/[\s]/));
assert.ok(IS_MULTILINE.test(/\[\s/));
assert.ok(IS_MULTILINE.test(/[^\s]\s\]/));
assert.ok(IS_MULTILINE.test(/\[\s\]/));
I am assuming in this test below, s is just a substitute of any writeable char
assert.ok(IS_MULTILINE.test(/[^s]/));
@WORMSS in this case it's anything other than literal s
, which for this purpose seems an irrelevant test case which I must have mistakenly kept in the test suite when I was fuzzing all manner of cases.
Regex Buddy confirms.
Though, I still had to do my hack to get it to match newline as a replaceable char. (ignoring the fact its treating \n\r as a single char.
Windows obviously.
I'm probably missing the point. /[^s]/
should match newline, hence:
assert.ok(IS_MULTILINE.test(/[^s]/));
is passing.
`s\ns`.match(/[^s]/)
// ["↵", index: 1, input: "s↵s", groups: undefined]
Sorry, I read the sentence back and it makes little sense to myself.
Regex Buddy agrees
would have been a better way for me to say it.
But I wanted to point out that RegexBuddy seems to count \n
as a "not an s" even with multiline turned off. I don't know if that makes much of a difference.
And strangeness that \r\n is counted as a single char in vsc. (which I actually prefer to be honest)
\n
is "not an s
". Am I still missing the point? What are you suggesting? Where do you see a problem in the IS_MULTILINE
regex behavior?
I think there might be a bigger issue with "multiline" term in the first place if we are trying to agree with each other but seem to be missing the mark.
I've just re-re looked up my understanding multiline is for ^
and $
usage.
It seems javascript agrees with me, that "not an s" should match new line by default and not need multilined turned on.
s
flag would be .
matches newline.
I am guessing IS_MULTILINE is just a completely different type of allow multiline that is not directly related to a regex multiline?
I see --- yes, we're using the term to mean different things. When I say multiline in this context, I'm talking about ripgrep --multiline
switch, which is what vscode uses to enable a regexp to match across multiple lines. From ripgrep's manual:
-U, --multiline
Enable matching across multiple lines.
When multiline mode is enabled, ripgrep will lift the restriction that a match
cannot include a line terminator. For example, when multiline mode is not
enabled (the default), then the regex '\p{any}' will match any Unicode
codepoint other than '\n'. Similarly, the regex '\n' is explicitly forbidden,
and if you try to use it, ripgrep will return an error. However, when multiline
mode is enabled, '\p{any}' will match any Unicode codepoint, including '\n',
and regexes like '\n' are permitted.
An important caveat is that multiline mode does not change the match semantics
of '.'. Namely, in most regex matchers, a '.' will by default match any
character other than '\n', and this is true in ripgrep as well. In order to
make '.' match '\n', you must enable the "dot all" flag inside the regex.
For example, both '(?s).' and '(?s:.)' have the same semantics, where '.' will
match any character, including '\n'. Alternatively, the '--multiline-dotall'
flag may be passed to make the "dot all" behavior the default. This flag only
applies when multiline search is enabled.
Then 👍 Big thumbs up that [^s]
should indeed turn that mode on.. Ok.. I am up to date and agreeing. Good job everyone.. take a 5 minute break and enjoy your evening.
It seems like you are just saying that "if a regex can match \n then the multiline flag should be turned on". Isn't this the same as having the multiline flag turned on all the time?
Anyway to me the whole point is that the majority of regexes are not intended to match multiple lines, it's quite common to include \s
when you really are looking for spaces and tabs within a line. And including \s
should not blow up your search results by matching a bunch of \r
and \n
when you did not explicitly ask for it.
This heuristic of using \n is my alternative to actually having a button or checkbox to enable multiline mode.
I explicitly asked it to match any char that was not a ]
To the best of my knowledge \n
and \r
are not ]
so therefore should have been matched.
There is no visible indication anywhere that multiline is turned on or off. It is just some magic that goes on behind the scenes. Only way to get consistent results would be to add \n{0}
to every regex I do.
You may feel that \s
should match only spaces
and tabs
but can you guarantee that every single user of VSCode would (1) know that that is indeed what is going on, and (2) know how to bypass the limitation if they needed to?
All I can do is what I think is right for the majority of users and hope that the rest can google their way to a solution. Because of the perf impact and possibility for unexpected behavior I think multiline search should always be an explicit opt in. My guess is that we would add a button for this some day but it will take careful UX thinking because we have too many options crammed into a small space already.
This heuristic of using \n is my alternative to actually having a button or checkbox to enable multiline mode.
IMO it's much better to have slower search than to potentially cause bugs (among other things), e.g. when someone refactors code and the search doesn't match all it should, in which case you better hope your CI catches it.
Nothing worse than a regex engine that doesn't comply to spec in such a glaring way as redefining \s
to exclude \n
, which is pretty unintuitive. I'd at least show a warning message when the regex contains \s
.
Also, having an option to opt-in for \s
to include \n
seems reasonable to me.
That doesn't help in my case when I am not searching for \s
but searching for not ]
That doesn't help in my case when I am not searching for \s but searching for not ]
True. I'd say your case is even more insidious, because it contains no whitespace characters in any way.
More generally, how should a user know that in vscode alone, you need to write (?:[^\]]|\n)
instead of [^\]]
?
Problem is, the above solution doesn't generalize because apparently you can't use (?:a|\n)
in negative-lookbehind because the vscode regex parser thinks it's variable length (which it isn't -- works in rg
).
We should definitely tell user in some way that vscode search redefines how newline matching works.
Just ran into the same issue with the following use case: I have a large JSON file in the following format:
[
{
"key": []
},
{
"key": []
},
...
{
"key": [
1
]
},
...
{
"key": []
},
{
"key": []
},
]
I wanted to find the key
which contained a non-empty array. Naturally I tried to search for "key": \[[^\]]
in the file but was rewarded with 0 matches. Luckily I thought of searching for "key":
(X
matches) and then for "key": \[\]
(X-1
matches) and figured out there's newline shenanigans. Took me the good part of an hour though. Would definitely like to see an option to enable multiline.
Hey mates @WORMSS Why not using structural search for your case?
I've created an extension that make it super easy to search such cases without having to fight with regexes!
Check CodeQue
The query would as simple as that:
Promise.all([
await $$$(), // $$$ is any statement wildcard
]);
I'm using next.js repo as test case here
@Mitko-Kerezov in your case, the query could bee like this
({
"key": [$$$]
})
You can give it a try in CodeQue Playground
I just use \n{0}
a the end of all my searches.. been working fine so far..
How VSCode developers dared to violate the common standards among regex flavours? Any regex engine (except for the old shell command sed
) finds newline when I search for whitespace \s
or negative character classes which don't include a newline. This is the definition.
negated character class reference not matching newline in negated character classes is not a comparison criterion among important flavours
My problem, I want to search for multiline comments. Any javascript engine would match multiple lines with this regex:
/\/\*[^*]*\*\//g
Only in VSCode, I need this:
/\*(?:[^*]|\n)*\*/
(The \n{0}
trick does not work for me.)
It is less convenient to type.
In this case, /\*[.\n]*?\*/
would be even better (and works for VSCode and JavaScript) but it has different semantics in the general case.
It's a common advice to use [\s\S]
for matching all characters but it doesn't work with VSCode.
VSCode's regex interpretation is unintuitive regarding newlines because no other relevant regex engine behaves this way. Most languages consist of multi-line structures where users will need to search for multi-line structures at some time (when matching code pieces).
Usability is always more important than program speed. Because usability is the bottle neck. In the opposite case, the work that the code would do is imposed on the user who is slower than the code. Sometimes, VSCode cannot even find multiline matches in the way I described above. (I found a case that led to catastrophic backtracking in VSCode.) Then I copy the text to another editor or to the browser to find/replace with their regex engine.
If VSCode cannot do it "because of the perf impact" then it is not the newline to blame but the implementation of VSCode. I never had any speed problems with other regex engines which allow me to match many lines at once. The actual performance impact is catastrophic backtracking which is easy to bypass with a good regex style.
Issue Type: Bug
I was trying to search for
but I had no way to turn on multiline support so that
^\]
would match newlines too. So searching for below would not be found.I later put in the below and was happy to find that it suddely worked, but knew that it would only find if await was on the first line after the bracket, and also if there was no chars other than newline space or tab.
But, I then started to find others that actually SHOULDN'T have worked.. It was at this point, I realsed that
^\]
was working as expected the first time, and was matching on newline chars.. (Perfect, I removed my tab/space hack).. To find, it stopped working again..It seems I need atleast 1
\n
in the regex to make it work.My quick fix was to add this at the very end as such
But this is obviously a dirty hack and not an obvious one, be far better to test for
Not Matching
and turn multiline back on, similar to what ever test you are doing for searching for\n
and turning on multilineVS Code version: Code 1.35.0 (553cfb2c2205db5f15f3ee8395bbd5cf066d357d, 2019-06-04T01:17:12.481Z) OS version: Windows_NT x64 10.0.10240
System Info
|Item|Value| |---|---| |CPUs|Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz (8 x 3492)| |GPU Status|2d_canvas: enabledchecker_imaging: disabled_off
flash_3d: enabled
flash_stage3d: enabled
flash_stage3d_baseline: enabled
gpu_compositing: enabled
multiple_raster_threads: enabled_on
native_gpu_memory_buffers: disabled_software
rasterization: enabled
surface_synchronization: enabled_on
video_decode: enabled
webgl: enabled
webgl2: enabled| |Load (avg)|undefined| |Memory (System)|15.92GB (5.33GB free)| |Process Argv|| |Screen Reader|no| |VM|0%|
Extensions (19)
Extension|Author (truncated)|Version ---|---|--- markdown-preview-github-styles|bie|0.1.6 npm-intellisense|chr|1.3.0 ssh|chr|0.0.4 vue-peek|dar|1.0.2 vscode-eslint|dba|1.9.0 vscode-ts-auto-return-type|ebr|1.0.1 tslint|eg2|1.0.43 RunOnSave|eme|0.0.18 prettier-vscode|esb|1.9.0 todo-tree|Gru|0.0.134 node-module-intellisense|lei|1.5.0 camelcasenavigation|map|1.0.1 rainbow-csv|mec|1.1.1 node-modules-resolve|nau|1.0.2 uuid-generator|net|0.0.4 vetur|oct|0.21.0 gitconfig|sid|2.0.0 open-in-browser|tec|2.0.0 sort-lines|Tyr|1.8.0