microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
163.58k stars 29.02k forks source link

Add `!`-prefix parsing to glob util #134415

Open JacksonKearl opened 3 years ago

JacksonKearl commented 3 years ago

It'd be potentially useful to have ! as a prefix added to our glob.ts utils. This could function similarly to how we already handle .gitignore files in the webworker search here:

  1. if a file matches a normal line and doesn't match an !-line, exclude it (we do this today)
  2. If it matches both a normal line and a !-line, include it
  3. Otherwise, include it (we do this today)

This isn't quite as powerful as a real.gitignore as there are only two "layers" of include/exclude, whereas in gitignore each line introduces a new "layer", but I believe it would handle the 99% of cases well enough, for instance every example given to-date in https://github.com/microsoft/vscode/issues/869.

To handle the many-layer case requires either:

cc @bpasero for any and all input on this

bpasero commented 3 years ago

Somewhat related: https://github.com/microsoft/vscode/issues/869

I think the biggest challenge is how to support conflicting includes / excludes. Maybe some inspiration could be taken from how search handles this where you can configure both include and exclude patterns at the same time.

JacksonKearl commented 3 years ago

Ah, so search is exclude-dominated, in search:

Versus what I described above and what our .gitignore parser does is more include-dominated:

Concretely, given the config:

"files.exclude": {
   "node_modules/**": true,
   "!node_modules/@types": true,
}

The Search approach would exclude all of node_modules, even the @types directory, because it matches the exclude, and include everything else. The .gitignore approach would include node_modules/@types because it matches the exclude, exclude the other entries in node_modules/** because they match the exclude, and include everything else.

On the other hand, given this config:

"files.exclude": {
   "foo.ts": true,
   "!*.ts": true,
}

The Search approach would exclude foo.ts, include the other *.ts files, and exclude everything else, because if an include is present all files must match the include The .gitignore approach would exclude foo.ts, and include everything else.

Not really sure what the best option is here. The Search strategy makes sense when you want to exclude the bulk of paths but include some specific ones, whereas the .gitignore strategy makes more sense when you want to include the bulk of paths but exclude some specific ones.

I lean towards using the .gitignore strategy because the search.exclude and files.exclude settings are historically more about excluding a specific set of files and including the rest (and the ask in #869 is to allow including files that match existing excludes), whereas search is more about including a specific set of files and excluding the rest (and we already support excluding files that match existing includes)

cc @roblourens for any thoughts here

roblourens commented 3 years ago

I would definitely not expect a negative exclude to work the same as an include. If I put *.ts in "files to include", I definitely don't want to see .js files, but a negative exclude for ts files is different.

The .gitignore approach would exclude foo.ts, and include everything else.

Does that mean that if I also have an exclude for *.js, it would be overridden by !*.ts? I can't remember how this works in gitignore except that I find it hard to get right.

Also, do you expect that I'd be able to have negated patterns in settings that override patterns in the gitignore? Last I looked into this area, ripgrep makes gitignore always take precedence.

bpasero commented 2 years ago

Hm, I think maybe this issue needs to split into two issues:

I am not sure the intent of this issue, so asking @JacksonKearl to clarify: is this about making glob patterns aware of a leading ! to negate the entire pattern? We already have limited support for ! in character ranges it seems:

https://github.com/microsoft/vscode/blob/e80a0ca648fe9bef07070e4f7fb04e67f2243252/src/vs/base/common/glob.ts#L464-L472

JacksonKearl commented 2 years ago

is this about making glob patterns aware of a leading ! to negate the entire pattern?

yes

bpasero commented 2 years ago

It is somewhat easy to add global negate support for glob, see https://github.com/microsoft/vscode/commit/efa5122689eea38e5c8ca4480f58ae58e97230e6 for a first stab at it.

However, there are a few questions:

[1]

Not sure the ripples or adoption cost where it is used, @roblourens in search. Does RipGrep even support these patterns?

[2]

Take this example:

let globExpression = {
    '!*.js': true,
    '!*.ts': true
};

Since glob expressions are basically a OR combination of all patterns, the above pattern will match on everything, even though it may look as if this will match every file that is not .js or .ts. Not sure how realistic that scenario is though, but wanted to bring it up anyway.

roblourens commented 2 years ago

Ripgrep supports them, I think the support there will be the same as in gitignore files

bpasero commented 2 years ago

So how is our glob util used in search component then? Are we even requiring it there or are we just passing on patterns to RipGrep and let it deal with it?

roblourens commented 2 years ago

We pass the patterns to ripgrep, and for editors that are open, we interpret the glob patterns. For patterns with sibling clauses, we interpret the patterns.

bpasero commented 2 years ago

After some discussion with Christoph, I concluded that support for ! to negate glob patterns does not work well with our support for glob expressions that have the form of pattern: enablement in an object.

I see no practical way how to parse an expression such as:

{
    "!workspace/*.js": true,
    "!workspace/*.ts": true,
    "workspace/*.html": true,
    "workspace/*.xml": true
}

Our normal algorithm is to report a match overall when any of the enabled patterns match. With negated patterns, that doesn't really work anymore because negated patterns match on almost everything and as such, 2 negated patterns in an expression will match everything.

Then my naive solution to this was to group all negated patterns together and require them ALL to match and then check the remaining patterns for matches, basically for the above example: <*.html> || <*.xml> || (<not *.js> && <not *.ts>).

Can someone come up with a meaningful expression that combines negated patterns and normal patterns that makes any sense with this rule? So far I could not come up with any example...

roblourens commented 2 years ago

From the gitignore manpage,

any matching file excluded by a previous pattern will become included again

In that example, I don't think the negated patterns should change the overall result since they don't intersect with the html/xml patterns. Another example would be like

{
  "foo/*": true,
  "bar/*": true,
  "!foo/file.txt": true
}

So I think you would check the positive patterns, then check the negative patterns, like (<foo/*> || <bar/*>) && <not foo/file.txt>

The : true feels confusing here too...

In gitignore, order matters, and I'm not sure whether we would run into trouble by assuming that the negated patterns are always resolved after the positive ones.

bpasero commented 2 years ago

If I understand correctly a path matches the expression when it matches any of the non-negated patterns and then all of the negated patterns?

In the example above:

That could work. It is probably not very intuitive because we do not really support processing the expression in an ordered way like .gitignore does...

bpasero commented 2 years ago

@JacksonKearl can you maybe chime in as stake holder, you mention it would solve https://github.com/microsoft/vscode/issues/869 so we should maybe look at examples and build this feature around the needs. If I understand correctly, users would configure files.exclude with the new support for negated patterns?

roblourens commented 2 years ago

If I understand correctly a path matches the expression when it matches any of the non-negated patterns and then all of the negated patterns?

I guess, but it would probably be worth looking at ripgrep or git source before we get bit by edge cases

JacksonKearl commented 2 years ago

If I understand correctly, users would configure files.exclude with the new support for negated patterns?

Yes exactly.

I guess, but it would probably be worth looking at ripgrep or git source before we get bit by edge cases

We mostly implement their logic here: https://github.com/microsoft/vscode/blob/2358283d76a0afdf57e60ec2fc76a57b1b0146c7/src/vs/workbench/services/search/common/ignoreFile.ts#L90 but it'd require some significant changes to get that working for excludes, for instance handling filtering specifically directories vs files and following parent pointers differently from how we merge the settings today. Technically we may need to handle ordering as well, but the code above doesn't, and that wouldn't really work with our object based model anyways.

Ideally, the following would work:

"files.exclude": {
    "node_modules/*": true,
    "!node_modules/@types": true,
    "node_modules/@types/*": true,
    "!node_modules/@types/*.js": true,
}

However that requires some understanding of the ordering that we may not want to rely on, especially given things like settings scope merging.

Absent that, it'd be helpful to have this:

"files.exclude": {
    "node_modules/*": true,
    "!node_modules/@types": true,
}

where if a file matches some bare expression and no negated expression, it matches the glob and is included. This way ordering isn't important, a simple 2 pass approach works. This is actually what we do already in the ignore file parser.

dbagley1 commented 1 year ago

Support for negate patterns is essential to fixing a bug with the "Explorer: Exclude Git Ignore" setting. Negate patterns are ignored when hiding files in the explore pane. As mentioned in #175066. That issue was marked as a duplicate of this issue and closed.

Thaina commented 4 months ago

I try to use

{
    ...,
    "files.exclude" : {
        "Library/" : true,
        ...
    },
    "search.exclude" : {
        "Library/PackageCache/**" : false,
        ...
    },
    ...
}

And it's not work

This below work for some reason

{
    ...,
    "files.exclude" : {
        "Library/[!(PackageCache)]*/": true,
        "library/[!(PackageCache)]*/": true,
        "Library/PackageCache/": true,
        "library/PackageCache/": true,
        ...
    },
    "search.exclude" : {
        "Library/PackageCache/": false,
        "library/PackageCache/": false
    },
    ...
}

Also though it seem the negate syntax only work with one character only