microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
164.89k stars 29.52k forks source link

Revamp findFiles #48674

Open TomasHubelbauer opened 6 years ago

TomasHubelbauer commented 6 years ago

Hello, findFiles's exclude argument says that if you pass in undefined, default excludes will apply.

I have found this to not work in my extension despite my configuration being correct. Consider this piece of code:

const files = await workspace.findFiles('**/*.md'); // https://github.com/Microsoft/vscode/issues/47645
// TODO: Figure out https://github.com/Microsoft/vscode/issues/48674
console.log(workspace.getConfiguration('search.exclude'));
for (const file of files) {
    console.log(file.fsPath);
}

Above, workspace.getConfiguration('search.exclude') returns:

{
    "**/node_modules": true,
    "**/bower_components": true
  }

This is merged from the default settings, I never change this configuration section.

Despite that, the workspace.findFiles('**/*.md') (with or without undefined explicitly passed as a 2nd argument, shouldn't make a difference unless some weird parameter counting is going on) returns MarkDown files from my node_modules directory.

I do not know how to run extension debugging in a "none but this extension" mode, but I installed a published version of my extension and tried installing it alone in my Insiders instance and was able to reproduce the problem.

I am using VS Code 1.22.2 and VS Code Insiders 1.23.0-insider`.

Is there anything else I can do to debug this further?

roblourens commented 6 years ago

This uses files.exclude, but not search.exclude. @jrieken maybe docs should mention this?

TomasHubelbauer commented 6 years ago

Indeed in my case console.log(workspace.getConfiguration('files.exclude')) returns (the default):

{
    "**/.git": true,
    "**/.svn": true,
    "**/.hg": true,
    "**/CVS": true,
    "**/.DS_Store": true
  }

How can I turn the return value of workspace.getConfiguration('search.exclude') into something findFiles will accept? I came up with this:

// TODO: https://github.com/Microsoft/vscode/issues/48674
const excludes = await workspace.getConfiguration('search').get('exclude')!;
const globs = Object.keys(excludes).map(exclude => new RelativePattern(workspace.workspaceFolders![0], exclude));
const occurences: { [fsPath: string]: number; } = {};
for (const glob of globs) {
    // TODO: https://github.com/Microsoft/vscode/issues/47645
    for (const file of await workspace.findFiles('**/*.md', glob)) {
        occurences[file.fsPath] = (occurences[file.fsPath] || 0) + 1;
    }
}

// Accept only files found in results of searching of all exclude globs
const files = Object.keys(occurences).filter(fsPath => occurences[fsPath] === globs.length);

This feels hardly optimal. If there is a better way to do this, please, let me know, I'd much rather use something built-in if there's anything. I also tried merging the patterns into something like **/*.md, !**/node_modules or **/*.md; !**/node_modules, but that crashes on an invalid use of a glob ** pattern.

roblourens commented 6 years ago

You can combine multiple patterns into a single one by comma-separating inside {}. https://github.com/Microsoft/vscode/issues/38545

You can't combine include and exclude with !

And one thing to watch out for is that the exclude settings can have sibling patterns as their values, rather than true.

For example,

"files.exclude": {
    "build/**/*.js": {
        "when": "$(basename).ts"
    }
},

And the findFiles interface doesn't support those. So you would want to think about how you want to handle that.

TomasHubelbauer commented 6 years ago

@roblourens Can you please reopen this as a feature request for a better API for findFiles with a simple way of telling it to use excludes from getConfiguration?

chrmarti commented 6 years ago

This would also be helpful in our own code: https://github.com/Microsoft/vscode/commit/b845ce918f2689344795e845ae6ef691916c3079

roblourens commented 6 years ago

After https://github.com/Microsoft/vscode/issues/34711, workspaceContains patterns respect search.exclude. Quickopen does too. So I feel like findFiles should as well. But since it's a breaking change, I think it will be hard to do...

TomasHubelbauer commented 6 years ago

Could that be resolved by adding a new optional flag to it that would toggle this behavior on? Obviously I'd love to see it be the default behavior for this, so if it is possible to survey extension authors and assess the impact, it would be cool to take on a BC, but if that's not feasible, a flag could be an alternative.

TomasHubelbauer commented 6 years ago

Here's an updated snippet which will return URIs of workspace files except for those ignored either by search.exclude or files.exclude or .gitignore:

import { workspace, Uri } from "vscode";
import { exec } from 'child_process';
import applicationInsights from './telemetry';
import { join } from "path";

// TODO: https://github.com/TomasHubelbauer/vscode-extension-findFilesWithExcludes
// TODO: https://github.com/Microsoft/vscode/issues/48674 for finding MarkDown files that VS Code considers not ignored
// TODO: https://github.com/Microsoft/vscode/issues/47645 for finding MarkDown files no matter the extension (VS Code language to extension)
// TODO: https://github.com/Microsoft/vscode/issues/11838 for maybe telling if file is MarkDown using an API
// TODO: https://github.com/Microsoft/vscode/blob/release/1.27/extensions/git/src/api/git.d.ts instead of Git shell if possible
export default async function findNonIgnoredFiles(pattern: string, checkGitIgnore = true) {
  const exclude = [
    ...Object.keys(await workspace.getConfiguration('search', null).get('exclude') || {}),
    ...Object.keys(await workspace.getConfiguration('files', null).get('exclude') || {})
  ].join(',');

  const uris = await workspace.findFiles(pattern, `{${exclude}}`);
  if (!checkGitIgnore) {
    return uris;
  }

  const workspaceRelativePaths = uris.map(uri => workspace.asRelativePath(uri, false));
  for (const workspaceDirectory of workspace.workspaceFolders!) {
    const workspaceDirectoryPath = workspaceDirectory.uri.fsPath;
    try {
      const { stdout, stderr } = await new Promise<{ stdout: string, stderr: string }>((resolve, reject) => {
        exec(
          `git check-ignore ${workspaceRelativePaths.join(' ')}`,
          { cwd: workspaceDirectoryPath },
          // https://git-scm.com/docs/git-check-ignore#_exit_status
          (error: Error & { code?: 0 | 1 | 128 }, stdout, stderr) => {
            if (error && (error.code !== 0 && error.code !== 1)) {
              reject(error);
              return;
            }

            resolve({ stdout, stderr });
          },
        );
      });

      if (stderr) {
        throw new Error(stderr);
      }

      for (const relativePath of stdout.split('\n')) {
        const uri = Uri.file(join(workspaceDirectoryPath, relativePath.slice(1, -1) /* Remove quotes */));
        const index = uris.findIndex(u => u.fsPath === uri.fsPath);
        if (index > -1) {
          uris.splice(index, 1);
        }
      }
    } catch (error) {
      applicationInsights.sendTelemetryEvent('findNonIgnoredFiles-git-exec-error');
    }
  }

  return uris;
}

I would much rather VS Code extension API provided this out of the box, but alas. I am noticing some new development like FileSearchProvider which natively allows ignoring Git ignored files, but it doesn't seem like this will trickle down to the good ol' findFiles anytime soon.

Thus the code above should be the best we have for now unless there is some way to use the Git extension API to query the ignored-or-not status of a URI without using exec.

@joaomoreno is this API usable from other extensions or is that something internal to only VS Code? If it is, can you give an example of how an extension author could make use of it? I wasn't able to find anything on the topic.

joaomoreno commented 6 years ago

Here's the API: https://github.com/Microsoft/vscode/blob/master/extensions/git/src/api/git.d.ts

Here's usage example: https://github.com/Microsoft/vscode-pull-request-github/blob/master/src/extension.ts#L53

It is still pioneering work, so there's no official documentation yet. We also do not yet expose the ignored status for files. Tho we could. Feel free to file an feature request issue for it.

roblourens commented 6 years ago

At some point we may want a findFiles2 that takes an options object like findTextInFiles instead of a list of params. Exposing the useIgnoreFiles option makes sense for findFiles, it's exposed on findTextInFiles.

Khady commented 4 years ago

Is there any progress or recommended solution for this issue? When writing a task provider and working with a mono repository it is really necessary to have the proper exclusion patterns. Or in projects where the compilation artifacts are somewhere in the workspace.

TheButlah commented 4 years ago

This is enough of an API problem that many large extensions fail to consider search excludes as a result. Even micorosoft's python extension has this problem - I get OOM errors because they use findFiles in the expected way (without providing excludes), and that causes a massive search through the whole workspace.

I think given that this is several years old and so many plugins are using it, it really needs to have a new optional argument to exclude search.excludes.

See https://github.com/microsoft/vscode-python/issues/11173 for the example of it causing downstream problems See the api usage that leads to this (spoiler alert, its just the default, straightforward usage): https://github.com/microsoft/vscode-python/blob/bd8d32bccbf9b459981db18c837ee68414335cf4/src/client/datascience/interactive-ipynb/nativeEditorProvider.ts#L91

roblourens commented 3 years ago

We discussed implementing an alternate signature for findFiles that takes a set of options, more like how findTextInFiles was designed. Both should have an explicit option for using both sets of exclude options, and the default should be to use them.

Example:

export interface FindInFilesOptions {
    /**
     * A {@link GlobPattern glob pattern} that defines the files to search for. The glob pattern
     * will be matched against the file paths of files relative to their workspace. Use a {@link RelativePattern relative pattern}
     * to restrict the search results to a {@link WorkspaceFolder workspace folder}.
     */
    include?: GlobPattern;

    /**
     * A {@link GlobPattern glob pattern} that defines files and folders to exclude. The glob pattern
     * will be matched against the file paths of resulting matches relative to their workspace. When `undefined`, default excludes will
     * apply.
     */
    exclude?: GlobPattern;

    /**
     * Whether to use the default and user-configured values from the `search.exclude` setting. Defaults to true.
     */
    useSearchExcludes?: boolean;

    /**
     * Whether to use the default and user-configured values from the `files.exclude` setting. Defaults to true.
     */
    useFilesExcludes?: boolean;

    /**
     * The maximum number of results to search for
     */
    maxResults?: number;

    /**
     * Whether external files that exclude files, like .gitignore, should be respected.
     * See the vscode setting `"search.useIgnoreFiles"`.
     */
    useIgnoreFiles?: boolean;

    /**
     * Whether global files that exclude files, like .gitignore, should be respected.
     * See the vscode setting `"search.useGlobalIgnoreFiles"`.
     */
    useGlobalIgnoreFiles?: boolean;

    /**
     * Whether symlinks should be followed while searching.
     * See the vscode setting `"search.followSymlinks"`.
     */
    followSymlinks?: boolean;

    /**
     * Interpret files using this encoding.
     * See the vscode setting `"files.encoding"`
     */
    encoding?: string;
}
andreamah commented 9 months ago

I recently introduced a new FindFiles2 API, which is currently proposed. The work is tracked here https://github.com/microsoft/vscode/issues/204657. Please try it out and let me know if there is anything missing!

phil294 commented 5 months ago

Just tried it @andreamah and it works great! Being able to exclude gitignored files is a HUGE help. Looking forward to getting this officially shipped with mainline VSCode!

For now, I've written a fallback that first scans the workspace for .gitignore files, reads and parses them and then adds those to the custom exclude string for findFiles()(v1). I have found this to be much faster than querying for git check-index like @TomasHubelbauer did (thanks for sharing), albeit probably some complicated patterns might not be translated perfectly. Some translation is necessary after all, the syntax differs between Git ignores and VSCode globs significantly. Altogether it's still much slower than findFiles2().

https://github.com/phil294/search-plus-plus-vscode-extension/blob/e11a223b1a0ead7f177b56b1712b2aaed36d6335/src/extension.js#L181-L241

andreamah commented 4 months ago

Some news: in case you missed it, I'm finalizing the FindFiles2 (which will become an overload of workspace.FindFiles) API with some other related APIs. You can see the status (plus a diagram) here. Since they're so intertwined, I felt that it made the most sense to me to address all the APIs as a group.

andreamah commented 4 months ago

The (mostly-finalized-but-still-under-discussion) revamped API shape for this is now available in this file.

It will eventually have the New mentions removed when it gets merged with the existing proposed API. Then, once it becomes an overload of workspace.findFiles, it will just be called using workspace.findFiles (but with different args than the existing call).

phil294 commented 4 months ago

@andreamah this looks great, but I have one question about useIgnoreFiles: It says

  • May default to search.useIgnoreFiles setting if not set.

but that was never the case before, so isn't this about to become a breaking change? (depending on what and when this "May" will apply)

Not that it affects me, just curious ☺️

andreamah commented 4 months ago

This will just be an alternative call for workspace.findFiles, but the old one will still work and be valid.