zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
https://zed.dev
Other
48.96k stars 2.95k forks

Items in `.gitignore` should still be searchable via file finder #4745

Open mifopen opened 1 year ago

mifopen commented 1 year ago

Check for existing issues

Describe the bug / provide steps to reproduce it

  1. Create a file named .env.local in the open directory
  2. Open the file search palette
  3. Search for env or local

Environment

Zed: v0.85.4 (stable) OS: macOS 13.3.1 Memory: 16 GiB Architecture: aarch64

If applicable, add mockups / screenshots to help explain present your vision of the feature

No response

If applicable, attach your ~/Library/Logs/Zed/Zed.log file to this issue.

If you only need the most recent lines, you can run the zed: open log command palette action to see the last 1000.

No response

JosephTLyons commented 1 year ago

Hey @mifopen, by chance, do you have the .env.local file .gitignored?

mifopen commented 1 year ago

@JosephTLyons yes, 100%

mifopen commented 1 year ago

And I see where it could come from. But I've been using JetBrains IDEs for a decade and believe that "having something in .gitignore" doesn't equal "excluded from the global project search". In JetBrains IDEs it's a separate setting (you can mark directories/files as "excluded"), and it feels natural.

JosephTLyons commented 1 year ago

I think at the moment, that behavior is baked into Zed on purpose, to prevent things like node_modules (and similar setups in other languages) from absolutely polluting the file finder results. That being said, we should probably consider some sort of setting here to give more flexibility:

mifopen commented 1 year ago

Totally agree

TwanLuttik commented 11 months ago

I agree as well, but maybe add an option in the settings to override it.

abejfehr commented 9 months ago

Piggybacking to share this related behaviour: in VS Code, doing "Find in folder" and searching for a query works if a folder is ignored, but in Zed that doesn't seem to be the case.

Even if build/dist folders are grayed out in the editor, sometimes I want to make sure that something built correctly by searching for a string directly in an ignored folder.

Similarly, if I'm learning how a package in node_modules works, I sometimes want to perform a scoped search in the node_modules folder even though it's ignored.

fvsch commented 9 months ago

I have this issue as well. I understand that excluding all files matched in .gitignore is helpful to exclude directories with installed dependencies (node_modules, vendor, etc.), and temporary outputs like build artifacts and caches.

Still, there are gitignored files which are legitimately useful to be able to open quickly.

Some possible heuristics that could balance those two needs (at the cost of some complexity):

alex-astronomer commented 7 months ago

I have started work on this issue. The solution that I came up with is adding an "Ignore Included" toggle to the FileFinder picker modal.


Please leave comments if you have design feedback on this. I believe that a clean solution to this issue will involve different pickers each having their own search options. File Finder can have gitignore toggled for example, or project symbols could have case sensitivity toggled. That last example I just made up.

The reason that this is a clean solution is because each Picker has its own delegate for the different functions. We can modify the PickerDelegate trait in order to add optional search options for each type of picker in order to make this extensible and re-use components that are already written (SearchOptions).
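
A minimal sketch of that idea, assuming a simplified stand-in for the PickerDelegate trait. `PickerSearchOptions`, `search_options` and both delegate structs are hypothetical names for illustration, not Zed's actual API:

```rust
/// Hypothetical per-picker toggles; not Zed's real `SearchOptions` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct PickerSearchOptions {
    pub include_ignored: bool,
    pub case_sensitive: bool,
}

/// Simplified stand-in for the `PickerDelegate` trait.
pub trait PickerDelegate {
    /// Pickers that expose no toggles keep this default.
    fn search_options(&self) -> Option<PickerSearchOptions> {
        None
    }

    fn matches(&self, query: &str) -> Vec<String>;
}

pub struct FileFinderDelegate {
    /// (path, is_gitignored) pairs.
    pub entries: Vec<(String, bool)>,
    pub options: PickerSearchOptions,
}

impl PickerDelegate for FileFinderDelegate {
    fn search_options(&self) -> Option<PickerSearchOptions> {
        Some(self.options)
    }

    fn matches(&self, query: &str) -> Vec<String> {
        self.entries
            .iter()
            // Gitignored entries are shown only when the toggle is on.
            .filter(|(_, ignored)| self.options.include_ignored || !*ignored)
            // Plain substring match stands in for real fuzzy matching.
            .filter(|(path, _)| path.contains(query))
            .map(|(path, _)| path.clone())
            .collect()
    }
}

/// A picker without any toggles relies on the trait default.
pub struct CommandPaletteDelegate {
    pub commands: Vec<String>,
}

impl PickerDelegate for CommandPaletteDelegate {
    fn matches(&self, query: &str) -> Vec<String> {
        self.commands
            .iter()
            .filter(|command| command.contains(query))
            .cloned()
            .collect()
    }
}
```

The default method means existing pickers would not need to change, which is the extensibility argument made above.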

Developers: I accept any and all feedback about design! Users: Let me know if this would solve the problems that you're facing.

SomeoneToIgnore commented 7 months ago

This is a relatively hard issue to tackle, if put in a generic form as "[any] items in .gitignore should still be searchable via file finder". The hardest part would be to keep the file finder searching very fast while it also has to know about all gitignored files (https://github.com/zed-industries/zed/issues/7504).

First of all, node_modules/, target/, foo_bar_output, .env and other such files all look alike to Zed: it has no heuristics to tell them apart.

Consider https://github.com/microsoft/vscode-eslint project as a web example, after installing the dependencies with npm i:

❯ find . -type f ! -path './node_modules/*' | wc -l
     870

❯ find . -type f |wc -l
    7882

❯ du -ha node_modules
........snip
 90M    node_modules

❯ du -ha .
........snip
104M    .

node_modules is a gitignored directory there; it holds roughly an order of magnitude more files than the project itself and takes up ~90% of the project's size on disk.

Also note that this repo is a relatively small project that does not include Angular or React + somethingX + .. in its package.json.

State in Zed

Zed used to track all gitignored files at some point, but it was soon discovered that it becomes unresponsive relatively quickly due to the requirement to react to all related FS events and also [re]scan directories: https://github.com/zed-industries/zed/blob/e77d313839b382760a5d24c550e2ab795a1fac27/crates/worktree/src/worktree.rs#L3460

Now, Zed scans only the non-ignored directories: https://github.com/zed-industries/zed/blob/e77d313839b382760a5d24c550e2ab795a1fac27/crates/worktree/src/worktree.rs#L2455. Gitignored directories are loaded through the expand_entry -> refresh_entries_for_paths -> forcibly_load_paths chain of actions in the worktree, so all currently expanded gitignored directories are added to the same collection of worktree entries: https://github.com/zed-industries/zed/blob/e77d313839b382760a5d24c550e2ab795a1fac27/crates/worktree/src/worktree.rs#L135-L136

and are used when searching or displaying in the project panel, project search, file finder and various entry-related iterations.
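
As a rough illustration of that scan-then-expand behaviour, here is a toy in-memory model. `FakeFs`, `Worktree`, `scan` and `load_dir` are hypothetical names, and only `expand_entry` borrows its name from the chain mentioned above; none of this mirrors the real worktree code:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy file tree: directory path -> (child name, is_dir) pairs. "" is the root.
pub struct FakeFs {
    pub dirs: BTreeMap<String, Vec<(String, bool)>>,
}

pub struct Worktree {
    /// Every path that is currently indexed.
    pub entries: BTreeSet<String>,
    /// Directories matched by .gitignore (given explicitly in this sketch).
    pub ignored_dirs: BTreeSet<String>,
}

fn join(dir: &str, name: &str) -> String {
    if dir.is_empty() {
        name.to_string()
    } else {
        format!("{dir}/{name}")
    }
}

impl Worktree {
    /// Initial scan: index everything, but do not descend into ignored directories.
    pub fn scan(fs: &FakeFs, ignored_dirs: &[&str]) -> Self {
        let mut tree = Worktree {
            entries: BTreeSet::new(),
            ignored_dirs: ignored_dirs.iter().map(|dir| dir.to_string()).collect(),
        };
        tree.load_dir(fs, "", false);
        tree
    }

    fn load_dir(&mut self, fs: &FakeFs, dir: &str, inside_ignored: bool) {
        let Some(children) = fs.dirs.get(dir) else { return };
        for (name, is_dir) in children {
            let path = join(dir, name);
            self.entries.insert(path.clone());
            let ignored = inside_ignored || self.ignored_dirs.contains(&path);
            // Only non-ignored directories are scanned recursively.
            if *is_dir && !ignored {
                self.load_dir(fs, &path, false);
            }
        }
    }

    /// Expanding an ignored directory adds its direct children to the same
    /// entry collection; nested directories stay unloaded until expanded too.
    pub fn expand_entry(&mut self, fs: &FakeFs, dir: &str) {
        self.load_dir(fs, dir, true);
    }
}
```

In this model the ignored directory itself is listed after the initial scan, but nothing below it is indexed until `expand_entry` is called for it.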

This way, Zed only indexes and uses gitignored files if they are in the worktree/"project" root (as it gets opened by default) or in other directories that were opened (e.g. due to autonavigating in the project tree to the entry corresponding to the opened editor). This index is a core thing when it comes to addressing the files, so things might get slow if more entries are added to it. It seems that there are some scaling issues with the current model already: https://github.com/zed-industries/zed/issues/8242, and adding more (10x at least) things on top is not easy, if possible at all.

Zed does not use the "gitignored files" concept too frequently: it shows a different icon in the project panel (ergo the whole "load gitignored directories via expand_entry call" story) and allows running a project search on files plus the gitignored ones. The project search part is done by a separate background thread that walks the gitignored tree roots and matches the files, and there's a limit on the number of entries: https://github.com/zed-industries/zed/blob/e77d313839b382760a5d24c550e2ab795a1fac27/crates/project/src/project.rs#L6299-L6300
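
A rough sketch of that kind of background walk, using only the standard library. The function name `search_ignored_roots` is hypothetical, and treating the limit as a cap on reported matches is an assumption; the real limit in project.rs may count something else:

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Walk the given (gitignored) roots on a background thread and stream back
/// files whose contents contain `needle`, stopping after `limit` matches.
pub fn search_ignored_roots(
    roots: Vec<PathBuf>,
    needle: String,
    limit: usize,
) -> mpsc::Receiver<PathBuf> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut stack = roots;
        let mut sent = 0;
        while let Some(path) = stack.pop() {
            if sent >= limit {
                break;
            }
            if path.is_dir() {
                // Unreadable directories are skipped silently in this sketch.
                if let Ok(read_dir) = fs::read_dir(&path) {
                    stack.extend(read_dir.flatten().map(|entry| entry.path()));
                }
            } else if fs::read_to_string(&path).map_or(false, |text| text.contains(&needle)) {
                // Stop early if the receiving side went away.
                if tx.send(path).is_err() {
                    break;
                }
                sent += 1;
            }
        }
    });
    rx
}
```

Because results arrive over a channel, the caller can render matches as they come in, and nothing from the ignored tree has to be added to the main entry index.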

Design considerations

First, we need to understand what to display and how. So far, it feels that there are certain people that expect Zed to show an arbitrary node_modules/foo/bar.ts file in the file finder if queried, and some other set of people who will be happy with just their .env.* files from the project root opening in the file finder.

The latter is simple to fix with https://github.com/zed-industries/zed/pull/9760 or similar, but keeping the same design, how could the former be done? I currently think it's not really possible, as we cannot bloat the main cache with so many extra entries, yet have to be able to answer fuzzy path queries over a 10x larger set of files.

While traversing such file trees in real time for fuzzy match queries does not sound possible, caching seems hard too due to invalidation. The current worktree entry SumTree cache tracks every related FS event for that, which would not work here, so either some other strategy has to be picked or a better way of approaching the problem considered; it seems rather wasteful to cache 10x of the repo size just to enable file finder queries.

None of the editors known to me seems to provide any similar functionality, but VSCode has a "whitelist" of directories to track. We could solve the issue with yet another config option, but that would not be very discoverable and might still slow things down overall on large enough node_modules if we reuse the same worktree entry SumTree cache. On the bright side, with reasonable defaults (we can add all .env*-like files there explicitly) it will work for many people.

One idea that seems worth exploring is to add more interactivity to the file finder and ask the user to input a gitignored root first before looking inside it: match regularly until the query contains node_modules or whatever other gitignored root that was not scanned, then start to propose node_modules/* directories that match further file finder queries, e.g. for node_modules/pr, the picker will show the first N directories with names starting with pr. Since the input operation is relatively tedious, some list completion would be needed, and a background indexing task might index a subset and switch to the regular, fuzzy matching mode. While this might be a blast if implemented properly, it sounds rather complicated (and would require more design than anything else above), and the "whitelist" + new file finder input toggle sound more feasible to do.
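
The "propose the first N matching directories" step could look roughly like this. `propose_dirs` is a hypothetical helper that reads the unscanned directory straight from disk rather than from any index:

```rust
use std::fs;
use std::path::Path;

/// For a query such as "node_modules/pr", list up to `limit` directories
/// directly under `root/node_modules` whose names start with "pr".
pub fn propose_dirs(root: &Path, query: &str, limit: usize) -> Vec<String> {
    // Everything before the last '/' is the directory to list,
    // everything after it is the name prefix typed so far.
    let (parent, prefix) = query.rsplit_once('/').unwrap_or(("", query));
    let Ok(read_dir) = fs::read_dir(root.join(parent)) else {
        return Vec::new();
    };
    let mut names: Vec<String> = read_dir
        .flatten()
        .filter(|entry| entry.path().is_dir())
        .filter_map(|entry| entry.file_name().into_string().ok())
        .filter(|name| name.starts_with(prefix))
        .collect();
    // Sort so that the "first N" are stable between keystrokes.
    names.sort();
    names.truncate(limit);
    names
        .into_iter()
        .map(|name| {
            if parent.is_empty() {
                name
            } else {
                format!("{parent}/{name}")
            }
        })
        .collect()
}
```

Only one directory level is read per keystroke, so the cost stays bounded by the size of a single directory rather than the whole ignored tree.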

solventak commented 7 months ago

Ohhh interesting.... Thank you so much for replying so quickly to this issue. I will take a look through the (incredibly extensive) comment that you left and I'll let you know if I have additional questions.

CurbaiCode commented 7 months ago

#6927 seems like it could be related to this. Also, #5029.

aarroisi commented 6 months ago

I think excluding files from .gitignore generally is a good approach. 95% of my time searching files, I don't want to search in the files/folders included in the .gitignore.

But I also frequently need to open files like .env that are not checked into git. So my proposed solution is a config that "forces" files/folders to be included in search, even though they are gitignored. Probably something like this:

{
  ...
  "always_include_in_search":
  [
    "**/.env"
  ],
  ...
}

So it acts sort of like a reverse gitignore, and can also use the same patterns used in .gitignore files for looking matched files / folders.
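
A minimal sketch of how such an override might be evaluated. The matcher below handles only `*` within a path segment and `**` across segments, which is a simplification of real .gitignore patterns; `matches_glob` and `is_searchable` are hypothetical helpers, and `always_include_in_search` is the proposed (not existing) setting:

```rust
/// Match one path segment; `*` matches any run of characters.
fn segment_matches(pattern: &str, text: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == text,
        Some((head, rest)) => text.strip_prefix(head).map_or(false, |tail| {
            // Try every possible length for the part consumed by `*`.
            (0..=tail.len())
                .filter(|&i| tail.is_char_boundary(i))
                .any(|i| segment_matches(rest, &tail[i..]))
        }),
    }
}

fn segments_match(pattern: &[&str], path: &[&str]) -> bool {
    let Some((first, rest)) = pattern.split_first() else {
        return path.is_empty();
    };
    if *first == "**" {
        // `**` consumes zero or more leading path segments.
        return (0..=path.len()).any(|skip| segments_match(rest, &path[skip..]));
    }
    match path.split_first() {
        Some((segment, path_rest)) => {
            segment_matches(first, segment) && segments_match(rest, path_rest)
        }
        None => false,
    }
}

pub fn matches_glob(pattern: &str, path: &str) -> bool {
    let pattern: Vec<&str> = pattern.split('/').collect();
    let path: Vec<&str> = path.split('/').collect();
    segments_match(&pattern, &path)
}

/// A gitignored path stays hidden unless an `always_include_in_search`
/// pattern matches it; non-ignored paths are always searchable.
pub fn is_searchable(path: &str, gitignored: bool, always_include: &[&str]) -> bool {
    !gitignored || always_include.iter().any(|pattern| matches_glob(pattern, path))
}
```

With `always_include = ["**/.env"]`, `.env` and `apps/web/.env` become searchable while `node_modules/foo/index.js` stays hidden.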

Hemant-Mann commented 4 months ago

I think the solution by @aarroisi should work.

I also need to search some files which are ignored in .gitignore, like configurations or some internal dependency. Giving the user a configuration option would give them more control over what they want to search. The default behaviour is okay, but we must have an option to override it if needed!

ottodevs commented 4 months ago

Another idea could be to reuse the already proven and mature .gitignore syntax so users are able to do things like:

"file_scan_exclusions": [
  "!.env*"
  // (...your other exclusions) 
]

Wherever the editor reads the .gitignore file, it would need to apply the file_scan_exclusions over it to come up with the final exclusions list. Even just concatenating both lists if that makes sense.

This brings great flexibility and avoids introducing new settings.

Examples:

  1. Empty setting for file_scan_exclusions to just use the .gitignore (although some sane defaults like the current ones seem nice to keep).

    {
     "file_scan_exclusions": []
    }

  2. Setting file_scan_exclusions to ["!.env*"] to negate current .gitignore settings.

    {
     "file_scan_exclusions": [
       "!.env*"
     ]
    }

  3. Add more files to the file_scan_exclusions to be excluded from the file scan apart from the ones in the .gitignore.

    {
     "file_scan_exclusions": [
       "*.tmp",
       "*.bak",
       "/logs/*",
       "!important.log"
     ]
    }
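
One way to read the "concatenate both lists" idea is last-match-wins evaluation, as in .gitignore itself. The sketch below assumes that reading and simplifies the pattern rules: a leading `/` anchors to the project root, patterns without it match the file name only, and `*` never crosses a `/`. `is_excluded` and its helpers are hypothetical:

```rust
/// `*` matches any run of characters that does not contain '/'.
fn wildcard(pattern: &str, text: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == text,
        Some((head, rest)) => text.strip_prefix(head).map_or(false, |tail| {
            let max = tail.find('/').unwrap_or(tail.len());
            (0..=max)
                .filter(|&i| tail.is_char_boundary(i))
                .any(|i| wildcard(rest, &tail[i..]))
        }),
    }
}

fn rule_matches(rule: &str, path: &str) -> bool {
    match rule.strip_prefix('/') {
        // Anchored rule: compare against the whole project-relative path.
        Some(anchored) => wildcard(anchored, path),
        // Unanchored rule: compare against the file name only.
        None => wildcard(rule, path.rsplit('/').next().unwrap_or(path)),
    }
}

/// .gitignore rules are applied first and file_scan_exclusions are appended,
/// so a later `!pattern` can re-include something excluded earlier.
pub fn is_excluded(path: &str, gitignore: &[&str], file_scan_exclusions: &[&str]) -> bool {
    let mut excluded = false;
    for rule in gitignore.iter().chain(file_scan_exclusions) {
        let (negated, pattern) = match rule.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *rule),
        };
        if rule_matches(pattern, path) {
            excluded = !negated;
        }
    }
    excluded
}
```

Under these assumptions, `.env.local` ignored by a `.env*` rule becomes visible again once `"!.env*"` is added to file_scan_exclusions, and `logs/important.log` survives the `"/logs/*"` exclusion in the third example.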

thnt commented 2 months ago

I think we just need to allow searching for a file by relative path, e.g. ./.env.local, to open an ignored file.
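
A small sketch of that shortcut, assuming queries that start with `./` are treated as literal project-relative paths and checked on disk instead of in the index. `resolve_literal_query` is a hypothetical helper:

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a query like "./.env.local" to an existing file under `root`,
/// bypassing the file index (and therefore the gitignore filtering).
pub fn resolve_literal_query(root: &Path, query: &str) -> Option<PathBuf> {
    let relative = Path::new(query.strip_prefix("./")?);
    // Accept plain names only, so `..` or absolute paths cannot escape the project.
    if !relative.components().all(|part| matches!(part, Component::Normal(_))) {
        return None;
    }
    let candidate = root.join(relative);
    candidate.is_file().then_some(candidate)
}
```

Queries without the `./` prefix return `None` here, so they would fall through to the regular fuzzy matching.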

michaelaguiar commented 2 months ago

(Quoting @ottodevs's comment above in full: "Another idea could be to reuse the already proven and mature .gitignore syntax [...]", including the three file_scan_exclusions examples.)

Has something like this been implemented? I am hoping for a quick way to open log files in the search project files dialog, that are currently ignored by git. This would be exactly what I need.

Tobbe commented 2 months ago

I think @abejfehr's comment is worth more attention.

If I explicitly list a .gitignored directory in my "include" filter, I think it makes sense to override the exclusion of that directory and actually include the files inside it when searching, even though it's part of my .gitignore.
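
That rule could be sketched as follows, assuming the include filter and the ignored directories are plain directory paths rather than globs. `should_search` and `is_under` are hypothetical helpers:

```rust
/// True if `path` is `dir` itself or lies somewhere below it.
fn is_under(path: &str, dir: &str) -> bool {
    let (path, dir) = (path.trim_end_matches('/'), dir.trim_end_matches('/'));
    path == dir || path.strip_prefix(dir).map_or(false, |rest| rest.starts_with('/'))
}

pub fn should_search(path: &str, ignored_dirs: &[&str], include_filter: &[&str]) -> bool {
    // An empty include filter means "search the whole project".
    let in_scope =
        include_filter.is_empty() || include_filter.iter().any(|dir| is_under(path, dir));
    if !in_scope {
        return false;
    }
    // A file inside an ignored directory is searched only when an include
    // entry points at that ignored directory, or at something inside it.
    ignored_dirs
        .iter()
        .filter(|ignored| is_under(path, ignored))
        .all(|ignored| {
            include_filter
                .iter()
                .any(|included| is_under(included, ignored) && is_under(path, included))
        })
}
```

Listing `node_modules/` in the include filter opts that directory in, while an include of `src` does not silently pull in an ignored `src/generated`.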