sharkdp / fd

A simple, fast and user-friendly alternative to 'find'
Apache License 2.0
33.8k stars 808 forks

`--exclude` doesn't work with absolute paths #851

Open morganmay opened 3 years ago

morganmay commented 3 years ago

Describe the bug you encountered:

The examples for the --exclude or -E option imply that it should work with absolute paths (/mnt/external-drive is given as an example). However, it only seems to work with relative paths. For example, if I'm trying to exclude the directory /home/user/Library/:

fd -E Library pattern /home

works, as does

fd -E "*/Library/*" pattern /home

However,

fd -E /home/user/Library/ pattern /home

doesn't work (i.e. /home/user/Library/pattern.txt would still show up in search results). Adding other options, such as -p or -a, doesn't seem to affect this behavior.

The only way I've found to exclude absolute paths is to add them to ~/.config/fd/ignore, which is somewhat inconvenient.

What version of fd are you using?

fd 8.2.1

Which operating system / distribution are you on?

Linux 5.14.2-arch1-2 x86_64
LSB Version:    1.4
Distributor ID: Arch
Description:    Arch Linux
Release:        rolling
Codename:       n/a

alessandroasm commented 3 years ago

Hi! I would like to work on this. :)

I'm gonna try to fix it and open a PR.

alessandroasm commented 3 years ago

The issue here is as follows: the exclude option works the same way .gitignore patterns work. This means that an absolute path is relative to the root of the git repo (which is the first search path in our case).

To fix this, we can check which exclude options are absolute and filter the results after the ignore crate finds them. What do you think about this approach? @sharkdp
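To make the anchoring behaviour described above concrete, here is a minimal sketch of how a leading-slash pattern ends up being interpreted. It is a simplified model for literal paths only, not the ignore crate's actual code, and the function name anchored_target is made up for illustration.

```rust
use std::path::{Path, PathBuf};

/// Simplified model of gitignore-style anchoring (hypothetical helper):
/// a pattern with a leading `/` is resolved against the search root,
/// not against the filesystem root. Globs are out of scope here.
fn anchored_target(search_root: &Path, pattern: &str) -> PathBuf {
    // Strip the surrounding slashes so `join` appends to the search root
    // instead of replacing it with an absolute path.
    let relative = pattern.trim_matches('/');
    search_root.join(relative)
}
```

Under this model, the pattern /home/user/Library/ with the search root /home refers to /home/home/user/Library, which would explain why the exclude from the original report never matches.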

andrejp88 commented 2 years ago

Just ran into this problem. Thanks for the tip, @alessandroasm. In my case I made the excluded paths relative to the root folder, and it worked. Perhaps the man page could be updated to note that the flag follows the same rules as ignore entries.
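For anyone scripting this workaround, the conversion from an absolute path to a root-relative exclude pattern could look roughly like the sketch below. It uses only the standard library, the name root_relative_pattern is hypothetical, and it assumes that a leading slash anchors the pattern to the search root, as described in this thread.

```rust
use std::path::Path;

/// Hypothetical helper: turn an absolute path into an exclude pattern
/// anchored to one search root. Returns None when the path is not
/// strictly inside that root.
fn root_relative_pattern(search_root: &Path, absolute: &Path) -> Option<String> {
    // `strip_prefix` compares whole path components, not raw strings.
    let rest = absolute.strip_prefix(search_root).ok()?;
    let parts: Vec<String> = rest
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    if parts.is_empty() {
        // The path is the search root itself; there is nothing to anchor.
        return None;
    }
    // The leading `/` anchors the pattern to the search root.
    Some(format!("/{}", parts.join("/")))
}
```

With the search root /home, the path /home/user/Library/ becomes the pattern /user/Library. Note that the result depends on the search root, so a search over several roots would need one pattern per root.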

diktomat commented 2 years ago

Especially bad in combination with --follow, as ~/Library/Containers (Mac) contains thousands of symlinks to directories like ~/Pictures or ~/Music, which themselves can have tens of thousands of files in them. Blows up search results a lot, >8x time and >14x result count for me:

~
❯ fd --exclude Containers --follow |wc -l
 2657140

~ took 8s 
❯ fd --follow |wc -l
 38664223

~ took 1m10s 
❯ 

I don't really want to add plain Containers into my global ignore, as it's a name that may be used outside ~/Library (for, well, containers for example), which should not be excluded.

My current approach, for everyone wanting something similar, is this rather granular global ignore, which allows me to find files living in these containers (sandboxed apps' documents) while not blowing up completely:

# Source:
~/Library/Containers 
❯ fd --type symlink |cut -d '/' -f 4 |sort |uniq

# $XDG_CONFIG_HOME/fd/ignore
Library/Containers/*/Data/Desktop
Library/Containers/*/Data/Downloads
Library/Containers/*/Data/Library
Library/Containers/*/Data/Movies
Library/Containers/*/Data/Music
Library/Containers/*/Data/Pictures

# Result:
~ 
❯ fd --follow |wc -l
 2702849

~ took 9s
❯

cyqsimon commented 2 years ago

@alessandroasm any progress on this? If you've encountered any difficulty or cannot spare the time, I am willing and able to help.

alessandroasm commented 2 years ago

Hello cyqsimon, sadly I'm way too busy at this time, so I could not make any progress on this. Feel free to work on it if you want :)

SoftwareApe commented 1 year ago

I would have liked this feature, too. If it's a performance or compatibility concern we could have an --exclude-abs option, that would then do a check if it's a file in the current search directory.

tmccombs commented 1 year ago

It's more of a "the library we use for this doesn't really support this". So we would have to find a way to work around that, or stop using that library. See https://github.com/BurntSushi/ripgrep/issues/2366

cyqsimon commented 1 year ago

I finally have some time to come back to this issue.

From reading https://github.com/sharkdp/fd/blob/master/src/walk.rs, it seems like there is no good way to implement an "ignore by absolute path" mechanism within the confines of the ignore crate. And I think BurntSushi does make some good points in https://github.com/BurntSushi/ripgrep/issues/2366#issuecomment-1336399045 on why it's a "wont-fix", in particular the non-trivial performance impact such a feature will incur.

So considering the performance impact, would it make some sense to split "absolute ignore" into its own flag, and implement it independently of what's offered by ignore? Something like --exclude-absolute maybe (and the corresponding global config file ~/.config/fd/ignore-absolute)? And then in documentation we can inform the user very explicitly about the performance impact it entails.

As for the specific implementation, I imagine it won't be too difficult (if some performance penalty is acceptable). In fd::walk::spawn_senders, simply canonicalise the current path (which is where most of the penalty is going to come from), and then use globset to match. I'll make sure the canonicalisation doesn't happen if the user hasn't specified anything via --exclude-absolute, so that there's no performance regression if the user doesn't use this new functionality. Further optimisations are going to be much more difficult I think, but at least the option to use it is there.

I'll quickly put together a prototype to test. Any ideas/suggestions are welcome!
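A rough sketch of the check described above, assuming the absolute excludes are plain directory paths that have already been canonicalised. To stay within the standard library it uses a component-wise prefix test in place of globset, and should_skip is a hypothetical name, not a function in fd.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: report whether `entry` lies under one of the
/// absolute exclude paths. `abs_excludes` is assumed to contain
/// canonical, absolute directory paths.
fn should_skip(entry: &Path, abs_excludes: &[PathBuf]) -> io::Result<bool> {
    // Fast path: with no absolute excludes, avoid canonicalising at all,
    // so users who never use the feature pay nothing for it.
    if abs_excludes.is_empty() {
        return Ok(false);
    }
    // Resolves symlinks and relative components; this touches the
    // filesystem and is where most of the cost comes from.
    let canonical = fs::canonicalize(entry)?;
    // `Path::starts_with` compares whole components, so an exclude of
    // `/a/Lib` does not accidentally match `/a/Library`.
    Ok(abs_excludes.iter().any(|ex| canonical.starts_with(ex)))
}
```

A real implementation would presumably also prune excluded directories during traversal rather than test every entry, but that depends on what the walker allows.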

musjj commented 1 year ago

Another problem related to this is that --exclude seems to use some kind of fuzzy matching. Given a directory like this:

.
├── directory
│   ├── exclude-me
│   └── just-some-file
└── exclude-me

There's no way to exclude just the exclude-me that is in the root directory:

❯ fd --exclude exclude-me
directory/
directory/just-some-file

EDIT: Never mind, my use case does not require any additional features. Pre-pending the pattern with a slash anchors it to the root directory:

❯ fd --exclude /exclude-me
directory/
directory/exclude-me
directory/just-some-file

elig0n commented 5 months ago

EDIT: Never mind, my use case does not require any additional features. Pre-pending the pattern with a slash anchors it to the root directory:

❯ fd --exclude /exclude-me
directory/
directory/exclude-me
directory/just-some-file

So using relative paths with prepended slash solves this issue. Mods can close this thread then.

different-name commented 3 months ago

So using relative paths with prepended slash solves this issue. Mods can close this thread then.

No, we still can't use absolute paths. The / in this case is relative to the directory being searched.

different-name commented 3 months ago

In my case, I want to search multiple directories and exclude specific directories. I don't think this is possible with fd currently.