sharkdp / fd

A simple, fast and user-friendly alternative to 'find'
Apache License 2.0
34.01k stars, 817 forks

Negating patterns: add --exclude-regex? #198

Open alok opened 6 years ago

alok commented 6 years ago

I noticed grep has a -L flag to list files that don't contain the search pattern. What about the related operation of finding the complement of a pattern? Would a flag for that be useful, or is there some simple way to specify it in the pattern regex?

sharkdp commented 6 years ago

Is there really a strong use case for this?

We already have the --exclude <pattern> option which lets you specify a glob pattern that should be excluded. While not exactly the same, I'm not sure if there is a need for an option that would invert the pattern.

or is there some simple way to specify it in the pattern regex?

Not really. Negative lookaheads can be abused for that but (a) that's not really practical and (b) I don't think they are supported by the regex crate.

A workaround would be fd | grep -v pattern.
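As a minimal sketch of the grep -v workaround above, the snippet below filters a list of paths. To stay runnable without fd installed, a hypothetical list_paths function with made-up file names stands in for fd's output; in real use you would pipe fd itself.

```shell
# Hypothetical stand-in for `fd` output: one path per line (made-up names).
list_paths() {
  printf '%s\n' 'src/main.rs' 'src/keep_me.rs' 'keepsakes/photo.png'
}

# Print only the paths that do NOT match the regex.
list_paths | grep -v 'keep'
```

Note that grep sees each whole path, so in this sketch 'keepsakes/photo.png' is dropped as well, even though only its parent directory matches.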

YodaEmbedding commented 4 years ago

Personally, I needed it to remove all files in a directory that did not fit a given criterion. For instance, deleting all files not starting with the prefix "keep":

fd '^(?!keep)' --exec rm {}

Even if fancy regexes aren't supported, an --invert option would probably help in most use cases:

fd 'keep' --invert --exec rm {}

sharkdp commented 4 years ago

@SicariusNoctis --exclude would work for your case as well:

fd -E 'keep*' -X rm

NightMachinery commented 3 years ago

@SicariusNoctis --exclude would work for your case as well:

fd -E 'keep*' -X rm

@sharkdp Why not add an --exclude-regex?

raiguard commented 3 years ago

I too would like an --exclude-regex. I am extremely familiar with regex, but not at all with glob patterns. And since fd does not support negative lookahead, it's impossible to do that kind of thing in a regex pattern.

sharkdp commented 3 years ago

Ok, reopening for now.

razcore-rad commented 1 year ago

I just ran into the lack of exclude patterns myself. I'm working on a Blender addon and need to exclude just one __init__.py file, the one in the root folder. I have to use something like fdfind --type file | rg -v '^\./__init__.py' because the glob pattern excludes every __init__.py in all folders.

tmccombs commented 1 year ago

@razcore-rad I'm pretty sure you could use --exclude=/__init__.py for that. (note that --exclude actually uses the same syntax as .gitignore).

razcore-rad commented 1 year ago

@razcore-rad I'm pretty sure you could use --exclude=/__init__.py for that. (note that --exclude actually uses the same syntax as .gitignore).

I see. That's handy. I looked at the man page but it didn't mention the specific syntax. I did try --exclude './__init__.py', but that didn't help; /__init__.py works for my use case, thanks.

tmccombs commented 1 year ago

It uses the same syntax as .gitignore
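To illustrate the anchoring rule discussed here, the following is a rough sketch of how a leading slash changes what a .gitignore-style pattern matches. The is_excluded helper is hypothetical, handles literal file names only (no wildcards), and is not how fd implements --exclude; the paths are made up.

```shell
# is_excluded PATTERN PATH
# Hypothetical helper: succeeds if PATH would be excluded by PATTERN.
# Simplification: literal names only, no glob characters.
is_excluded() {
  pattern=$1
  path=${2#./}                                # drop a leading "./"
  case $pattern in
    /*) [ "$path" = "${pattern#/}" ] ;;       # anchored: only matches at the search root
    *)  [ "${path##*/}" = "$pattern" ] ;;     # unanchored: matches the name at any depth
  esac
}

for p in ./__init__.py ./addon/__init__.py; do
  if is_excluded '/__init__.py' "$p"; then
    echo "excluded: $p"
  else
    echo "kept:     $p"
  fi
done
```

With the anchored pattern only the root-level file is reported as excluded; passing '__init__.py' without the slash would exclude both.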

M1cha commented 1 year ago

ripgrep supports this as well within --glob:

Precede a glob with a ! to exclude it.

tmccombs commented 1 year ago

@M1cha fd already has an --exclude flag that uses globs. This issue is for excluding using a regex pattern rather than a glob pattern.

RensOliemans commented 4 months ago

Using globbing instead of regex for --exclude is enough for my use-case, so I am perfectly OK with this as is. However, I do find it a bit confusing, and am not the only one (see #1264). Would it make sense to change the behaviour of --exclude to use regex by default, and glob patterns with the --glob option? That would make --exclude consistent with the normal behaviour of fd.

This would be breaking behaviour though, so perhaps that's not acceptable.

tmccombs commented 4 months ago

I don't think we could do that. Not only is it a breaking change, but it would break things in a subtle way where some existing usages would work, some wouldn't work at all, and others would work sometimes.

Also, fwiw, the current --exclude option is designed to be consistent with entries in an ignore file.

aqdasak commented 4 months ago

--exclude-regex (or --invert-match in grep) Needed for My Use Case

I'm writing a script to calculate the total playtime of all videos in a directory recursively using fd. The files are named with serial numbers preceding them, for example:

./some_dir/1) filename.mp4
./some_dir/2) filename.mp4

When I want to list the files with serial numbers from 1 to 4, I use fd '^[1-4]\)'. However, fd | grep '^[1-4])' does not work in this scenario, because fd passes the full path to grep, so ^[1-4]) is matched against the beginning of the full path and not the beginning of the filename.

However, when I need an inverted match, fd currently doesn't support it, and grep fails because it matches against the full path. I can't work around this with basename, as in fd --exec basename {} \; | grep -v '^[1-4])', because I need the full path for another command, ffprobe.

tavianator commented 4 months ago

@aqdasak For your use case you could use

$ fd | grep -E '(^|/)[1-4]\)'
$ fd | grep -Ev '(^|/)[1-4]\)'

aqdasak commented 4 months ago

@tavianator

The command $ fd | grep -E '(^|/)[1-4]\)' also matches "1) some_dir/" and all its content, which is not the intended outcome. The goal is to match only the filename and not its parent directories.

Currently, I'm using the following method (in fish shell):

for i in (fd)
       basename $i | grep -i '^[1-4])' >/dev/null && echo $i
end

and

for i in (fd)
       basename $i | grep -iv '^[1-4])' >/dev/null && echo $i
end

tavianator commented 4 months ago

Oh right, then something like fd | grep -E '(^|/)[1-4]\)[^/]*$'
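A small sketch of how that pattern behaves, for both the normal and the inverted match. The list_paths function and its paths are hypothetical stand-ins for fd's output, so the snippet runs without fd; the trailing [^/]*$ is what restricts the match to the last path component.

```shell
# Hypothetical stand-in for `fd` output (paths modelled on the examples in this thread).
list_paths() {
  printf '%s\n' \
    'some_dir/1) first.mp4' \
    'some_dir/5) fifth.mp4' \
    '1) season/7) seventh.mp4'
}

# Files whose own name starts with "1)" to "4)"; full paths are preserved.
list_paths | grep -E  '(^|/)[1-4]\)[^/]*$'

# Inverted match: everything else, including files under a directory named "1) season".
list_paths | grep -Ev '(^|/)[1-4]\)[^/]*$'
```

Because the full path survives the filter, the result can still be handed to another command that needs it.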

NightMachinery commented 4 months ago

Using a second program will negate advantages such as coloring.

Alfamari commented 1 month ago

This is a problem I run into often enough that I ended up finding this thread: I have a handful of nested subdirectories, all of them relatively small except for one that is extremely large. I try to exclude the huge directory by following the same rules as the default search syntax: smart-case regex. When that doesn't work, I have to Ctrl-C to cancel the flood of unwanted matches, remember that --exclude takes a case-sensitive glob instead, read the name of the directory I want to exclude more carefully, match its case, and wrap my pattern in asterisks. All of this takes multiple attempts, because I forget either the case sensitivity or the asterisks.

Piping to another program has two disadvantages: as NightMachinery said, you lose the coloring, and it also dramatically increases the search time, because fd has to find everything you want to exclude before you can actually exclude it.

I actually came here hoping there was an option to change the behavior of --exclude from a case-sensitive glob to a smart-case regex (it would be so nice for consistency). Now, to avoid the breaking change of changing its default behavior across the board, what about an environment variable or flag that the user can manually choose to set in their alias (or if the config file ever becomes a thing) that would change the default --exclude behavior from case-sensitive glob to smart-case regex?

tmccombs commented 1 month ago

Now, to avoid the breaking change of changing its default behavior across the board, what about an environment variable or flag that the user can manually choose to set in their alias (or if the config file ever becomes a thing) that would change the default --exclude behavior from case-sensitive glob to smart-case regex?

Unfortunately, that can break programmatic usages of fd. See discussion on https://github.com/sharkdp/fd/issues/362

Alfamari commented 1 month ago

Ya, I see it now for env variables but an alias flag should at least be fine :/.

Some projects ship a second command for the same tool that provides extra or differing functionality. Broot has 'br' and zoxide has 'z', but the original commands can still be used. These also change behavior and aren't just shorter invocations of the same thing. So one idea would be to offer an entirely separate command that lets users deviate more radically from fd's default behavior without touching the original command. That would make it more obvious whether fd is being called interactively or programmatically. The separate command could be more volatile and accommodate more users' individual preferences, while the original functionality of the fd command stays consistent and intact. Because this concern about not breaking programmatic usage seems to pop up across different requests, it might help address the problem in several areas.

At the very least, I do hope a standalone new flag can be added. If it's a dumb idea just ignore me. I just wanted to leave the idea out there in the unlikely case some ideas can be gained from it (maybe with enough ideas something will be appealing).

In these two examples they use shell integrations, which sounds very much against this project's ideals of wanting a simple, consistent result across OSes and probably not wanting to add a bunch of unique shell integrations for various platforms. So the exact implementation doesn't have to be the same; I'm just speaking to the concept as a whole.

Edit: ehh, this basically sounds exactly like your (@tmccombs) suggestion already from the end of https://github.com/sharkdp/fd/issues/362#issuecomment-2081310265 and it got downvoted :(.