sharkdp / fd

A simple, fast and user-friendly alternative to 'find'
Apache License 2.0
33.9k stars, 809 forks

Discussion: show Git-ignored files by default? #612

Open sharkdp opened 4 years ago

sharkdp commented 4 years ago

Since fd was first published, the feature to hide Git-ignored files by default has always been controversial. It's the number one pitfall for new users, as witnessed by the numerous issues that have been opened over time (even though this is the first point in the Troubleshooting section). Even experienced users will likely run into this from time to time.

We have had past discussions about this (see #179, #220, #18), but I'm not so sure anymore if this default is the best possible option for the "average user".

I thought it might make sense to discuss this again and see what others think. Whatever we choose as the default, it will always be easy for users to select a different default via an alias.

Pro current behavior (do not show .gitignored entries by default):

Cons:

andreavaccari commented 2 years ago

I also like @tavianator's proposal, possibly with the following caveats:

(Rerun with `-u` to also search ignored files, `-uu` to search all files)

dylan-tock commented 1 year ago

In case it helps, I thought fd was broken while searching for something I knew was in my node_modules directory due to this.

Ditto for me looking for an installed gem in $HOME/.rbenv

anacrolix commented 1 year ago

I believe the default should be changed. Similarly showing errors should be the default. Normally those things are suppressed when you have special needs or expectations with a search. When you run fd in frustration to work out where things are, you don't want things hidden from you.

stevenwalton commented 10 months ago

I want to add that I think this is a bad default and there are bad premises, probably based on the type of developer you are, not developers in general.

Most of the time, .gitignored results are not "interesting" to the user

Most of the time files in .gitignore are the most interesting ones and exactly what I want to be searching for. If you work with compiled languages, systems with logs, or tasks that produce binaries then you are highly interested in these. You want to search your logs (and fd with fzf makes this great). For tasks with binaries you often want to confirm their creation or that they were accurately cleared. I want to search logs and have better tools than grep and be able to find these across swaths of directories (work in HPC or ML and you'll be doing this a lot). But none of these are things you would push upstream, so they are going to be in your gitignore.

The question of "interesting" depends on what kind of developer you are and the default should be the case with the least assumptions made.

It's the number one pitfall for new users, as witnessed by the numerous issues that have been opened over time (even though this is the first point in the Troubleshooting section). Even experienced users will likely run into this from time to time.

This assumes that every user will read the documentation. I understand the frustration, I really do. But people are not going to hear about fd from GitHub but from their friends, blogs, reddit, whatever. They'll run apt-get install fd-find, having been told this is a fancy alternative to find. The unfortunate truth is that you could plaster this warning in big bold flashing letters at the top of the GitHub page and you'd still have frequent user confusion. You can only directly control so much and you can't force users to read documentation just as you can't expect users to read a pull/push sign.

A simple, fast and user-friendly alternative to 'find'

The way it is advertised gives a reasonable user interpretation of expecting behavior to be like find. I understand that you've even updated the feature list to note this ignoring of files. BUT I think no matter how clear you make this, it is generally going to be unseen. The flag is no doubt useful just as I am a frequent user of the exclude glob patterns.

An additional and critical point I want to make is that we end up getting different behavior based on where we issue our find command due to changing gitignores. If I run this in the project directory I will get different results than if I run it in a different project directory that is a clone of all the non-git files. This is highly unexpected behavior and will rightfully make a user feel schizophrenic.
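To make the location dependence concrete, here is a minimal Python sketch. It is not fd's implementation: `fnmatch` is only a stand-in for real gitignore matching (which has richer semantics), and the file names and patterns are made up for illustration.

```python
import fnmatch

def visible(files, ignore_patterns):
    """Return the files that survive a list of ignore patterns (simplified model)."""
    return [
        f for f in files
        if not any(fnmatch.fnmatchcase(f, p) for p in ignore_patterns)
    ]

# Hypothetical directory contents, identical in both locations.
files = ["main.c", "main.o", "run.log"]

# Inside the Git checkout, a .gitignore with these patterns is in effect.
in_repo = visible(files, ["*.o", "*.log"])

# In a plain copy of the same files there is no .gitignore at all.
in_copy = visible(files, [])

print(in_repo)
print(in_copy)
```

The same search over the same files gives a shorter list in the checkout than in the copy, which is the surprise described above.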

Overall the question comes down to which is the worst case:

  1. A user can't find a file they expected to because it is captured by .gitignore
  2. A user gets additional output

I think 1 is obviously a significantly worse situation. There's an adage in experimental sciences that applies: it is better to record too much information than too little. This is because too little has (generally) far higher costs than too much (too little requires doing everything over because you didn't know everything you'd need beforehand).

I write this message not to hate on this project but for the exact opposite reason. I absolutely love it, but I also want to see it become the best tool it can be.

matu3ba commented 10 months ago

A user gets additional output

I do consider finding additional, potentially unwanted things that ripgrep would not find a bigger problem than missing some. As an example, if one deletes files or probes untrusted files, one does not want tooling to diverge and cause unwanted surprises.

Personally, I do use fd intentionally, because it is able to use the intent expressed by git repo authors, for example to manage my dotfiles, and it feels kinda weird not to do so. Personally I'd prefer one tool combining ripgrep and fd to make things simple and avoid learning yet another arbitrary default, but I do recognize that this is unfeasible to maintain.

tmccombs commented 10 months ago

If you work with compiled languages, systems with logs, or tasks that produce binaries then you are highly interested in these.

I do work with compiled languages, systems with logs, and tasks that produce binaries. And when I use fd, I usually don't want to include those things. I want to be able to search for a source file, without showing files that are generated from it. Indeed, one of the biggest attractions of fd when I first started using it was that it respected gitignore by default. Although I will admit it isn't quite as important to me as it is for ripgrep.

Unfortunately, it is hard to get accurate data on how many users would prefer one default over the other. We will get a bias in responses to this issue from people who don't like the current default. But I strongly suspect that if we changed it we would get a lot of complaints from people who want it changed back.

For interactive use, I'm starting to think having some way to configure defaults would be worthwhile. But I worry about the impact that would have on startup time, especially if we parse a config file. And for scripting, having the behavior depend on a user's preferences is... not great. For the latter, perhaps we could mitigate with a flag that puts fd in "scripting mode" where we ignore whatever mechanism we use for configuration.

stevenwalton commented 10 months ago

rg is not part of the standard command set and isn't really relevant to this conversation.

My argument isn't I (Steven) have this particular use case vs you (tmccombs or matu3ba) have a particular use case. I just gave those as an example.

My argument is "which default yields the lowest entropy"

The reasoning to my argument is "follow same set of defaults as the standard system."

Personally, I'm just throwing alias fd='fd --no-ignore' into my rc and calling it a day. From a design perspective, I strongly believe more confusion is created by a default that excludes files that standard commands such as find, ls, grep, locate, and so on would show. We're talking about default options. If you're using rg I assume you can be like me and throw alias fd='fd --git' into your rc and call it a day too. The question is not "what do you find useful" but "what behavior is most expected from a new user". Let's just make sure we get the framing right and let's also not forget that alias exists. I mean we all have dotfiles, right?

The frequent issues are explicit evidence that the default behavior does create huge surprisal for users. So is the fact that it's in the first line of the documentation and in the feature list. When it doesn't surprise users, those users probably read the documentation closely. If a user reads the documentation closely they can easily throw an alias into their rc file (because that's what those files are for, personalization) and go on about their day and the github issues will go away. We can even think about this from another perspective if the several I have given aren't enough. Which would be a larger breaking change: the default is to ignore the *foo glob and you remove that default filter, or the default is to have no filter and you introduce a *foo glob? Obviously the latter results in higher surprisal to the user.

The arguments for the default filter are arguments of personal use case, which is why I said the desired behavior depends on what type of developer you are. Providing customization options is fantastic and I'm super happy fd has these. That still doesn't change the issue that the current default creates higher surprisal. If you want to convince the --no-ignore crowd that we're wrong you have to convince us that this default creates less surprisal.

You're not going to go over to exa or lsd and find tons of issues "command outputs files that pattern match gitignore, this is unexpected behavior." It would be silly to think so and that's why it feels weird to even be having this discussion. I am surprised that you are surprised that people are surprised that the "better find" tool filters out files that aren't hidden.

anacrolix commented 10 months ago

I am running with alias fd='fd --show-errors --no-ignore --glob --hidden'. Why regex is also the default when . has special meaning in regex and globbing is so common is beyond me.
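As an illustration of the point about `.`, here is a minimal Python sketch comparing an unanchored regex search with a whole-name glob match. It uses Python's `re` and `fnmatch` modules as stand-ins; fd's own matching is a separate implementation and may differ in details, and the file names are made up.

```python
import fnmatch
import re

# Hypothetical file names, chosen only to show the difference.
names = ["notes.txt", "notes_txt", "my-notes.txt.bak"]

pattern = "notes.txt"

# Regex search: "." matches any character, and the pattern may match
# anywhere inside the name.
regex_hits = [n for n in names if re.search(pattern, n)]

# Glob match: "." is a literal dot, and the pattern must match the whole name.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

print(regex_hits)
print(glob_hits)
```

Under the regex interpretation all three names match; under the glob interpretation only the exact name does.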

kpym commented 10 months ago

IMO, all of this discussion about what are the best defaults points to the following conclusions:

My personal opinion is that the "best" defaults should be discussed from a newbie's point of view. More experienced users will know how to tweak the tool to their own needs.

sharkdp commented 10 months ago

Overall the question comes down to which is the worst case:

  1. A user can't find a file they expected to because it is captured by .gitignore

  2. A user gets additional output

I think 1 is obviously a significantly worse situation. There's an adage in experimental sciences that applies: it is better to record too much information than too little. This is because too little has (generally) far higher costs than too much (too little requires doing everything over because you didn't know everything you'd need beforehand).

I think this is a really good point and I am seriously considering a switch of the default behavior in fd version 9. This would be a major breaking change. I know for a fact that people are using fd in scripts and pipelines. They will have to adapt (check) their code when upgrading to fd 9.

One practical problem is that we have a set of (short) command-line options that are designed to work with the current default, like -I/--no-ignore. We have a (somewhat hidden) --ignore counterpart, but no suitable short option. We would also have to figure out what to do with --no-ignore-vcs, --no-ignore-parent, --unrestricted, etc.

stevenwalton commented 10 months ago

This definitely is not an easy thing to solve, but I do think it is important given that surprise is the important aspect of defaults.

Maybe a minor update can be pushed to give a deprecation warning before any changes are made? If you do go ahead with making the change, and I really do want to see that, I'd try to get out the deprecation warning sooner rather than later.

For the --no-ignore* flags, I think they should stay for a bit but just be non-operations (later you can leave but remove from man because it is just legacy support and add a deprecation flag here saying no longer needed). For the ignore counterparts, would the exclude term not make sense? I like @kpym's suggestion of the --git flag or --git-ignore or --ignore-git for better pattern consistency. I don't think short flags are always necessary (I actually encourage people to use the long flags for aliases or scripts because they are better documentation), but I definitely get the desire. But there are only so many options. -I makes the most sense for ignore but I think the change would have to be pushed down the line after sufficient deprecation warning because a reversal in behavior runs counter to the whole premise of least surprisal.

I'm honestly not sure what's a good solution to this but I do think much can be non-breaking to the majority of users while helping people onboard. I also think community opinions are good for finding the best options here so that there can be the least surprisal, but I think it is important that things stay in the spirit of find to keep along those lines and try to balance the echo chambers that can happen. You definitely have my sympathy haha (and again, I really appreciate the work you did creating fd, and bat. Love them both)

tmccombs commented 10 months ago

I'm not entirely opposed to a change in the default, as long as there is an easy way for users to keep the current behavior if they want. Which could be as simple as being able to do alias fd="fd --ignore-vcs" (or --ignore-git), as long as I can still use --no-ignore-vcs, -u, -I, etc. to turn off the previous --ignore-vcs (which is how it currently works).

One practical problem is that we have a set of (short) command-line options that are designed to work with the current default, like -I/--no-ignore

That brings up the question, should the new default be the equivalent of current fd --no-ignore, or fd --no-ignore-vcs?

Personally, I think it would be a little surprising if fd doesn't respect .fdignore files by default. .ignore is more questionable. OTOH, in the case that you don't use any ignore files, bypassing the ignore machinery could improve performance.

If we changed the default to fd --no-ignore-vcs, then the -I, --no-ignore option would still be meaningful, since it excludes the .fdignore and .ignore files. Although perhaps not needed quite as often.
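The difference between the two candidate defaults can be sketched as a small Python model. This is a simplification for discussion only: it leaves out the global ignore file and anything else fd may consult, and the function name is hypothetical.

```python
# Simplified model: which ignore files are honored for a given choice of flags.
ALL_IGNORE_FILES = {".gitignore", ".ignore", ".fdignore"}

def honored_ignore_files(no_ignore=False, no_ignore_vcs=False):
    """Return the set of ignore files that would still be respected."""
    if no_ignore:
        # --no-ignore: none of the ignore files are respected.
        return set()
    if no_ignore_vcs:
        # --no-ignore-vcs: only the VCS ignore file is skipped.
        return ALL_IGNORE_FILES - {".gitignore"}
    return set(ALL_IGNORE_FILES)

current_default = honored_ignore_files()
default_like_no_ignore = honored_ignore_files(no_ignore=True)
default_like_no_ignore_vcs = honored_ignore_files(no_ignore_vcs=True)

print(sorted(current_default))
print(sorted(default_like_no_ignore))
print(sorted(default_like_no_ignore_vcs))
```

With the second candidate, .fdignore and .ignore are still honored, which is why -I would keep a meaning under that default.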

We have a (somewhat hidden) --ignore counterpart, but no suitable short option.

For the long option, I think we would probably reverse the importance in the documentation (although maybe make the --no-ignore more prominent than it currently is).

As for the short option, that depends on what direction we went with for --no-ignore vs --no-ignore-vcs as the default.

If we went with --no-ignore as the default, -i would be a good choice as an alias for --ignore, except that it is already taken for "case-insensitive", but maybe we could change that, although that increases the potential breakage. Or we could invert the meaning of -I, which also would increase the scope of the breakage, and would be inconvenient for anyone who aliases fd to fd --ignore, since there isn't a short option to re-disable it, but maybe we could add a new short option for that as well. Or we could do something like -I means --ignore and +I means --no-ignore, but I don't think clap supports that convention, and it isn't a terribly common convention for CLIs.

If we went with --no-ignore-vcs as the default, there isn't currently a short option for --no-ignore-vcs, but it might be worth adding a short --ignore-vcs, perhaps -G for git? Although if we ever added support for additional VCSs that would make less sense. -v is currently available, but I worry about that being confused for "version" or "verbose" (and possibly we would want to use that for an alias to --verbose at some point?).

We would also have to figure out what to do with --no-ignore-vcs, --no-ignore-parent, --unrestricted, etc.

I think those could probably stay the same as they are. Although make --ignore-vcs the main option documented instead of --no-ignore-vcs.

Maybe a minor update can be pushed to give a deprecation warning before any changes are made?

Where/when would we show this deprecation warning? Every time fd ran without a --no-ignore(-vcs) flag? That would be incredibly annoying IMO.

For the --no-ignore* flags, I think they should stay for a bit but just be non-operations (later you can leave but remove from man because it is just legacy support and add a deprecation flag here saying no longer needed)

No, we should keep them. Because I think we should support the use case of using an alias (or wrapper script) that passes --ignore-*, but allow negating it by --no-ignore-* later in the arg list. Just as we currently allow passing --ignore to undo a previous --no-ignore.
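The override behavior described here can be sketched as "the last flag wins". The Python below is only an illustration under that assumption, not fd's actual argument parsing; the function name and the `default` parameter are hypothetical.

```python
def resolve_ignore_vcs(args, default):
    """Return whether VCS ignore files are respected; the last matching flag wins."""
    respect = default
    for arg in args:
        if arg == "--ignore-vcs":
            respect = True
        elif arg == "--no-ignore-vcs":
            respect = False
    return respect

# Arguments contributed by a hypothetical alias: alias fd='fd --ignore-vcs'
alias_args = ["--ignore-vcs"]

# Assume the new default is to NOT respect VCS ignore files.
with_alias = resolve_ignore_vcs(alias_args + ["pattern"], default=False)
negated = resolve_ignore_vcs(alias_args + ["--no-ignore-vcs", "pattern"], default=False)

print(with_alias)
print(negated)
```

If the --no-ignore-* flags became no-ops, the second call could not undo the alias, which is the use case this comment wants to preserve.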

I don't think short flags are always necessary (I actually encourage people to use the long flags for aliases or scripts because they are better documentation), but I definitely get the desire

For scripts or aliases, I absolutely agree. However, for interactive use, I think that having short names for commonly used options is very valuable. And I think that turning the ignore functionality back on would be a pretty common usage, at least for me.

rg is not part of the standard command set and isn't really relevant to this conversation.

fd uses the same code for determining which files to ignore as rg. Some of fd's options were designed specifically to match options in rg. I generally view fd as being to find what rg is to grep. And I strongly suspect that there is a large overlap between users of rg and users of fd. I do think it is relevant to the conversation. Maybe for searching for files based on their names, respecting .gitignore is less important than it is for ripgrep. But if so, I think it is worth asking why that is.

stevenwalton commented 10 months ago

I think respecting a .fdignore is perfectly acceptable as the default, and probably even good, since these are files specifically for fd, just like it is perfectly reasonable for git to respect .gitignore files by default. It's kinda self-documenting. fd should also ignore hidden files by default because they are, after all... hidden and that's what a user expects. I'm not sure anything else should be ignored by default unless there's a convincing argument that someone who hasn't read the documentation would reasonably expect these files to be ignored by default. I think you could argue for something like a backup file ~* or .sw? but I'd default to filtering only the minimum. Remember, users do have grep (and sed, awk, etc) and they're probably already used to filtering because find will spew out plenty of files at you and that's not a bug.

So I think things are fine as long as the default options fit "what would a user that only has basic find knowledge expect" (because let's be real, most users don't know find very well either but there are the basic common patterns that are much more well known). Additional flags that apply different filters? I'm all for that but I'm also not the one putting in the time and effort so that's easy for me to say lol