sharkdp / fd

A simple, fast and user-friendly alternative to 'find'
Apache License 2.0

Chained usage `fd ... -X fd ... [-X]` -- examples welcome in README? deeper support for chaining? #1450

mcint closed 9 months ago

mcint commented 9 months ago

Using fd version: fd 8.7.0

I find myself using fd in a chained manner.

Chained-style

For example, for quickly viewing python packages, today I find myself searching for fd brotab -td ~/.local/lib/python3.10/ -X fd api -e py, since api is a common file and package name-component, and I'm just looking for the one today. I find myself wanting to run commands on the result, or chain a third (or more) times. I'm looking to document that pattern for other users of fd.

mcint commented 9 months ago

I've rewritten this chained query in an extendable way (though it suffers in keystroke cost): <<< ~/.local/lib/python3.10/ xargs fd brotab -td | xargs fd api -e py.

Would a word-match flag PR be welcome? Something like grep's -w/--word-match. I understand that ^[pattern]$ can match a full component, but I would like a syntax that can be added on to a query. For my small example (which doesn't strongly justify this request):

tavianator commented 9 months ago

For example, for quickly viewing python packages, today I find myself searching for fd brotab -td ~/.local/lib/python3.10/ -X fd api -e py, since api is a common file and package name-component, and I'm just looking for the one today.

fd ... -X fd ... is not something that should be recommended. The main problem is it can drastically explode the result set:

$ fd foo
foo
foo/foo
foo/foo/foo
$ fd foo -X echo fd bar # To see what would be executed
fd bar ./foo ./foo/foo ./foo/foo/foo
$ fd foo -X fd bar # What would actually happen
./foo/bar
./foo/foo/bar
./foo/foo/foo/bar
./foo/foo/bar
./foo/foo/foo/bar
./foo/foo/foo/bar

You could pass --prune to the first fd to avoid this, but still I don't think we should recommend this pattern at all.
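
The explosion, and the effect of --prune, can be reproduced in a scratch directory. This is a minimal sketch, assuming an fd build that supports --prune is installed under the name fd (some distributions ship the binary as fdfind). The layout mirrors the foo/foo/foo example above, and the variable names are only illustrative.

```shell
# Scratch tree: three nested "foo" directories, each containing a file "bar".
demo=$(mktemp -d)
mkdir -p "$demo/foo/foo/foo"
touch "$demo/foo/bar" "$demo/foo/foo/bar" "$demo/foo/foo/foo/bar"

if command -v fd >/dev/null 2>&1; then
    # Without --prune the parent reports all three foo directories and the
    # child searches each of them, so nested files are listed repeatedly.
    unpruned=$(cd "$demo" && fd foo -X fd bar | wc -l | tr -d ' ')

    # With --prune the parent does not descend into a matching directory, so
    # the child receives only ./foo and each file is listed once.
    pruned=$(cd "$demo" && fd --prune foo -X fd bar | wc -l | tr -d ' ')

    echo "without --prune: $unpruned lines; with --prune: $pruned lines"
else
    echo "fd not found; skipping the comparison" >&2
fi
```

On this tree the unpruned chain should report 6 lines, matching the listing above, and the pruned chain 3.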

For your case, it's probably best to do all the filtering in the same fd command:

$ fd --full-path 'brotab.*api.*\.py$' ~/.local/lib/python3.10/

Would a word-match flag PR be welcome? Something like grep's -w/--word-match. I understand that ^[pattern]$ can match a full component, but I would like a syntax that can be added on to a query.

You can write word boundaries in the regex like this:

$ fd '\bpattern\b'

I don't think we'd add a flag to do this for you, fd already has too many flags :)
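
To see what the word-boundary form accepts and rejects, here is a small sketch on made-up file names (all names are illustrative), assuming fd is on the PATH. One thing worth knowing: in regex word boundaries `_` counts as a word character, so api_utils.py is not matched, while `-` and `.` do delimit words.

```shell
names=$(mktemp -d)
touch "$names/api.py" "$names/my-api.py" "$names/api_utils.py" "$names/rapid.py"

pattern='\bapi\b'

if command -v fd >/dev/null 2>&1; then
    # Plain match: every name that contains "api" anywhere, including "rapid".
    loose=$(cd "$names" && fd api | sort)
    # Word-bounded match: "api" must not touch a letter, digit or underscore.
    strict=$(cd "$names" && fd "$pattern" | sort)
    printf 'fd api:\n%s\n\nfd %s:\n%s\n' "$loose" "$pattern" "$strict"
else
    echo "fd not found; skipping" >&2
fi
```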

mcint commented 9 months ago

Hm, thank you, interesting suggestions.

I will consider --prune in my workflows, might try -P for that locally, and PR. Thank you!

It looks like, in practice, I can use -g/--glob, #692 (in place of my -w suggestion, https://github.com/sharkdp/fd/issues/1450#issuecomment-1852782497).

Sounds like no objections to submitting other use examples for the readme or docs, might PR later.


Extraneous thinking aloud, about chaining queries

I've chewed on variations where I can keep appending [pattern] or [depth] [pattern] for a while.

To build the motivation a bit more, I query things like this:

Compressed to: fd-chain -d3 [pkg] / -- -d3 [lib] -- -d3 -e ini .

Here are some real snippets of recent history, or for tasks I perform commonly:

fd -d4 ^php / -td | grep -ve -
fd -d4 ^php / -td | grep -ve - | xargs fd ini
fd -d4 ^php / -td | grep -ve - | xargs fd fpm
sudo apt install fzf
fd completion / -d4
fd completion / -d4 -X fd fzf -d4
fd completion / -d4 -td -X fd fzf -d4
fd fzf / -d4 -td -X fd completion -d4
. /usr/share/doc/fzf/examples/completion.bash
less /usr/share/doc/fzf/examples/completion.bash
. /usr/share/doc/fzf/examples/key-bindings.bash

Although these examples each use only two steps.
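
A pipeline form of the chain extends to three or more steps without nesting -X. This is a rough sketch rather than an fd feature. It assumes fd plus an xargs that supports -0 and -r (GNU xargs; -r stops a stage from running when the previous one found nothing, which would otherwise make fd search the current directory). The directory and file names are made up, and --prune is applied to the directory stages as suggested earlier in the thread.

```shell
root=$(mktemp -d)
mkdir -p "$root/php8/conf/fpm" "$root/php8/docs"
touch "$root/php8/conf/fpm/pool.ini" "$root/php8/docs/readme.ini"

if command -v fd >/dev/null 2>&1; then
    # Stage 1: directories matching "php" below $root (NUL-delimited output).
    # Stage 2: within those, directories matching "conf".
    # Stage 3: within those, anything with the .ini extension.
    found=$(fd -0 -td --prune php "$root" \
        | xargs -0 -r fd -0 -td --prune conf \
        | xargs -0 -r fd -e ini .)
    printf '%s\n' "$found"
else
    echo "fd not found; skipping" >&2
fi
```

Each xargs stage appends the previous results as search paths after the new pattern, so a further step is added by appending one more `| xargs -0 -r fd ...` segment.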

Nit about full-path matching

fd ... -X fd ... is not something that should be recommended. The main problem is it can drastically explode the result set:

Thank you for a considered response, and I agree that blindly performing nested queries might blow up traversals & time required and results size. However, I must insist, full-path matching seems ill-advised: file systems have a really high branching factor, and searching them quickly and effortlessly (few keystrokes, forgiving argument order, concatenative/append-only use supported) is what makes fd such a delight to use. Full path matching makes this searching much more expensive. For argument's sake, model number of files as exponential in depth, 10^[D] files are present in D levels of fs tree. I've used fd on systems where -d4 returns in acceptable time, and -d5 takes a full minute or more. Chaining queries is quite useful, to limit the haystack size.

From painful experience, I can report that searching chained from partial matches helps a lot on low-resource systems.

Nested matching names are not entirely contrived, but requerying with a more limited depth, or glob matching, is what I'll try next.

Fiddling with the shell cursor to modify queries is also frustrating in practice.

Thank you for your work maintaining -- answering random usage questions, and considering design space around the tool!

tavianator commented 9 months ago

Nit about full-path matching

fd ... -X fd ... is not something that should be recommended. The main problem is it can drastically explode the result set:

Thank you for a considered response, and I agree that blindly performing nested queries might blow up traversals & time required and results size. However, I must insist, full-path matching seems ill-advised: file systems have a really high branching factor, and searching them quickly and effortlessly (few keystrokes, forgiving argument order, concatenative/append-only use supported) is what makes fd such a delight to use.

One thing that may help concatenative use is --search-path and --and, e.g.

$ fd --full-path --search-path ~/.local/lib/python3.10/ /brotab/ --and api -e py
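
As a sketch of how this form grows by appending, the following builds a small made-up tree and runs the query with one and then two --and terms. It assumes an fd release that supports --and and --search-path. The paths and the extra term client are illustrative only.

```shell
site=$(mktemp -d)
mkdir -p "$site/brotab/client" "$site/requests"
touch "$site/brotab/api.py" "$site/brotab/client/api_helpers.py" \
      "$site/brotab/main.py" "$site/requests/api.py"

if command -v fd >/dev/null 2>&1; then
    # .py files whose full path contains both "/brotab/" and "api".
    one=$(fd --full-path --search-path "$site" /brotab/ --and api -e py | sort)
    # Narrow further by appending another --and term at the end of the line.
    two=$(fd --full-path --search-path "$site" /brotab/ --and api -e py --and client | sort)
    printf 'one --and term:\n%s\n\ntwo --and terms:\n%s\n' "$one" "$two"
else
    echo "fd not found; skipping" >&2
fi
```

Because every pattern is tested against the full path, the order of the --and terms does not matter, which keeps the command line append-only.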

Full path matching makes this searching much more expensive.

Does it? I see how it could, but I expect I/O and syscall overhead to dominate pattern matching. Let's check:

tavianator@tachyon $ hyperfine -w2 "fd -u brotab ~" "fd -u --full-path brotab ~"
Benchmark 1: fd -u brotab ~
  Time (mean ± σ):      1.151 s ±  0.014 s    [User: 18.505 s, System: 33.398 s]
  Range (min … max):    1.134 s …  1.180 s    10 runs

Benchmark 2: fd -u --full-path brotab ~
  Time (mean ± σ):      1.151 s ±  0.008 s    [User: 20.426 s, System: 32.466 s]
  Range (min … max):    1.142 s …  1.164 s    10 runs

Summary
  fd -u --full-path brotab ~ ran
    1.00 ± 0.01 times faster than fd -u brotab ~

And here's a more representative benchmark for your use case. I changed it up because I don't have any copies of brotab lying around.

tavianator@tachyon $ hyperfine "fd -u --search-path ~ --full-path /requests/ --and api -e py" "fd -u -td --prune --search-path ~ requests -X fd -u api -e py"
Benchmark 1: fd -u --search-path ~ --full-path /requests/ --and api -e py
  Time (mean ± σ):      1.126 s ±  0.014 s    [User: 14.427 s, System: 37.160 s]
  Range (min … max):    1.110 s …  1.149 s    10 runs

Benchmark 2: fd -u -td --prune --search-path ~ requests -X fd -u api -e py
  Time (mean ± σ):      1.156 s ±  0.012 s    [User: 16.962 s, System: 35.575 s]
  Range (min … max):    1.139 s …  1.181 s    10 runs

Summary
  fd -u --search-path ~ --full-path /requests/ --and api -e py ran
    1.03 ± 0.02 times faster than fd -u -td --prune --search-path ~ requests -X fd -u api -e py

Both queries return the same set of 110 files.

For argument's sake, model number of files as exponential in depth, 10^[D] files are present in D levels of fs tree. I've used fd on systems where -d4 returns in acceptable time, and -d5 takes a full minute or more. Chaining queries is quite useful, to limit the haystack size.

First off, you may be interested in #28 and possibly https://github.com/tavianator/bfs :)

Secondly, the total work is roughly the same for both approaches anyway. With one fd command, it has to explore the whole tree. With --prune ... -X fd ..., the parent fd explores the whole tree except under the brotab directories, and the child fd(s) explore just the brotab subtrees. In both cases, each path is examined by exactly one fd process. You just have more total processes with -X fd.

(Without --prune, -X fd does a lot more total work, because the parent fd is also searching the brotab trees along with the children.)
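
The accounting above can be checked on a toy tree. The sketch below does not run fd at all: it uses find purely as a stand-in to count how many entries each strategy would have to examine, under the simplifying assumption that work is proportional to the number of entries visited. The tree reuses the nested foo layout plus one unrelated directory, and all names are illustrative.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/foo/foo/foo" "$tree/other"
touch "$tree/foo/bar" "$tree/foo/foo/bar" "$tree/foo/foo/foo/bar" "$tree/other/baz"

# Count the entries below a directory (the directory itself is excluded).
count() { find "$1" -mindepth 1 | wc -l | tr -d ' '; }

# One command: a single walk over the whole tree.
single=$(count "$tree")

# Parent with pruning: sees the outermost "foo" but nothing below it.
pruned_parent=$(find "$tree" -mindepth 1 \( -name foo -prune -print \) -o -print \
    | wc -l | tr -d ' ')
# The child then walks only the outermost foo subtree.
pruned_total=$(( pruned_parent + $(count "$tree/foo") ))

# Parent without pruning: walks the whole tree, after which the child walks
# every matching directory again, including the nested ones.
unpruned_total=$(( single + $(count "$tree/foo") + $(count "$tree/foo/foo") + $(count "$tree/foo/foo/foo") ))

echo "single=$single pruned=$pruned_total unpruned=$unpruned_total"
```

On this tree the script prints single=8 pruned=8 unpruned=17: the pruned chain visits each entry once, just like the single command, while the unpruned chain revisits the nested subtrees.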

From painful experience, I can report that searching chained from partial matches helps a lot on low-resource systems.

I'm kind of surprised that -X fd chaining would ever be beneficial without --prune. I believe you, I'm just struggling to think of why that would happen.

Fiddling with the shell cursor to modify queries is also frustrating in practice.

True. One handy thing is most shells support Emacs-style keybindings for line editing, e.g. C-a (Ctrl+A) for beginning-of-line, C-e for end-of-line, M-b (Alt+B) to jump back a word, M-f to jump forward a word, etc. Often Ctrl+Left/Right will work too. You can also switch to vi-style keybindings with set -o vi.

Thank you for your work maintaining -- answering random usage questions, and considering design space around the tool!

You're welcome! :)