Closed fiveNinePlusR closed 6 years ago
xargs is not an awesome way to do it because of limitations with spaces in filenames
fd supports -0 to use with xargs to support spaces in filenames:
$ fd foo -0 | xargs -0 ls
'foo bar.txt'
The main point of -exec is not the escaping, but the shell's limit on the number of arguments. xargs cannot execute a command if there are too many of those. -exec, on the other hand, is not limited by shell as it executes commands sequentially one at a time.
Up
It would be great if someone could come up with a specific plan on how this would work. Do we want to clone find's -exec behavior? find's -execdir behavior? Is there anything that we want to do differently? Anything we could improve?
I reach for -exec the majority of the time and would think that is most popular. I know {} is useful for explicitly saying the command uses the match as an argument, so it would be great to support that. Perhaps instead of passing a semicolon to terminate the command, fd could just use all arguments after the -exec flag?
-exec, on the other hand, is not limited by shell as it executes commands sequentially one at a time
That's actually one thing that we have to think about when writing up a proposal on how this should be implemented.
Since fd searches files in parallel, we could (in principle) execute the necessary child processes in parallel. This would potentially lead to a significant speed-up compared to a sequential -exec but could also complicate things a lot. That would be somewhat similar to fd -0 .. | parallel -0 ...
It was brought to my attention that, actually, xargs -n 100 -P 4 will execute commands in batches of 100 arguments using up to 4 parallel processes. So, according to the unix-way, there's no problem and the case can be closed. Sorry for my ignorance.
It might be an opportunity for optimization, but it would probably be enough to document why -exec is not needed.
@indeyets Thank you, I also didn't know that.
I'm also starting to think that it might be better to not include -exec in fd. And I agree, it would be in line with the "unix-way" and also the goals of fd (to be a simple and easily understandable tool).
I don't want to make any decision yet, though. Please keep the discussion going :+1:
Perhaps it would be sufficient to add -exec with a small help description of how to use it in conjunction with xargs? I don't know enough about every idiosyncrasy of xargs and find to know if there are other issues.
I could easily implement this within a day, and possibly parallel command executions via a job pool, too. I've a lot of experience with this kind of software in Rust. I would recommend following GNU Parallel's syntax for command generation though. It's much more flexible than find's limited command generation capabilities. Syntax is a good fit with file-based command generation, too.
{} - simple placeholder token
{.} - remove the extension
{/} - basename
{/.} - basename without extension
{//} - parent path
And bonus token:
{^abc...} - remove a custom suffix
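To make the token meanings above concrete, here is a minimal sketch of how the five basic tokens could be expanded for a single path. The function name `expand` is hypothetical and this is not fd's or GNU Parallel's actual implementation; it only leans on `std::path::Path` and skips the {^abc...} suffix token.

```rust
use std::path::Path;

/// Expand one placeholder token for `path` (hypothetical helper).
/// Returns None for tokens this sketch does not know about.
fn expand(token: &str, path: &str) -> Option<String> {
    let p = Path::new(path);
    // Last path component, or an empty string if there is none.
    let basename = p
        .file_name()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    let result = match token {
        // {}   - the path as-is
        "{}" => path.to_string(),
        // {.}  - the path with its extension removed
        "{.}" => p.with_extension("").to_string_lossy().into_owned(),
        // {/}  - basename
        "{/}" => basename,
        // {/.} - basename without extension
        "{/.}" => Path::new(&basename)
            .with_extension("")
            .to_string_lossy()
            .into_owned(),
        // {//} - parent path
        "{//}" => p
            .parent()
            .map(|d| d.to_string_lossy().into_owned())
            .unwrap_or_default(),
        _ => return None,
    };
    Some(result)
}

fn main() {
    for token in ["{}", "{.}", "{/}", "{/.}", "{//}"] {
        println!("{:5} -> {:?}", token, expand(token, "music/album/song.flac"));
    }
}
```

For the path music/album/song.flac, {.} would give music/album/song, {/} would give song.flac, and {//} would give music/album.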
@mmstick That sounds good :smile:
Given that you wrote a parallel clone in Rust, why do you think it would be beneficial to add --exec to fd (instead of just piping to parallel)?
Before we implement a feature like this, I would like to see at least a short outline of how this feature would work exactly (which command-line options would be added? what would be the syntax of the --exec argument? which new dependencies would we have to pull in? how would this interfere with other features of fd?).
I would recommend following GNU Parallel's syntax for command generation though. It's much more flexible than find's limited command generation capabilities.
I should learn more about parallel...
Interesting syntax for that... If you did add this to the utility, would it also be prudent to add in something like the following?
{basename}
{no_extension}
{basename_no_extension}
or you could do this to make parsing easier:
{{basename}}
{{no_extension}}
etc.
These longer names would make the tokens more explicit. The other tokens would still be matched as well.
The idea is analogous to -v and --verbose: one is short and cryptic, and the other is long and explicit to the reader.
It's just a rather simple feature that can easily exist within its own standalone module. The main benefit would just be cutting out the middle man. I would think that just the --exec flag would be good enough, and syntax would look similar to GNU Parallel (minus the manual supplying of arguments and permutating inputs).
fd *.flac -type f -exec 'ffmpeg -i {} -c:a libopus {.}.opus'
fd *.flac -type f -exec ffmpeg -i {} -c:a libopus {.}.opus
You could simply have it so that all arguments following -exec are treated as the command to use, and if no placeholder tokens are used, then simply add arguments to the end of the command when generating them.
When parsing the arguments and seeing a command, you'd parse the command into a vector of string references & tokens. Something like an Option<Vec<Token<'a>>> field, which, if set to Some, will signal the program to use the contents of that field to generate and execute commands.
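As a rough illustration of the parsing idea described here, the sketch below splits a command template into borrowed text pieces and placeholder tokens, then generates a command line for one path, appending the path when the template contains no placeholder. The `Token` variants and the `tokenize`/`generate` names are assumptions made for this example, only {} and {.} are handled, and the output is a plain string with no shell quoting or escaping.

```rust
use std::path::Path;

/// A piece of a command template (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str), // literal text borrowed from the template
    Path,          // {}
    NoExt,         // {.}
}

/// Split a template into literal text and placeholder tokens.
fn tokenize(template: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut text_start = 0;
    let mut i = 0;
    while i < template.len() {
        let rest = &template[i..];
        let found = if rest.starts_with("{}") {
            Some((Token::Path, 2))
        } else if rest.starts_with("{.}") {
            Some((Token::NoExt, 3))
        } else {
            None
        };
        match found {
            Some((token, len)) => {
                // Flush any literal text that precedes the placeholder.
                if text_start < i {
                    tokens.push(Token::Text(&template[text_start..i]));
                }
                tokens.push(token);
                i += len;
                text_start = i;
            }
            // Not a placeholder: step over one character (UTF-8 aware).
            None => i += rest.chars().next().map_or(1, |c| c.len_utf8()),
        }
    }
    if text_start < template.len() {
        tokens.push(Token::Text(&template[text_start..]));
    }
    tokens
}

/// Build the command line for one path. If the template used no
/// placeholder at all, the path is appended at the end.
fn generate(tokens: &[Token<'_>], path: &str) -> String {
    let mut out = String::new();
    let mut used_placeholder = false;
    for token in tokens {
        match token {
            Token::Text(text) => out.push_str(text),
            Token::Path => {
                out.push_str(path);
                used_placeholder = true;
            }
            Token::NoExt => {
                out.push_str(&Path::new(path).with_extension("").to_string_lossy());
                used_placeholder = true;
            }
        }
    }
    if !used_placeholder {
        out.push(' ');
        out.push_str(path);
    }
    out
}

fn main() {
    // In the proposal this would live in an Option<Vec<Token>> field.
    let exec: Option<Vec<Token>> = Some(tokenize("ffmpeg -i {} -c:a libopus {.}.opus"));
    if let Some(tokens) = &exec {
        println!("{}", generate(tokens, "song.flac"));
    }
    println!("{}", generate(&tokenize("ls -l"), "song.flac"));
}
```

Parsing the template once and reusing the token vector for every search result avoids re-scanning the command string per file.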
The default could be to just execute commands serially. A job pool can easily work if we string together an Arc<Mutex<VecDeque<T>>> to share across threads. If we want to capture the results and have them printed serially, we could just create an Arc<Mutex<IntMap<usize, File>>> to store the FDs of the executed commands for the main thread to grab from and print in a serial fashion.
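A minimal sketch of the shared-queue job pool idea might look like the following. The `run_pool` name and signature are hypothetical. To keep the example portable, the work closure only formats a string instead of spawning a process, and ordered output is obtained by tagging each job with its index and sorting afterwards, which is a simplification swapped in for the IntMap of file descriptors mentioned above.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Run `work` over `jobs` on `workers` threads that pull from a shared
/// queue, returning results in the original job order (hypothetical helper).
fn run_pool<T, R, F>(jobs: Vec<T>, workers: usize, work: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    // Tag each job with its index so output order can be restored later.
    let queue: Arc<Mutex<VecDeque<(usize, T)>>> =
        Arc::new(Mutex::new(jobs.into_iter().enumerate().collect()));
    let work = Arc::new(work);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let work = Arc::clone(&work);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // The lock is held only while popping, not while working.
                    let job = queue.lock().unwrap().pop_front();
                    match job {
                        Some((index, item)) => done.push((index, (*work)(item))),
                        None => break, // queue drained
                    }
                }
                done
            })
        })
        .collect();

    // Collect every worker's results and restore the input order.
    let mut results: Vec<(usize, R)> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    results.sort_by_key(|(index, _)| *index);
    results.into_iter().map(|(_, result)| result).collect()
}

fn main() {
    let paths = vec!["a.flac", "b.flac", "c.flac"];
    let lines = run_pool(paths, 2, |path| format!("would run: ffmpeg -i {}", path));
    for line in lines {
        println!("{}", line);
    }
}
```

With one worker this degenerates to the serial default. A real implementation would more likely stream output as jobs finish rather than collect everything at the end.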
If there are any against it, I could also implement it as an optional feature, gated behind conditional compilation.
@mmstick Sounds good, thank you very much for writing this up!
I'm certainly not against this feature, but I'm curious what the advantage is over using parallel/xargs?
Having to bring out parallel/xargs results in having to execute a command to execute commands. So instead of fd -> parallel -> shell procs -> commands, you'd just have fd -> shell procs -> commands. If you wanted to go a step further, you could also directly embed the Ion shell as a library, and then you'd just have fd -> commands. Put simply, you'd win benchmarks. No real advantages other than that.
Fair enough, in this case, let's go for it :smile:. The feature has been asked for by a lot of people. Your help/contribution would be very much appreciated.
fd .flac -type f -exec 'ffmpeg -i {} -c:a libopus {.}.opus'
fd .flac -type f -exec ffmpeg -i {} -c:a libopus {.}.opus
I think I would prefer to only support the first variant of this, i.e. just a normal command line option --exec/-e that takes a single argument. This option could appear anywhere on the command line, before or after the pattern, just like any other option and flag.
It seems to me that the other variant would be a possible source of confusion/errors.
If there are any against it, I could also implement it as an optional feature, gated behind conditional compilation.
We could still do this afterwards if it turns out we want this to be configurable. Right now, I don't see any need for this - but thanks for the suggestion.
Another thing that just comes to my mind is platform-independence. Are there any complications that we could run into?
Another thing that just comes to my mind is platform-independence. Are there any complications that we could run into?
This can be implemented in a platform-independent manner.
I'll have a PR submitted later today, once I've documented and refactored what I've written so far.
I am sure this utility could improve upon find's exec command to make a common thing easily accessible and powerful. xargs is not an awesome way to do it because of limitations with spaces in filenames.
Interactive exec would be nice to have too. Do a
fd --interactive-exec some_query
and then you are prompted for a command that doesn't need any special delimiters but would pass in things like %file %path %filename etc. to the command, which get parsed and replaced with a properly escaped and quoted string. Just spitballing here, as this is not a fully baked idea.