Closed ErichDonGubler closed 5 years ago
As a workaround, you can use fd | sort -V.
@ErichDonGubler Thank you for your feedback.
I think this is better left to external tools, as @tmccombs suggests. Another way is to use xargs to do the sorting via ls -1v:
▶ fd -0 | xargs -0 ls -1v
file-1
file-2
file-3
file-11
file-12
file-22
Also, see #196 and #159.
@sharkdp: I don't really have anything to add after seeing the discussion you've linked. You definitely call the shots! I wonder if there IS a case where there's significant enough gain by using internal sorting (which would NOT be the default, of course -- I agree with the opinion you expressed there in #159). Let me see if I can find some numbers that form a convincing case -- if I can't find something in the next few days, I'll happily close this. :)
@ErichDonGubler Thank you for the feedback.
I'd definitely be interested in hearing use cases for such a feature! However, I am still following the "80% of the use cases" philosophy with fd, as mentioned in the README.
I'm going to close this for now. Feel free to comment here and I can reopen the ticket.
Windows user here to complain about order inconsistency between launches.
Let’s say I want to compute hashes like this: fd -tf -d 1 -x rhash --sha256
Expected order | Launch 1 | Launch 2 | Launch 3 |
---|---|---|---|
AUTHORS | AUTHORS | AUTHORS | AUTHORS |
ccguess.1 | ccguess.1 | ccguess.1 | ccguess.1 |
ccguess.html | ccguess.html | ccguess.html | ccrypt.1 |
ccrypt.1 | ccrypt.1 | ccrypt.1 | ccguess.html |
ccrypt.html | ccrypt.html | ccrypt.html | ChangeLog |
ChangeLog | ChangeLog | ChangeLog | ccrypt.html |
COPYING | COPYING | COPYING | COPYING |
cygwin1.dll | cypfaq01.txt | cypfaq01.txt | cypfaq01.txt |
cypfaq01.txt | NEWS | NEWS | cygwin1.dll |
NEWS | ps-ccrypt.el | cygwin1.dll | NEWS |
ps-ccrypt.el | cygwin1.dll | ps-ccrypt.el | ps-ccrypt.el |
ps-ccrypt.elc | ps-ccrypt.elc | ps-ccrypt.elc | ps-ccrypt.elc |
README | README | README | README-WIN |
README-WIN | README-WIN | README-WIN | README |
@sergeevabc: Was there something in the documentation that gave you the impression that a certain order of output was guaranteed? AFAIK fd doesn't make any.
@ErichDonGubler, some kind of processing order is usually expected from CLI file-related utils (archivers like 7zip and Zstandard, backup managers like Duplicacy and Restic, defrag managers like Contig, duplicate killers like Jdupes, hash calculators like Rhash, even media encoders like LAME and FLAC gave me that impression).
So, first, let me see if I can address your immediate problem by asking a question: can your environment be expected to have common POSIX tools like sort and xargs? If it can, and I'm understanding correctly how you want to use rhash, then you can do something like:
fd -tf -d 1 | sort -V | xargs -I {} rhash --sha256 {}
In regards to fd itself, perhaps the best way to handle your complaint is making the lack of order guarantee explicit in documentation? How do you feel about that? EDIT: see @sharkdp's suggestion below. This is probably the solution to add to documentation.
Second, it's true that applications can (and often do) enforce a specific order of file walking results, even if it is only defined by the filesystem implementation. However, not all applications or tools guarantee it, particularly those that traverse file trees asynchronously and without a cleanup or sorting pass. fd is one of those tools.
To illustrate my point, let's analyze where asynchronous operations happen in the relevant paths of fd's source by stepping through manually:
1. main enters walk::scan.
2. walk::scan creates a channel that acts as the work queue for printing results to stdout later, with the results sent later by a parallel directory walker. This introduces at least two places where async conditions (which are effectively non-deterministic) will affect the order of results.
3. walk::scan enters walk::spawn_receiver, where the thread receiving results to print is born. If we're executing the invocation with job execution you referenced above (fd <expr> -x <job_template>), the passed FdConfig has a command and it's not a batch command, so a pool of threads is spun up, which run exec::job.
4. exec::job calls exec::CommandTemplate::execute_command, which calls exec::command::execute_command.
5. execute_command finally executes the job command and locks a printing mutex, first printing the command's stdout and then stderr. This means that even if a command starts first, if it ends AFTER another command then the second command will still print first.

I'll let @sharkdp correct me if I'm wrong here about the intent of the code, but my assumption is that it's optimized for speed: don't add another pass, keep work between file discovery and printing output as simple as possible.
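The last step of the walkthrough (a command that starts first but finishes later also prints later) can be illustrated with a small simulation. This is only a sketch and not fd's code: the Job struct, the print_order function, the file names, and the timings are all made up for illustration, and it assumes each job prints its output only once it has finished.

```rust
/// A simulated job: a name, the time it starts, and how long it runs (milliseconds).
/// Hypothetical type for illustration; it does not exist in fd.
struct Job {
    name: &'static str,
    start_ms: u64,
    duration_ms: u64,
}

/// Returns the job names in the order their output would be printed,
/// assuming every job prints only after it has finished.
fn print_order(jobs: &[Job]) -> Vec<&'static str> {
    let mut finished: Vec<(u64, &'static str)> = jobs
        .iter()
        .map(|j| (j.start_ms + j.duration_ms, j.name))
        .collect();
    // Stable sort by finish time: jobs that finish together keep their input order.
    finished.sort_by_key(|&(finish, _)| finish);
    finished.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    let jobs = [
        // Starts first, but takes a long time to hash.
        Job { name: "big.iso", start_ms: 0, duration_ms: 500 },
        // Starts slightly later, finishes almost immediately.
        Job { name: "small.txt", start_ms: 10, duration_ms: 5 },
    ];
    // The later-started job is printed first because it finishes first.
    println!("{:?}", print_order(&jobs));
}
```

With real threads the finish times depend on scheduling and I/O, which is why the printed order can differ between launches.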
@ErichDonGubler is correct. You can use --threads 1 / -j 1 if you want to have a deterministic output order.
@sharkdp, indeed, -j 1 fixes the issue of output sorting.
Consider adding a remark about sorting both to the docs here and next to that switch (via -h and --help).
@ErichDonGubler, your bio says ‘dedicated to building software for other humans’. Being an average human with calloused hands, I’m looking for tools that first and foremost deliver predictable output based on previous experience. A human-friendly tool is expected to have a name and version, licence and author’s contact data, and a manual with command explanations and usage examples. But above all, its tangible visual part should resemble the behaviour of other tools from the same niche (unless the author is some kind of revolutionary who believes that customs are obsolete or ineffective). For example, ag, grep, pt, ripgrep, and sift are made to search files for patterns; ripgrep is the fastest among them and it delivers that speed without quirks: switches are mostly kept intact for the sake of consistency, so as not to retrain users, and the output looks like what a user rooted in (pioneering) Western digital culture expects to see (e.g. left-to-right, a-z). The other way round inevitably leads to lengthy justifications about ‘asynchronicity’ and other peculiarities under the hood, which might impress enthusiasts and the academic milieu, but would likely confuse and alienate our human.
@sergeevabc: I see the value in having a reproducible order with the tools we're discussing here, and I'm glad you are teaching me about it! You're the first human I've encountered that has A) expressed a preference for a reliable order and B) has actually taken time to write about it. I would imagine that many humans might also not care or prefer speed to that ordering (because they may not have the same previous experience as you!) -- so I don't consider your point generally applicable, but I do think it's a valuable perspective to keep in mind.
This is now supported (in a particular way) by the new -l/--list-details option, see #556.
Hmm.
Windows 7 x64, FD 9.0.0.
$ fd -g *.jpg -tf -j 1 -x xxhsum {}
\879b2d9894fda9fd .\\thumbs.jpg
\44c472c9d6f50bf8 .\\DSC_6953.jpg
\9e0e685cb71d658e .\\DSC_6947.jpg
\b21dfab7d945fc8c .\\DSC_6945.jpg
\e507ebc868c72df5 .\\DSC_6943.jpg
\d13e17e56d68c251 .\\DSC_6942.jpg
\fc5313fefea68b02 .\\DSC_6923.jpg
\9e87379e55f0c7d4 .\\DSC_6907.jpg
\23703ed86e11a3e6 .\\DSC_6906.jpg
\8e8ee2a826c7e045 .\\DSC_6905.jpg
\31939f6304f099b6 .\\DSC_6904.jpg
\323f7c57871e27e6 .\\DSC_6903.jpg
\d1dfab7f948a3dc1 .\\DSC_6902.jpg
\563c70cda89a1737 .\\DSC_6901.jpg
\8d1ab1076d4b4cd7 .\\DSC_6900.jpg
\6fbfac6f39669c1b .\\DSC_6899.jpg
\b7607f0b98a92bf4 .\\DSC_6898.jpg
\73ed628cb434a733 .\\DSC_6897.jpg
\8380dc1a51b5972a .\\DSC_6896.jpg
\24fa50966a7b913c .\\DSC_6895.jpg
\46cb55b63ff71972 .\\DSC_6894.jpg
However, the following output was expected
$ xxhsum *.jpg
46cb55b63ff71972 DSC_6894.jpg
24fa50966a7b913c DSC_6895.jpg
8380dc1a51b5972a DSC_6896.jpg
73ed628cb434a733 DSC_6897.jpg
b7607f0b98a92bf4 DSC_6898.jpg
6fbfac6f39669c1b DSC_6899.jpg
8d1ab1076d4b4cd7 DSC_6900.jpg
563c70cda89a1737 DSC_6901.jpg
d1dfab7f948a3dc1 DSC_6902.jpg
323f7c57871e27e6 DSC_6903.jpg
31939f6304f099b6 DSC_6904.jpg
8e8ee2a826c7e045 DSC_6905.jpg
23703ed86e11a3e6 DSC_6906.jpg
9e87379e55f0c7d4 DSC_6907.jpg
fc5313fefea68b02 DSC_6923.jpg
d13e17e56d68c251 DSC_6942.jpg
e507ebc868c72df5 DSC_6943.jpg
b21dfab7d945fc8c DSC_6945.jpg
9e0e685cb71d658e DSC_6947.jpg
44c472c9d6f50bf8 DSC_6953.jpg
879b2d9894fda9fd thumbs.jpg
As you can see, FD a) got it in reverse and b) added some odd slashes.
@sergeevabc: If you consider that a bug, filing a separate issue is likely to be more fruitful than posting in a (tangentially related) resolved feature request that was originally filed ~5.5 years ago. 😉
Huh, I can reproduce this as well. It appears, at least for relatively small numbers of results, that if you use -j 1 with --exec, fd runs the commands in reverse order from what the file system gives you. I don't know why though, maybe some weird behavior with crossbeam_channel, or possibly the ignore crate?
As for the slashes... that is very strange. fd doesn't do any transformation on the command output, so I have no idea what is causing that.
After some more investigation it does appear that this is the result of ignore giving us results in the opposite order from what the filesystem does.
I don't know why that is. Probably the implementation uses a stack and pulls items off of the stack. If we switched to using Walk instead of WalkParallel in the -j1 case, then it might use a more expected order, at the cost of additional code complexity.
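To make the stack guess concrete, here is a minimal sketch of why a last-in, first-out work list hands entries back in reverse. It does not use or reproduce the ignore crate; drain_stack and drain_queue are hypothetical helpers, and whether the parallel walker really stores pending entries this way is an assumption.

```rust
use std::collections::VecDeque;

/// Drain pending entries from a LIFO stack: the last entry pushed comes out first.
fn drain_stack<'a>(entries: &[&'a str]) -> Vec<&'a str> {
    let mut stack: Vec<&str> = entries.to_vec();
    let mut out = Vec::new();
    while let Some(entry) = stack.pop() {
        out.push(entry);
    }
    out
}

/// Drain the same entries from a FIFO queue: they come out in arrival order.
fn drain_queue<'a>(entries: &[&'a str]) -> Vec<&'a str> {
    let mut queue: VecDeque<&str> = entries.iter().copied().collect();
    let mut out = Vec::new();
    while let Some(entry) = queue.pop_front() {
        out.push(entry);
    }
    out
}

fn main() {
    // Order in which the filesystem is assumed to list the directory.
    let listed = ["DSC_6894.jpg", "DSC_6895.jpg", "thumbs.jpg"];
    println!("stack: {:?}", drain_stack(&listed)); // reversed
    println!("queue: {:?}", drain_queue(&listed)); // unchanged
}
```

Either way the result with a single thread is deterministic; only the direction differs.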
Note that while using -j1 will give you a deterministic ordering, it won't necessarily give you a sorted order, even if we ran exec in the same order we received the results, because depending on the filesystem you could get the results in a variety of different orders (for example, in creation order, based on hash values of the file names, alphabetically, etc.).
Another option could be to refactor our optimistic sorting (applied when we get results quickly) so that it works for --exec as well. I'm not sure how difficult that would be to do.
It would be really, really handy for scripting to have a flag to arrange output using natural ordering.
Given the following tree:
The output of ls -1/fd is currently:
The proposed output for natural order (ls -1v, proposed to be something like fd -v) would be:
If you need a dependency for this, rust-natord is small and seems like it could fit the bill.
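For readers unfamiliar with the term, natural ordering compares runs of digits by numeric value, so file-2 sorts before file-11. The sketch below is a hand-written comparator for illustration only. It is not the implementation used by rust-natord, ls -v, or sort -V, and those tools may differ in details such as leading zeros or suffix handling. The names chunks and natural_cmp are hypothetical.

```rust
use std::cmp::Ordering;

/// Split a name into runs of ASCII digits and runs of everything else.
fn chunks(s: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev_is_digit: Option<bool> = None;
    for (i, c) in s.char_indices() {
        let is_digit = c.is_ascii_digit();
        if let Some(prev) = prev_is_digit {
            if prev != is_digit {
                out.push(&s[start..i]);
                start = i;
            }
        }
        prev_is_digit = Some(is_digit);
    }
    if start < s.len() {
        out.push(&s[start..]);
    }
    out
}

/// Compare two names so that digit runs are ordered by numeric value.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (ca, cb) = (chunks(a), chunks(b));
    for (x, y) in ca.iter().zip(cb.iter()) {
        let both_numeric =
            x.as_bytes()[0].is_ascii_digit() && y.as_bytes()[0].is_ascii_digit();
        let ord = if both_numeric {
            // After stripping leading zeros, a longer digit run is the larger
            // number; equal lengths fall back to plain string comparison.
            let (x, y) = (x.trim_start_matches('0'), y.trim_start_matches('0'));
            x.len().cmp(&y.len()).then_with(|| x.cmp(y))
        } else {
            x.cmp(y)
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // All shared chunks are equal: fewer chunks sorts first, then plain order.
    ca.len().cmp(&cb.len()).then_with(|| a.cmp(b))
}

fn main() {
    let mut names = vec!["file-11", "file-2", "file-22", "file-1", "file-12", "file-3"];
    names.sort_by(|a, b| natural_cmp(a, b));
    println!("{:?}", names);
}
```

Comparing digit runs by length avoids parsing them into integers, so arbitrarily long numbers cannot overflow.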