Open issue, opened by avielsh 4 years ago.
Currently, to get the file information I'm running:
fd --changed-within=10days -x ls -dl --time-style=long-iso "{}"
Which spawns ls for every result.
That's what -X/--exec-batch is for, which would spawn just a single ls process. This has the added advantage that the output columns are aligned (minor comment: the "{}" argument is superfluous and can be omitted). Would this work for you?
fd --changed-within=10days -X ls -dl --time-style=long-iso
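A minimal sketch of the -x versus -X difference, using find(1) as a stand-in so it runs without fd installed: find's -exec ... \; starts one process per result, like fd -x, while -exec ... + hands many results to a single process, like fd -X. The sh -c 'echo spawned' helper and the file names are made up purely to count invocations.

```shell
# Compare per-result execution (like fd -x) with batched execution (like fd -X),
# using find(1) as a stand-in so the sketch runs without fd installed.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$dir/file$i"; done

# -exec ... \; starts one sh per file, so "spawned" is printed 5 times.
per_result=$(find "$dir" -type f -exec sh -c 'echo spawned' sh {} \; | wc -l)

# -exec ... + passes all files to a single sh (as long as they fit in ARG_MAX).
batched=$(find "$dir" -type f -exec sh -c 'echo spawned' sh {} + | wc -l)

echo "per-result: $per_result process(es), batched: $batched process(es)"
rm -rf "$dir"
```

With thousands of results, the per-result variant pays the process start-up cost thousands of times, which is the overhead -X avoids.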
This is actually much slower than running:
find . -mtime 10 -printf "%TY-%Tm-%Td %TH:%TM %p\n"
The equivalent command would use -mtime -10, right?
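To see why the sign matters, a small sketch assuming GNU touch and find (gtouch/gfind on macOS; BSD touch does not accept relative dates like these). find's -mtime N compares the file's age in whole 24-hour periods (remainder discarded) with N, so -mtime 10 only matches files that are between 10 and 11 days old, while -mtime -10 matches files changed within the last 10 days. The file names are hypothetical.

```shell
# Assumes GNU touch/find (gtouch/gfind on macOS); file names are hypothetical.
dir=$(mktemp -d)
touch -d '1 day ago'     "$dir/recent"
touch -d '252 hours ago' "$dir/ten_and_a_half_days"
touch -d '20 days ago'   "$dir/old"

# -mtime 10: age in whole 24-hour periods (remainder discarded) is exactly 10.
exact=$(find "$dir" -type f -mtime 10 -printf '%f\n')

# -mtime -10: age in whole 24-hour periods is less than 10.
within=$(find "$dir" -type f -mtime -10 -printf '%f\n')

printf '%s\n' "-mtime 10  -> $exact"
printf '%s\n' "-mtime -10 -> $within"
rm -rf "$dir"
```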
The test dataset is around 20,000 files. fd runs for ~21 seconds, find for ~10.
How do you perform your benchmarks? Do you account for disk caching effects?
That's what -X/--exec-batch is for, which would spawn just a single ls process.
My result set for that is too large (~20,000 files). I'm getting:
[fd error]: Problem while executing command: Argument list too long (os error 7)
Maybe add a batch-count argument to limit the number of arguments per batch? 🤔
I actually tried piping to xargs with -P4 -n1500 for batch processing while keeping the argument list for ls short enough, but the parallel execution in xargs causes all sorts of string issues in the output.
The equivalent command would use -mtime -10, right?
Yes, my mistake :)
How do you perform your benchmarks? Do you account for disk caching effects?
Hyperfine is cool. I was using time.
Actually, my example was a simplified version of what I actually use, which is listing the whole file system (for instant searching, like locate).
Here's my hyperfine result (~150k files; gfind and gls are the GNU versions of the commands):
hyperfine --warmup 3 '\fd -E ''/Volumes/*'' -E ''/dev/*'' -E ''.git'' -IH --color=never . . -x gls -dl --time-style=long-iso' 'gfind . -type d \( -path /dev -o -path /Volumes \) -prune -o -printf "%TY-%Tm-%Td %TH:%TM %p\n"'
Sorry for the superfluous arguments (-E and -prune); I copied the command from another search.
I actually tried running this command on the whole file system (~2 million files), but gave up after 55 minutes.
My result set for that is too large (~20,000 files). I'm getting:
[fd error]: Problem while executing command: Argument list too long (os error 7)
Maybe add a batch-count argument to limit the number of arguments per batch? 🤔
Unfortunately, this is a known issue that should be fixed; see #410.
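Until #410 is addressed, the batch-count idea can be approximated outside fd with xargs -n. A minimal sketch (find stands in for fd so it runs anywhere; the file names are made up): getconf ARG_MAX prints the system limit behind "Argument list too long" (E2BIG, os error 7), and -n caps how many paths each invocation receives.

```shell
# The system limit that a single huge -X invocation runs into
# (bytes of arguments plus environment for one exec()).
getconf ARG_MAX

dir=$(mktemp -d)
i=1
while [ "$i" -le 10 ]; do : > "$dir/f$i"; i=$((i + 1)); done

# xargs -n3 caps each invocation at 3 paths: 10 files -> 3 + 3 + 3 + 1 = 4 runs.
# -print0 / -0 keep names with blanks intact (fd's equivalent is -0).
batches=$(find "$dir" -type f -print0 | xargs -0 -n3 sh -c 'echo "$#"' sh | wc -l)
echo "invocations: $batches"
rm -rf "$dir"
```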
I actually tried piping to xargs with -P4 -n1500 for batch processing while keeping the argument list for ls short enough, but the parallel execution in xargs causes all sorts of string issues in the output.
Are you sure that this is caused by the parallel execution? Or could it be related to file names with spaces? In the latter case, please try to use the -0 (zero) option for both fd and xargs:
fd --changed-within 10day -0 | xargs -P4 -n1500 -0 ls -dl --time-style=long-iso
This seems to work just fine for me (even with smaller -n arguments).
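A small sketch of why -0 matters for names with spaces, again with find -print0 standing in for fd -0 so it runs without fd; the file name is hypothetical. Each sh -c 'echo "$#"' call reports how many arguments it actually received.

```shell
dir=$(mktemp -d)
: > "$dir/report 2023.txt"   # hypothetical file whose name contains a space

# Newline-delimited input: xargs also splits on blanks, so the single path
# arrives as two arguments.
split_count=$(find "$dir" -type f | xargs sh -c 'echo "$#"' sh)

# NUL-delimited input (find -print0 here, fd -0 in the command above) plus
# xargs -0: the path arrives as one argument.
nul_count=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "newline-delimited: $split_count argument(s), NUL-delimited: $nul_count argument(s)"
rm -rf "$dir"
```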
Are you sure that this is caused by the parallel execution? Or could it be related to file names with spaces? In the latter case, please try to use the -0 (zero) option for both fd and xargs:
Yes, this is a known issue. Here. Here also.
I am using -0 on both fd and xargs.
This seems to work just fine for me (even with smaller -n arguments).
-n isn't the issue; it's the combination with -P that garbles the output. I'm guessing it has something to do with the write buffer, but I haven't completely understood it.
Try writing a large result set to a file and filtering out the well-formed lines (grep -v "^.\{10,11\} "); you'll see the issue.
This is not related to fd; I get the same effect when cat-ing the file list into xargs -P0.
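One hypothetical workaround, assuming (as guessed above) that the garbling comes from several ls processes flushing their buffered output into the same pipe: give each parallel batch its own temporary file and concatenate afterwards. This is a sketch, not an fd feature; it assumes GNU ls for --time-style (gls on macOS), and the OUT variable and file names are made up. Batch order is not preserved, so sort afterwards if order matters.

```shell
# Hypothetical workaround: give each parallel xargs batch its own output file,
# so concurrent ls processes never share one stdout. Assumes GNU ls (gls on macOS).
src=$(mktemp -d)
out=$(mktemp -d)
i=0
while [ "$i" -lt 200 ]; do : > "$src/file $i"; i=$((i + 1)); done

find "$src" -type f -print0 |
  OUT="$out" xargs -0 -P4 -n50 sh -c '
    # Write this batch to a private temp file under $OUT.
    ls -dl --time-style=long-iso "$@" > "$(mktemp "$OUT/batch.XXXXXX")"
  ' sh

# Merge afterwards; the order of batches is not preserved.
cat "$out"/batch.* > "$out/all.txt"
total=$(wc -l < "$out/all.txt")
# Lines that do not start like an ls -l entry for a regular file would be garbled ones.
garbled=$(grep -vc '^-' "$out/all.txt" || true)
echo "lines: $total, garbled: $garbled"
rm -rf "$src" "$out"
```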
Anyway, when using fd with xargs -n1500, find still gets better results:
hyperfine -i --warmup 3 '\fd -0 -IH . / | gxargs -0 -n1500 gls -dl --time-style=long-iso' 'gfind / -printf "%TY-%Tm-%Td %TH:%TM %p\n"'
When I run xargs with -P, fd/xargs takes about half the time of find, but about 1,000 out of 2 million files are garbled, so I cannot use it. Note: I don't know why hyperfine reports only a 1.15× difference; I ran it a few times with time and it took about half as long as find. Maybe something was running in the background...
hyperfine -i --warmup 3 '\fd -0 -IH . / | gxargs -P4 -0 -n1500 gls -dl --time-style=long-iso' 'gfind / -printf "%TY-%Tm-%Td %TH:%TM %p\n"'
I would think that adding a -printf-like option would make all this piping unnecessary. My guess is that fd already stats every file, and piping out to another command can never be as fast as an internal function working on data it already has.
I would also find printf functionality quite useful, and it falls within fd's direct domain.
A use case: sometimes we need to sanitize filenames to port them from one platform to another, and we run into extreme cases such as (not kidding):
''$'\n\n'' * courierblog'$'\n\n''Our history'$'\n''07-01-2009'$'\n''"The cat?" by Charles Perrault.webloc
mv is picky, and we must deal with cases where -0 or xargs -n2 are not enough by properly wrapping filenames.
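A minimal sketch of one way to handle a name like the one above without any delimiter at all: find -exec ... {} + passes each path to sh as its own argv entry, so newlines, blanks and quotes never split or break it. The set of characters replaced here (newline plus some that Windows rejects) is an assumption of this sketch; adjust the tr sets for the target platform.

```shell
# Sketch only: which characters to replace is an assumption (newline plus some
# that Windows rejects); adjust the tr sets for your target platform.
dir=$(mktemp -d)
weird=$(printf '\n\n * courierblog\n\nOur history\n07-01-2009\n"The cat?" by Charles Perrault.webloc')
: > "$dir/$weird"

# find hands each path to sh as its own argv entry, so no delimiter
# (newline, blank or NUL) can split a name, and nothing needs extra quoting.
find "$dir" -type f -exec sh -c '
  for path in "$@"; do
    name=${path##*/}
    parent=${path%/*}
    clean=$(printf "%s" "$name" | tr "\n*?\"<>|:" "________")
    if [ "$clean" != "$name" ]; then
      mv -- "$path" "$parent/$clean"
    fi
  done
' sh {} +

for f in "$dir"/*; do renamed=${f##*/}; done
printf '%s\n' "$renamed"
rm -rf "$dir"
```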
To my knowledge, there is currently no (convenient) way to print the list of files sorted by modification date, as shown here https://stackoverflow.com/a/1405664:
find . -type f -printf "%-.22T+ %M %n %-8u %-8g %8s %Tx %.8TX %p\n" | sort | cut -f 2- -d ' '
One could default to ISO 8601 in UTC (with a Z suffix for zero offset), which would need less formatting gibberish.
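Until fd offers something like this, a sketch of the same date-sorted listing with GNU find (gfind/gsort/gcut on macOS) that sorts on the numeric epoch timestamp (%T@) instead of a formatted date, and uses NUL terminators so names with embedded newlines, like the one in the previous post, stay intact. The file names are hypothetical.

```shell
# Assumes GNU find, sort and cut (gfind/gsort/gcut on macOS); names are hypothetical.
dir=$(mktemp -d)
touch -d '3 days ago' "$dir/oldest"
touch -d '2 days ago' "$dir/middle"
touch -d '1 day ago'  "$dir/newest"

# %T@ prints the mtime as seconds since the epoch, so a numeric sort orders by date.
# NUL terminators (\0, sort -z, cut -z) keep file names containing newlines intact.
order=$(find "$dir" -type f -printf '%T@ %f\0' | LC_ALL=C sort -z -n | cut -z -d ' ' -f 2- | tr '\0' '\n')
printf '%s\n' "$order"
rm -rf "$dir"
```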
ADDENDUM: neither fd -t x --changed-before now nor DATE=$(date) && fd -t x --changed-before "$DATE" works, so there is no convenient way to "just print stuff sorted by date", which is a bad user experience.
[fd error]: 'Tue 21 Mar 2023 11:06:00 AM CET' is not a valid date or duration. See 'fd --help'.
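As far as I can tell from fd's --help, absolute times are expected in the form 'YYYY-MM-DD HH:MM:SS' rather than the locale-dependent default output of date(1). A minimal sketch of producing that format; the fd call is left as a comment so the snippet runs without fd installed, and it only avoids the parse error, not the missing sort.

```shell
# fd's --help describes absolute times as 'YYYY-MM-DD HH:MM:SS', while plain
# date(1) prints a locale-dependent string such as 'Tue 21 Mar 2023 11:06:00 AM CET'.
DATE=$(date '+%Y-%m-%d %H:%M:%S')
printf '%s\n' "$DATE"

# Then, with fd installed (named fdfind on Debian/Ubuntu):
#   fd -t x --changed-before "$DATE"
# This only avoids the parse error; it still does not sort the output by date.
```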
Throwing another vote into the ring for a -printf equivalent. It's a very general tool that can be used for much more than -X ls can.
I have made a PR that might help with some uses of this: https://github.com/sharkdp/fd/pull/1043 but I haven't gotten around to benchmarking it yet.
Unfortunately, I don't know Rust, so I can't understand any of your changes. A small example of the core functionality you're adding would make a nice addition to the PR description; that way, people like me can follow along.
Regardless, if your changes bring about some -printf-iness, I'm all for it and hope you can get it merged soon!
Windows 7 x64, fd 9.0.0.
I want to get an alphabetically sorted list of folders that contain a .startup file.
Setting aside fd's birth traumas (having to specify the -H flag to find dot-files, and still having no --sort option to sort the output), I tried -printf "%h\n" to get paths without filenames, but this option is not implemented either. Aggrrhh!
Workaround using coreutils or busybox:
$ fd -H -g ".startup" -X coreutils dirname {} | coreutils sort
.\01 Portable\Autohotkey
.\01 Portable\DNSCrypt
.\01 Portable\Keepass
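For comparison, GNU find already has the %h directive tried above. A minimal sketch, assuming GNU find is available (for example from a Unix-like environment on Windows; that setup is an assumption). The directory layout mirrors the output above but is made up, and LC_ALL=C gives a plain byte-order sort.

```shell
# Assumes GNU find; the directory layout mirrors the output above but is made up.
root=$(mktemp -d)
for d in Keepass Autohotkey DNSCrypt; do
  mkdir -p "$root/01 Portable/$d"
  : > "$root/01 Portable/$d/.startup"
done
: > "$root/01 Portable/unrelated.txt"

# %h prints the directory part of each match, so no dirname step is needed;
# find matches dot-files by default, so nothing like fd's -H is required.
dirs=$(cd "$root" && find . -type f -name .startup -printf '%h\n' | LC_ALL=C sort)
printf '%s\n' "$dirs"
rm -rf "$root"
```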
Hi, please add a -printf-like feature to fd. Currently, to get the file information, I'm running:
fd --changed-within=10days -x ls -dl --time-style=long-iso "{}"
Which spawns ls for every result. This is actually much slower than running:
find . -mtime 10 -printf "%TY-%Tm-%Td %TH:%TM %p\n"
(For example.) The test dataset is around 20,000 files: fd runs for ~21 seconds, find for ~10. Of course, the more results are found, the slower the fd method becomes. Thanks!