martymac / fpart

Sort files and pack them into partitions
https://www.fpart.org/
BSD 2-Clause "Simplified" License

Avoid calling stat() on every file if you're not using size limit #9

Closed (candlerb closed this 5 years ago)

candlerb commented 5 years ago

I am trying to sync files from a slow Windows NFS server. rsync was slow, but fpsync is even slower - even just collecting the list of files to be copied.

Using strace, I can see that as well as waiting for getdents, every file is being stat'd. To demonstrate, here is a simple reproducer:

strace -f fpart -f 100 -L /etc 2>&1 | grep stat

But if you are using fpart -f without -s, I don't think stat() needs to be called at all.

There is a secondary issue: if you run fpsync with -f but without -s, fpsync still passes -s 4294967296 to fpart. But since fpsync is a shell script, that's easily hacked out.
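To illustrate why a file-count limit alone does not need file sizes, here is a minimal sketch in Python. It is not fpart's code; `pack_by_count` and its parameters are hypothetical names used only for this example. It groups a stream of paths into partitions of at most `max_files` entries using nothing but the names.

```python
from itertools import islice


def pack_by_count(paths, max_files):
    """Yield lists of at most max_files paths, using only the names.

    Hypothetical helper: no size information (and so no stat() result)
    is consulted anywhere in this function.
    """
    if max_files <= 0:
        raise ValueError("max_files must be positive")
    it = iter(paths)
    while True:
        # Take the next max_files names from the stream, if any remain.
        chunk = list(islice(it, max_files))
        if not chunk:
            return
        yield chunk
```

Because the input is consumed lazily, partitions can be emitted while the directory listing is still being read.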

martymac commented 5 years ago

Hi Brian,

Thanks a lot for your report.

Regarding the possibility of removing size (or file number) limits, I've just pushed a patch that allows passing 0 as the -f or -s argument to make that limit unlimited. I would like the defaults to remain the same as I want fpsync to be simple to use, and not force users to pass -f or -s arguments.

The first part of your request is more tricky because fpart(1) heavily relies on fts(3) which itself makes use of fstatat(2). I'll investigate to see if something can be done here.

Regards,

Ganael.

candlerb commented 5 years ago

> The first part of your request is more tricky because fpart(1) heavily relies on fts(3) which itself makes use of fstatat(2). I'll investigate to see if something can be done here.

Thank you. Checking readdir(3), it seems the d_type member is only supported on certain types of filesystems - so a recursive trawl might be forced to fall back to stat anyway, just to tell the difference between a file and a directory.
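The d_type fallback described above can be seen in a small Python sketch. This is an illustration only, not how fpart or fts(3) work internally, and `iter_files` is a hypothetical name. Python's `os.scandir()` exposes the entry type reported by the directory read; per the Python documentation, `DirEntry.is_dir()` usually needs no extra system call for non-symlinks, but falls back to a stat call on filesystems that report an unknown d_type.

```python
import os


def iter_files(root):
    """Yield paths of non-directory entries under root.

    Hypothetical helper. Entry types come from os.scandir(); the
    fallback to stat on filesystems without d_type support happens
    inside DirEntry.is_dir(), not in this code.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # follow_symlinks=False: a symlink to a directory is
                # reported as a plain entry and is not descended into.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path
```

Whether this avoids stat calls on a given NFS mount depends on what the server and client report for d_type, so it would need to be checked with strace on the actual system.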

martymac commented 5 years ago

Yes, fts(3)'s FTS_NOSTAT option uses tricks to speed up crawling, but they are only enabled for 'UFS-like' filesystems (ufs, zfs, nfs, ext2fs; see ufslike_filesystems[] in fts.c). Also, FTS_NOSTAT can only be used together with FTS_PHYSICAL, but fpart sometimes needs FTS_LOGICAL to follow symlinks. Last but not least, fpart sometimes needs to sort directories first (for options -D and -E) and needs FTSENT's fts_statp information for that purpose.

So too many things depend on that piece of information, not just the computation of partition size, and I don't think I can improve anything here.

Anyway, thanks for your feedback :)

candlerb commented 5 years ago

I guess it would be possible to set FTS_NOSTAT iff a bunch of preconditions were satisfied. But thanks for looking into this anyway.
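As a rough sketch of what such a precondition check might look like, the constraints listed earlier in the thread can be collected into one predicate. Everything here is hypothetical: `CrawlOptions`, its fields, and `can_skip_stat` are names invented for this example and do not exist in fpart. The filesystem list is the one quoted above from fts.c, and a size limit of 0 is taken to mean "unlimited", as in the patch mentioned earlier.

```python
from dataclasses import dataclass

# Filesystem names quoted in the thread as 'UFS-like'
# (ufslike_filesystems[] in fts.c).
UFS_LIKE = frozenset({"ufs", "zfs", "nfs", "ext2fs"})


@dataclass(frozen=True)
class CrawlOptions:
    """Hypothetical summary of the settings that matter for FTS_NOSTAT."""
    size_limit: int = 0            # 0 means no size limit (-s 0)
    follow_symlinks: bool = False  # True would require FTS_LOGICAL
    needs_dir_sort: bool = False   # stands in for options such as -D and -E
    fs_type: str = ""              # filesystem type of the crawled tree


def can_skip_stat(opts):
    """Return True only if every precondition from the thread holds."""
    return (
        opts.size_limit == 0
        and not opts.follow_symlinks
        and not opts.needs_dir_sort
        and opts.fs_type in UFS_LIKE
    )
```

If any one of the conditions fails, the predicate returns False and the crawl would keep its current behaviour of calling stat on every entry.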