martymac / fpart

Sort files and pack them into partitions
https://www.fpart.org/
BSD 2-Clause "Simplified" License
230 stars 39 forks

fpart: incomplete partitions on sshfs #35

Closed vali-um closed 2 years ago

vali-um commented 2 years ago

Hello,

We are currently trying to use fpart on a mounted sshfs (FUSE/libfuse) filesystem. The filesystem contains about 11 TB of data spread over more than 10,000,000 directories and 15,000,000 files. Running fpart produces partitions until it hits a little over 1,000,000 files. After that, all directories seem to look empty or unreadable to fpart: it just writes out the current level's directories, then one level up, and so on. I further discovered that some directories in the middle of the run have their subdirectories listed as empty although they should contain files. As far as I can tell, it is inconsistent which directories this happens to.

The command line used is:

  fpart -f 1000000 -o debug_2.part -zz -vv <MOUNTPOINT>

This was tested with version 1.4.0 as packaged in Debian bullseye and with version 1.4.1 built from the GitHub repo. Building with the option --enable-debug does not reveal anything more to me; the output just shows valid_file(): iterating over the same files and directories (which later appear empty).

Running fpart on the original filesystem on the other server (ext4) produces complete partitions, as far as I can tell.

I'd be happy to do further debug work if you tell me what you need.

Cheers, Valentin

martymac commented 2 years ago

Hello,

I suspect sshfs is the culprit here (the underlying FS layer is stressed by lots of stat(2) syscalls). Can you still manually list the contents of those 'empty' directories afterwards, once fpart has failed to list them? Have you had a look at your kernel/FUSE log messages?

vali-um commented 2 years ago

Yes, I can manually list the contents of those directories after fpart failed to list them. I took a look at the kernel logs, but they are completely empty; nothing is reported while the issue occurs. Also, the sshfs debug output basically repeats messages like the following all the time; nothing stands out to me there:

  [4581339]         STATUS       32bytes (0ms)
  [4581340]         STATUS       32bytes (0ms)
  [4581341]         STATUS       28bytes (0ms)
  [4581342]         HANDLE       17bytes (0ms)
[4581343] READDIR
[4581344] READDIR
  [4581343]           NAME      317bytes (0ms)
  [4581344]         STATUS       32bytes (0ms)
[4581345] READDIR
[4581346] READDIR
[4581347] CLOSE
[4581348] OPENDIR
  [4581345]         STATUS       32bytes (0ms)
  [4581346]         STATUS       32bytes (0ms)
  [4581347]         STATUS       28bytes (0ms)
  [4581348]         HANDLE       17bytes (0ms)
[4581349] READDIR
[4581350] READDIR
  [4581349]           NAME      457bytes (0ms)
  [4581350]         STATUS       32bytes (0ms)

Thanks for your help, Valentin

martymac commented 2 years ago

Hello,

Can you crawl your SSHFS filesystem with something like:

$ find /path/to/sshfs -ls

to force stat()'ing every single file/dir? Does crawling fail the same way?

vali-um commented 2 years ago

No, crawling this way with find works as expected.

We do have sftp logs from the server side of our sshfs; they don't show any error. It seems that at some point the directories just aren't stat()'ed anymore.

martymac commented 2 years ago

Hello Valentin,

First of all, to clarify: you are using option -zz, which means "pack un-readable directories as empty" (related to FTS_DNR, see fts(3)). Be aware that by default, fpart packs files only; you should not see any directory in output partitions here (except for unreadable directories). Is that really what you expect?

Anyway, this is weird because fpart should print an error message for unreadable directories.
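As background, fpart crawls the tree through fts(3), and unreadable or un-stat()able entries surface there as FTS_DNR and FTS_NS. The sketch below is a minimal, hypothetical walker (not fpart's actual code) showing where such error messages would come from:

```c
#include <fts.h>
#include <stdio.h>
#include <string.h>

/* Walk 'root' with fts(3) and report entries the crawler cannot handle:
 * FTS_DNR = directory that could not be read (what -zz packs as empty),
 * FTS_NS  = entry that could not be stat(2)'ed.
 * Returns the number of regular files seen, or -1 on error. */
long walk_report(char *root) {
    char *paths[] = { root, NULL };
    FTS *ftsp = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (ftsp == NULL)
        return -1;

    long nfiles = 0;
    FTSENT *ent;
    while ((ent = fts_read(ftsp)) != NULL) {
        switch (ent->fts_info) {
        case FTS_DNR:  /* unreadable directory */
        case FTS_NS:   /* stat() failed */
            fprintf(stderr, "%s: %s\n", ent->fts_path,
                    strerror(ent->fts_errno));
            break;
        case FTS_F:    /* regular file: what fpart packs by default */
            nfiles++;
            break;
        }
    }
    fts_close(ftsp);
    return nfiles;
}
```

On a healthy filesystem the FTS_DNR/FTS_NS branches never fire, which is why the silent disappearance of whole subtrees is suspicious.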

To further debug the problem, you should try to isolate a sub-directory (ideally, a small one, deep in the file tree) that is incorrectly reported as empty and start fpart over its parent directory. That way, crawling/testing will be easier as you will work on a much smaller file tree. You will also see if the problem is fully/regularly reproducible or not.

If you can reproduce the problem that way, try to see if/why fpart cannot access directory contents : try to truss/strace the process to see what's happening. I really suspect a sshfs bug here and truss/strace should tell you more (have a look at fstat/fstatat() calls to see what they return).

Best regards, Ganael.

vali-um commented 2 years ago

Hello Ganael,

> First of all, to clarify : you are using option -zz which means "pack un-readable directories as empty" (related to FTS_DNR, see fts(3)), be aware that by default, fpart packs files only. You should not see any directory in output partitions here (except for unreadable directories). Is that what you really expect ?

It is (kind of): I want to include empty directories for backup purposes, packing un-readable directories as empty is more of a debug step as there shouldn't be any but it helps to track where the issue occurs.

> Anyway, this is weird because fpart should print an error message for unreadable directories.

I have never seen such an error message. What format would this message have, and are there circumstances which would prevent those messages?

> To further debug the problem, you should try to isolate a sub-directory (ideally, a small one, deep in the file tree) that is incorrectly reported as empty and start fpart over its parent directory. That way, crawling/testing will be easier as you will work on a much smaller file tree. You will also see if the problem is fully/regularly reproducible or not.

The issue is reproducible on the full file tree, but it does not occur for the same directory on every run, and choosing a subdirectory deep in the tree does not necessarily reproduce the error.

I straced the process; with my limited understanding of the issue, it could be either that sshfs handles the fstat call incorrectly or that fpart handles the response incorrectly.

The log (with redacted directory names, but the structure kept) for opening the last directory which was listed correctly:

write(2</root/fpart_debug/fs_root_strace.fpart.log>, "/mountpoint/???"..., 94) = 94
brk(0x5633e2e2e000)                     = 0x5633e2e2e000
openat(AT_FDCWD, "..", O_RDONLY)        = 4</mountpoint/XXX[dir_tree]/BBB[dir_before]>
fstat(4</mountpoint/XXX[dir_tree]/BBB[dir_before]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
fchdir(4</mountpoint/XXX[dir_tree]/BBB[dir_before]>) = 0
close(4</mountpoint/XXX[dir_tree]/BBB[dir_before]>) = 0
openat(AT_FDCWD, "..", O_RDONLY)        = 4</mountpoint/XXX[dir_tree]>
fstat(4</mountpoint/XXX[dir_tree]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
fchdir(4</mountpoint/XXX[dir_tree]>) = 0
close(4</mountpoint/XXX[dir_tree]>) = 0
openat(AT_FDCWD, "YYY[dir_ok]", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4</mountpoint/XXX[dir_tree]/YYY[dir_ok]>
fstat(4</mountpoint/XXX[dir_tree]/YYY[dir_ok]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=1204224, ...}) = 0
fstat(4</mountpoint/XXX[dir_tree]/YYY[dir_ok]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=1204224, ...}) = 0 
fchdir(4</mountpoint/XXX[dir_tree]/YYY[dir_ok]>) = 0
getdents64(4</mountpoint/XXX[dir_tree]/YYY[dir_ok]>, 0x5633e2e0dc10 /* 23 entries */, 32768) = 960

And for the first failed directory:

write(2</root/fpart_debug/fs_root_strace.fpart.log>, "/mountpoint/???"..., 101) = 101
brk(0x5633e3289000)                     = 0x5633e3289000
openat(AT_FDCWD, "..", O_RDONLY)        = 4</mountpoint/XXX[dir_tree]>
fstat(4</mountpoint/XXX[dir_tree]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
fchdir(4</mountpoint/XXX[dir_tree]>) = 0
close(4</mountpoint/XXX[dir_tree]>) = 0
openat(AT_FDCWD, "ZZZ[dir_failed]", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4</mountpoint/XXX[dir_tree]/ZZZ[dir_failed]>
fstat(4</mountpoint/XXX[dir_tree]/ZZZ[dir_failed]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
fstat(4</mountpoint/XXX[dir_tree]/ZZZ[dir_failed]>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
close(4</mountpoint/XXX[dir_tree]/ZZZ[dir_failed]>) = 0
write(2</root/fpart_debug/fs_root_strace.fpart.log>, "/mountpoint/???"..., 80) = 80

fpart was run with the -vv option, with all fpart output piped to /root/fpart_debug/fs_root_strace.fpart.log.

Thank you very much for your help, Valentin

martymac commented 2 years ago

Hello Valentin,

> I have never seen such an error message, what format would this message have and are there circumstances which prevent those messages?

Nothing can normally prevent those messages from appearing. They are of the form:

  <file path>: <error message>

See: https://github.com/martymac/fpart/blob/master/src/file_entry.c#L657 and: https://github.com/martymac/fpart/blob/master/src/utils.c#L169

> I straced the process [...]

The trace above clearly shows that fstat() on that directory does not return any error and reports a quite small directory (st_size=4096), probably an empty one. That suggests a silent failure of an underlying layer (SSHFS or FUSE itself). Could you try switching to the embedded fts code to see how it behaves? (Use configure's --enable-embfts to enable it, then rebuild fpart.)

vali-um commented 2 years ago

Hi Ganael,

I have, for sure, never seen such a message before...

Building with --enable-embfts and running fpart on the sshfs filesystem no longer fails silently as before, but shows errors of the form <file_path>: No such file or directory for the subdirectories missing from the output list.

fpart output looks like this:

[....]
/mountpoint/XXX/development setup/subdir1: No such file or directory
/mountpoint/XXX/development setup/subdir2: No such file or directory
/mountpoint/XXX/development setup/subdir3: No such file or directory
/mountpoint/XXX/development setup/subdir4: No such file or directory
/mountpoint/XXX/development setup/subdir5: No such file or directory
/mountpoint/XXX/development setup/subdir6: No such file or directory
/mountpoint/XXX/development setup/subdir7: No such file or directory
/mountpoint/XXX/development setup/subdir8: No such file or directory
/mountpoint/XXX/development setup/subdir9: No such file or directory
/mountpoint/XXX/development setup/subdir10: No such file or directory
/mountpoint/XXX/development setup/subdir11: No such file or directory
/mountpoint/XXX/development setup/subdir12: No such file or directory
/mountpoint/XXX/development setup/subdir13: No such file or directory
/mountpoint/XXX/development setup/subdir14: No such file or directory
/mountpoint/XXX/development setup/subdir15: No such file or directory
/mountpoint/XXX/development setup: No such file or directory
[...]

while the corresponding strace output shows:

openat(AT_FDCWD, "development setup", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4</mountpoint/XXX/development setup>
fstat(4</mountpoint/XXX/development setup>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
fstat(4</mountpoint/XXX/development setup>, {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
getdents64(4</mountpoint/XXX/development setup>, 0x556fe59df6e0 /* 17 entries */, 32768) = 608
getdents64(4</mountpoint/XXX/development setup>, 0x556fe59df6e0 /* 0 entries */, 32768) = 0
close(4</mountpoint/XXX/development setup>) = 0
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 88) = 88 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 81) = 81 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 81) = 81 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 81) = 81 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 94) = 94 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 94) = 94 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 77) = 77 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 80) = 80 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 100) = 100
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 83) = 83 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 81) = 81 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 96) = 96 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 96) = 96 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 78) = 78 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 81) = 81 
write(2</root/fpart_debug/fs_root_strace_embtfs.log>, "/mountpoint/XXX/devel"..., 72) = 72 
openat(AT_FDCWD, "..", O_RDONLY|O_CLOEXEC|O_DIRECTORY) = 4</mountpoint>

I'm still not sure what to make of this, because getdents64 seems to give the expected output, but there are no subsequent newfstatat calls (the behavior I could see on previous, correctly listed, directories).

I have to say, my understanding of FUSE and the corresponding syscalls is extremely limited. If you are convinced that this is a failure in FUSE/sshfs, feel free to close this issue. Otherwise, I'd be happy to do more debugging.

Cheers, Valentin

martymac commented 2 years ago

Hello Valentin,

Thanks for your feedback.

This really seems like an underlying FS (SSHFS or FUSE) failure to me: getdents64() returns 17 entries, yet those entries are not reachable afterwards. If you want to dig further (now that you can reproduce the problem with the embedded fts code), you could use gdb (rebuild fpart with --enable-debug) and try to see which stat calls are failing here.

I'll close this issue for the moment; feel free to re-open it if you need further help.

Best regards,

Ganael.