torarnv / sparsebundlefs

FUSE filesystem for reading macOS sparse-bundle disk images
BSD 2-Clause "Simplified" License

Bug in check for hitting max number of file descriptors #35

Closed syakhmi closed 3 years ago

syakhmi commented 3 years ago

Firstly, thank you for creating such a useful tool!

While copying files from a Time Machine backup, I found that cp would fail with an input/output error on large files. Like others (e.g., #20), by running sparsebundlefs in the foreground I discovered that the tool was hitting my system's (Ubuntu 20.04) default limit on open file descriptors (1024 on most distros). One workaround was to raise the limit for the running process to a large value with prlimit.
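For reference, the same adjustment can also be made from inside the process with getrlimit/setrlimit. The sketch below simply raises the soft limit to the hard limit; it is only an illustration of the idea, not code from sparsebundlefs:

#include <sys/resource.h>

// Illustration only: raise the soft RLIMIT_NOFILE limit to the hard limit,
// roughly what prlimit can do externally for an already-running process.
static bool raise_fd_limit()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}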

However, regardless of the limit, the logic in sparsebundlefs that checks whether the limit has been reached does not work as intended. By adding extra debug prints, I found that opens start failing well before the check if (sparsebundle->open_files.size() + 1 >= fd_limit.rlim_cur) on line 399 is tripped:

sparsebundlefs: fd_limit.rlim_cur: 1024
sparsebundlefs: open_files.size(): 1018
sparsebundlefs: iterating 131072 bytes at offset 52504297472
sparsebundlefs: processing 131072 bytes from band 1873 at offset 0
sparsebundlefs: preparing 131072 bytes at offset 0
sparsebundlefs: file /home/[USER]/timemachine/[FILENAME].sparsebundle/bands/1873 not opened yet, opening
sparsebundlefs: failed to open band /home/[USER]/timemachine/[FILENAME].sparsebundle/bands/1873: Too many open files
sparsebundlefs: asked to read 131072 bytes at offset 52504428544 using zero-copy read
sparsebundlefs: fd_limit.rlim_cur: 1024
sparsebundlefs: open_files.size(): 1019
sparsebundlefs: iterating 131072 bytes at offset 52504428544
sparsebundlefs: processing 131072 bytes from band 1873 at offset 131072
sparsebundlefs: preparing 131072 bytes at offset 131072
sparsebundlefs: failed to open band /home/[USER]/timemachine/[FILENAME].sparsebundle/bands/1873: Too many open files

I would guess that not all open file descriptors are accounted for in this check. I found that the input/output error could be resolved by adding a safety margin to the check, e.g.:

if (sparsebundle->open_files.size() + kNoFileSafetyMargin + 1 >= fd_limit.rlim_cur) {
    syslog(LOG_DEBUG, "hit max number of file descriptors");
    sparsebundle_read_buf_close_files();
}
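
For context, fd_limit here presumably comes from getrlimit(RLIMIT_NOFILE, ...). A self-contained sketch of the guarded check, with kNoFileSafetyMargin as a placeholder value I picked rather than anything in the repository, would look roughly like:

#include <sys/resource.h>
#include <cstddef>

// Placeholder margin for descriptors opened outside sparsebundle->open_files
// (stdio, syslog, the FUSE channel, etc.); the right value would need tuning.
static const size_t kNoFileSafetyMargin = 16;

// Sketch: report when the tracked count plus the margin approaches the
// soft RLIMIT_NOFILE limit, so band files can be closed before open() fails.
static bool near_fd_limit(size_t tracked_open_files)
{
    struct rlimit fd_limit;
    if (getrlimit(RLIMIT_NOFILE, &fd_limit) != 0)
        return false;
    return tracked_open_files + kNoFileSafetyMargin + 1 >= fd_limit.rlim_cur;
}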
torarnv commented 3 years ago

Thanks for your report!

Perhaps some files are implicitly opened/counted towards our process? It might be better to check how many files the kernel thinks we have open rather than tracking it ourselves.
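On Linux, one way to get the kernel's view is to count the entries in /proc/self/fd. A minimal sketch of that idea (not code from sparsebundlefs):

#include <dirent.h>
#include <cstddef>

// Count the descriptors the kernel reports for this process by listing
// /proc/self/fd (Linux-specific).
static size_t count_open_fds()
{
    DIR *dir = opendir("/proc/self/fd");
    if (!dir)
        return 0;
    size_t count = 0;
    while (readdir(dir) != NULL)
        ++count;
    closedir(dir);
    // Drop ".", ".." and the descriptor opendir() itself holds open.
    return count >= 3 ? count - 3 : 0;
}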

syakhmi commented 3 years ago

Yes, I think you're right. I found that

ls /proc/$PID/fd/ | wc -l

would return exactly 1024 when the error occurred.

torarnv commented 3 years ago

Thanks! What is the initial value, before opening any files?