mhx / dwarfs

A fast high compression read-only file system for Linux, Windows and macOS
GNU General Public License v3.0

SIGBUS happened again - twice! #50

Closed: ghost closed this issue 1 year ago

ghost commented 3 years ago
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
scanning: /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2tca/backups/backup-2b2tca-10june2019/main_map_nether/DIM-1/region/r.11.-2.mca
678293 dirs, 297774/10 soft/hard links, 46525/5413471 files, 0 other
original size: 91.28 GiB, dedupe: 24.74 GiB (15061 files), segment: 0 B
filesystem: 0 B in 0 blocks (0 chunks, 31454/5398400 inodes)
compressed filesystem: 0 blocks/0 B written
▏                                                                                                                                                                                      ▏  0% /
*** Aborted at 1623187919 (Unix time, try 'date -d @1623187919') ***
*** Signal 7 (SIGBUS) (0x7f3559c9b000) received by PID 5233 (pthread TID 0x7f35853eb700) (linux TID 5254) (code: nonexistent physical address), stack trace: ***
Bus error (core dumped)

The same issue described in #45 happened again, on the pre-compiled 0.5.5 release binaries. Twice, actually - the first time was on a completely different system, but the second time I was able to get a core dump. Even better, I think I actually know what's causing it.

When I ran mksquashfs instead of mkdwarfs on the exact same data, this happened:

nabla@satella /media/veracrypt3/dwarfs $ doas mksquashfs /media/veracrypt3/dwarfs/mount/ /media/veracrypt2/LiterallyEverything-08-Jun-2021.sqsh -comp zstd -Xcompression-level 22 -b 1M
Parallel mksquashfs: Using 12 processors
Creating 4.0 filesystem on /media/veracrypt2/LiterallyEverything-08-Jun-2021.sqsh, block size 1048576.
[|                                                                                                                                                                     ]   33715/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-105.2.mca, creating empty file
[\                                                                                                                                                                     ]   34894/7846602   0%
Read failed because Input/output error
[|                                                                                                                                                                     ]   34894/7846602   0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-13.5.mca, creating empty file
[/                                                                                                                                                                     ]   35075/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-1332.157.mcr, creating empty file
[/                                                                                                                                                                     ]   35771/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-18.-17.mcr, creating empty file
[/                                                                                                                                                                     ]   36003/7846602   0%
Read failed because Input/output error
[-                                                                                                                                                                     ]   36003/7846602   0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-19.6.mcr, creating empty file
[-                                                                                                                                                                     ]   47362/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.15.7.mca, creating empty file
[/                                                                                                                                                                     ]   47552/7846602   0%
Read failed because Input/output error
[-                                                                                                                                                                     ]   47553/7846602   0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.16.-30.mcr, creating empty file
[=/                                                                                                                                                                    ]   48764/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.19531.31249.mca, creating empty file
[=/                                                                                                                                                                    ]   54736/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.45.-34.mcr, creating empty file
[=/                                                                                                                                                                    ]   63050/7846602   0%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-14.-9.mcr, creating empty file
[=|                                                                                                                                                                    ]   63522/7846602   0%
Read failed because Input/output error
[=/                                                                                                                                                                    ]   63522/7846602   0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-16.-8.mca, creating empty file
[=|                                                                                                                                                                    ]   63758/7846602   0%
Read failed because Input/output error
[=/                                                                                                                                                                    ]   63760/7846602   0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-17.5.mcr, creating empty file
[=-                                                                                                                                                                    ]   66762/7846602   0%
Read failed because Input/output error
[=\                                                                                                                                                                    ]   66762/7846602   0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-33.21.mcr, creating empty file
[=\                                                                                                                                                                    ]   88907/7846602   1%
Read failed because Input/output error

Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2tca/backups/backup-2b2tca-10june2019/main_map/region/r.-17.-6.mca, creating empty file
[=\                                                                                                                                                                    ]   91183/7846602   1%
Read failed because Input/output error
[=|                                                                                                                                                                    ]   91183/7846602   1%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2tca/backups/backup-2b2tca-10june2019/main_map/region/r.-3.31.mca, creating empty file
[==-                                                                                                                                                                   ]  115004/7846602   1%

So, here's my theory (and I'm bad at these theories, so please take it with a grain of salt): mksquashfs probably just reads these files with regular old open() and read() calls, so whenever it encounters an I/O error, it can just skip the file and create an empty one as if nothing happened. But mkdwarfs, as @mhx mentioned in the previous issue about this, makes extensive use of mmap, so perhaps every time an I/O error occurs, the region of memory represented by the file it's trying to read becomes inaccessible, and a SIGBUS is triggered instead?

Perhaps this SIGBUS could be caught, and behavior similar to mksquashfs could be achieved, whereby the file is simply skipped and replaced with an empty one? Or maybe the file could be re-read several times before giving up and moving on?
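For illustration - and this is just a rough sketch of my understanding, not DwarFS code - the difference would be: with read(), a failing disk gives you -1 and errno set to EIO, but with an mmap'd file the failure only surfaces as SIGBUS when the faulting page is touched, so the process has to install a handler if it wants to survive it. The checksum_mapped_file name and the siglongjmp-based guard below are purely illustrative assumptions:

```cpp
// Illustrative sketch only: guard a checksum pass over an mmap'd file
// against SIGBUS by jumping back out of the signal handler. Real code
// would need per-thread jump buffers and careful cleanup.
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static sigjmp_buf g_jmp;

static void on_sigbus(int) { siglongjmp(g_jmp, 1); }

// Returns true if the whole mapping could be read, false if a SIGBUS
// was raised (e.g. because the underlying device returned an I/O error).
bool checksum_mapped_file(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return false; }

    size_t len = static_cast<size_t>(st.st_size);
    void* addr = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (addr == MAP_FAILED) return false;

    struct sigaction sa{}, old{};
    sa.sa_handler = on_sigbus;
    sigaction(SIGBUS, &sa, &old);

    bool ok = false;
    if (sigsetjmp(g_jmp, 1) == 0) {
        unsigned long sum = 0;
        const unsigned char* p = static_cast<const unsigned char*>(addr);
        for (size_t i = 0; i < len; ++i) sum += p[i];  // this access can SIGBUS
        printf("%s: checksum %lu\n", path, sum);
        ok = true;
    } else {
        fprintf(stderr, "%s: SIGBUS while reading mapping, skipping file\n", path);
    }

    sigaction(SIGBUS, &old, nullptr);
    munmap(addr, len);
    return ok;
}
```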

Also of note - these I/O errors are coming from a mounted DwarFS filesystem. When I got the SIGBUS error in #45, I was trying to read from a bunch of SquashFS filesystems, not a bunch of DwarFS filesystems, so this could be an issue with the physical disk (although I still think DwarFS should definitely be robust enough to skip these errors rather than completely bailing out, as I uh... do actually need to recover these files).

Even more bizarrely, despite both SquashFS and DwarFS failing consistently when trying to read roughly the same files, when I re-mounted the 2b2tca.dwarfs filesystem in question, I was able to read all of its content without any I/O errors at all. Truly baffling.

Anyway, I have replied to the email I previously sent @mhx (the one with the core dump for the previous issue) with the new core dump (which is actually much smaller this time), so hopefully it's possible to figure out where exactly this is happening.

mhx commented 3 years ago

That's awesome, thanks!

Haven't looked at the coredump yet, but I've got a few questions...

nabla@satella /media/veracrypt3/dwarfs $ doas mksquashfs /media/veracrypt3/dwarfs/mount/ /media/veracrypt2/LiterallyEverything-08-Jun-2021.sqsh -comp zstd -Xcompression-level 22 -b 1M
Parallel mksquashfs: Using 12 processors
Creating 4.0 filesystem on /media/veracrypt2/LiterallyEverything-08-Jun-2021.sqsh, block size 1048576.
[|                                                                                                                                                                     ]   33715/7846602   0%
Read failed because Input/output error

Also of note - these I/O errors are coming from a mounted DwarFS filesystem.

Even more bizarrely, despite both SquashFS and DwarFS failing consistently when trying to read roughly the same files, when I re-mounted the 2b2tca.dwarfs filesystem in question, I was able to read all of its content without any I/O errors at all. Truly baffling.

If this is true, I wonder where the DwarFS/SquashFS images you're mounting are located. Are you accessing the images over NFS or some other network filesystem by any chance? (From the command line I guess the answer is "no". I don't know how veracrypt would fit into the whole equation.)

At least, if you're able to read the files after re-mounting, this would mean there's no issue with the consistency of the file system image.

Actually, I've looked at the coredump now, and it's consistent with your theory. The crash happens when trying to read the file (more specifically, when trying to compute the checksum — so it's also consistent with the crash from #45).

One more question: is the source DwarFS image using compressed metadata?

So here's my theory (which may very well also be flawed):

I'll definitely have to try and set something like this up. It's probably time to try out one of the fault-injecting FUSE drivers out there.

The mystery question that remains is: why do mounted DwarFS/SquashFS fail to read file data in the first place?

ghost commented 3 years ago

So, the errors (SIGBUS, I/O error) happen when reading files from a mounted DwarFS image?

Yes

When re-mounting the DwarFS image, these read errors go away?

As far as I can tell, yes

In #45, you've probably seen something similar happen reading from a mounted SquashFS image?

Probably? I didn't try running mksquashfs on the mounted SquashFS image, so I have no idea if it returned I/O errors or something; all I know is that DwarFS crashed trying to read it, and then didn't crash the next time I tried.

Are you accessing the images over NFS or some other network filesystem by any chance?

No. It's the same Toshiba PC P300 I described in #48, encrypted with Veracrypt, with a Btrfs partition on top of the veracrypt loop device, containing a bunch of DwarFS files.

is the source DwarFS image using compressed metadata?

Yes, mkdwarfs was executed with -l7

why do mounted DwarFS/SquashFS fail to read file data in the first place?

I'm not entirely sure. I can't check the drive's SMART attributes because it's being accessed over USB 3.0, but the drive isn't really a likely candidate for failure - I can read the entire content of the drive with dd if=/dev/sdf without causing any I/O errors, and the drive itself is only 2-3 years old and spends the vast majority of its time in a closet in a protective caddy - the power on hours figure is probably below 500, at least. Obviously this doesn't help much because we can't really be sure exactly what state the drive is in even if I did have access to the SMART attributes, but so far I have only gotten I/O errors specifically from trying to run mksquashfs on a mounted dwarfs filesystem.

though I'd expect to see dwarfs SIGBUS in this case

Actually, on the "other system" I briefly mentioned in my first post, one of the dwarfs processes actually did crash. I didn't get any error message or anything because it wasn't running in the foreground, but I know that when mkdwarfs crashed, the corresponding dwarfs process for the filesystem that mkdwarfs was trying to read from also disappeared, and I couldn't access the filesystem it was trying to read at all anymore until I unmounted and remounted it.

mhx commented 3 years ago

Actually, on the "other system" I briefly mentioned in my first post, one of the dwarfs processes actually did crash. I didn't get any error message or anything because it wasn't running in the foreground, but I know that when mkdwarfs crashed, the corresponding dwarfs process for the filesystem that mkdwarfs was trying to read from also disappeared, and I couldn't access the filesystem it was trying to read at all anymore until I unmounted and remounted it.

Ohhh, that's super useful to know!

So I would assume that the dwarfs process died first, and mkdwarfs SIGBUSed as a result.

Which makes it more and more likely that something is happening to the filesystem that the source image is stored on while the dwarfs process is running, rendering the mmaped region invalid.

Can you take a look at your syslog to see if there's a dwarfs crash somewhere?

ghost commented 3 years ago

Uh, my syslog is absolutely huge... what should I look for? I tried grepping for dwarfs but couldn't find anything.

edit: I am currently copying all of the DwarFS filesystems onto a brand new hard drive (formatted with something other than btrfs) instead of the Toshiba drive to see if it still encounters the same issues.

ghost commented 3 years ago

Oh no. Screenshot_2021-06-09_11-13-50

mhx commented 3 years ago

Uh, my syslog is absolutely huge... what should I look for? I tried grepping for dwarfs but couldn't find anything.

Yeah, if that doesn't bring up anything, then it's likely that crashes don't get logged. Typically, a crash looks something like this:

Nov 24 16:14:09 balrog kernel: [48877243.244481] progress[11041]: segfault at 10 ip 000000000043a9c5 sp 00007faff39e9bd0 error 4 in mkdwarfs[40d000+e2000]

mhx commented 3 years ago

Oh no. Screenshot_2021-06-09_11-13-50

Ouch! :(

ghost commented 3 years ago

Well, I guess that explains the issue then - the drive must actually be failing, or at least the contents of the drive are corrupt enough to return an I/O error to veracrypt, then to btrfs, then to dwarfs, and finally to mkdwarfs.

If I copy all of this stuff to another drive with ddrescue or something, there may still be a few corrupt blocks within the files; do you reckon dwarfs will still be able to read them enough for me to recover most of my data if some of it is just corrupt rather than triggering I/O errors?

As far as I'm aware, it's the I/O errors specifically that are triggering the SIGBUS, but if I copy this to a drive that doesn't throw I/O errors and dwarfs just has to deal with a few odd-looking zero blocks, it'll be able to skip those files and keep going, right?

mhx commented 3 years ago

Well, I guess that explains the issue then - the drive must actually be failing, or at least the contents of the drive are corrupt enough to return an I/O error to veracrypt, then to btrfs, then to dwarfs, and finally to mkdwarfs.

That unfortunately sounds quite likely.

If I copy all of this stuff to another drive with ddrescue or something, there may still be a few corrupt blocks within the files; do you reckon dwarfs will still be able to read them enough for me to recover most of my data if some of it is just corrupt rather than triggering I/O errors?

In principle, the data should be recoverable if the metadata block is intact (which seems to be the case, as you were previously able to mount the image). However, depending on the type of corruption, it is highly likely that the FUSE driver will currently just bail out with an error, e.g.:

$ ./dwarfs broken.dwarfs mnt -f
I 12:43:49.849616 dwarfs (v0.5.5-3-g1e46f22-dirty, fuse version 35)
E 12:43:49.859210 error initializing file system: truncated section data

Recovery is something that's definitely on my list. So far, though, I've mostly focused on making sure the data is recoverable; there's no code yet that would actually attempt a recovery.

ghost commented 3 years ago

According to ddrescue, it looks like what I'm dealing with in the case of feb2020vms-loose-cmpother-vms-apr2021.dwarfs is a random 32768-byte block of zeros somewhere in the file (which is 289.2 GiB in total). When it's done trying to rescue everything, I'll try re-compressing the archive and see what comes up.

mhx commented 3 years ago

I highly recommend par2 to create redundancy for your images (or in fact any data that you'd prefer to be able to fully recover), and ideally store the redundancy file(s) on a separate medium.

I've been using this with DwarFS images before and it's outright amazing.

I should definitely add a section about this to the readme.

it looks like what I'm dealing with in the case of feb2020vms-loose-cmpother-vms-apr2021.dwarfs is a random 32768-byte block of zeros somewhere in the file

This would be trivial to fix with par2. Anyway, it highly depends on where that random block of zeros is located.

If it only has corrupted file data (not too unlikely), you'll be able to recover all file data that doesn't reference the corrupt block. I think (and hope) this will be the most likely scenario in your case, and it's somewhat consistent with the fact that you were able to mount the filesystem image before.

ghost commented 3 years ago

I highly recommend par2 to create redundancy for your images

Oh wow, this looks awesome, thanks. I'll definitely have to use this next time round.

If it only has corrupted file data (not too unlikely)

Yeah, I'm fairly certain this is the case.

cat: mnt/code.html: Input/output error

This does seem to imply that dwarfs will trigger an I/O error when I re-run mkdwarfs on the mounted filesystem, though. Will that still trigger a SIGBUS, or do you reckon the I/O error will occur before mkdwarfs tries to mmap() it?

mhx commented 3 years ago

do you reckon the I/O error will occur before mkdwarfs tries to mmap() it?

I'm afraid this is going to trigger the SIGBUS issue. Also, unfortunately, the SIGBUS issue isn't trivial to fix. It's not impossible or even seriously complicated, but it's sadly not something that's done in one afternoon, and I can't tell you when I'll get around to doing it.

mhx commented 3 years ago

It's not impossible or even seriously complicated,

Mmmh, I was probably a bit too optimistic with that statement. The code that the referenced blog post suggests will only work in the most trivial cases. As soon as you call out to a library or even some moderately complex C++ code, you're likely going to run into at least resource leaks, if nothing worse.

The easiest way to protect against SIGBUS would be to mlock() each memory region while accessing it. Problem is: unprivileged processes can (unless configured otherwise in limits.conf) at most lock 64 KiB of memory, which is rather useless for what DwarFS is doing. If you can't mlock(), then you have to choose between potential SIGBUS (and having to handle it somehow) or a different (and slower) API. Not using mmap() means that file contents will have to be temporarily copied into RAM buffers to be able to work with code that doesn't have file-based APIs (e.g. checksumming).
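Purely as a sketch (not how DwarFS currently works), the mlock() route would look roughly like this; try_lock_region is just a made-up name, and the RLIMIT_MEMLOCK check shows why this usually isn't viable for unprivileged processes:

```cpp
// Sketch: try to lock a mapped region so its pages are faulted in up
// front; any failure shows up as an mlock() error instead of a later
// SIGBUS while reading the mapping.
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

bool try_lock_region(void* addr, size_t len) {
    struct rlimit rl{};
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY &&
        len > static_cast<size_t>(rl.rlim_cur)) {
        // The default limit is typically 64 KiB, far too small for DwarFS blocks.
        fprintf(stderr, "region of %zu bytes exceeds RLIMIT_MEMLOCK (%llu bytes)\n",
                len, static_cast<unsigned long long>(rl.rlim_cur));
        return false;
    }
    if (mlock(addr, len) != 0) {
        perror("mlock");  // e.g. ENOMEM/EAGAIN if the pages can't be locked
        return false;
    }
    return true;  // caller reads the region, then munlock()s it
}
```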

I'll have to think thoroughly about the best way to deal with this issue.

mhx commented 3 years ago

Okay, I think I have a rough idea what I'm going to do here.

Replacing mmap completely is not an option, at least for the FUSE driver. In mkdwarfs it could potentially be replaced, but the code would likely end up a lot uglier than it is now. So at least for now, I'm not going to replace it.

That means the tools will still bail out with SIGBUS on broken hardware. While not ideal, I don't think this is the end of the world.

However, I'll make sure that if you have a corrupted image (such as the one you've recovered with ddrescue), as much of the files as possible is preserved. As single files can span multiple blocks, of which only one may be corrupt, files can at least partially be recovered. The remainder of each affected file will be zeros. I might even attach xattrs to help identify (partially) corrupt files.

This also means you'd theoretically be able to run mkdwarfs on a mounted, corrupt image. I don't necessarily think that's a great idea, but at least it wouldn't SIGBUS anymore.

Long term, I might consider other options, like an mmap emulation that can be enabled on demand.
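As a purely hypothetical illustration of the xattr idea above (this is not an existing DwarFS feature, and the attribute name user.dwarfs.corrupt is made up here), finding partially corrupt files in a mounted image could then look like:

```cpp
// Hypothetical: check for a corruption-marker xattr on a file inside a
// mounted image. getxattr() returns the attribute's size if present and
// -1 (errno ENODATA) if it isn't.
#include <sys/xattr.h>

bool looks_corrupt(const char* path) {
    return getxattr(path, "user.dwarfs.corrupt", nullptr, 0) >= 0;
}
```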

bionade24 commented 2 years ago

I got SIGBUS, too, but in a different situation that maybe has nothing to do with bad reads. I was too lazy to boot from a USB stick to mkdwarfs the device's root, so I mounted it at /mnt and started mkdwarfs. This way, I definitely didn't create any recursion. The disk is a 15-month-old NVMe SSD. No read errors in the kernel log.

writing: /mnt/usr/share/licenses/linux-firmware/WHENCE
67752 dirs, 56991/13068 soft/hard links, 551433/551433 files, 0 other
original size: 96.16 GiB, dedupe: 4.638 GiB (115043 files), segment: 28.51 GiB
filesystem: 47.78 GiB in 3058 blocks (6543632 chunks, 94853/423322 inodes)
compressed filesystem: 1287 blocks/9.681 GiB written [depth: 20000]
██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                                  ▏ 63% \
*** Aborted at 1633208871 (Unix time, try 'date -d @1633208871') ***
*** Signal 7 (SIGBUS) (0x7f0ac1f64000) received by PID 1481 (pthread TID 0x7f0aae5ff640) (linux TID 1760) (code: nonexistent physical address), stack trace: ***

Assertion failure: cu.version >= 2 && cu.version <= 4
Message: invalid info version
File: /build/src/dwarfs-0.5.6/folly/folly/experimental/symbolizer/Dwarf.cpp
Line: 292
Function: folly::symbolizer::detail::CompilationUnit folly::symbolizer::{anonymous}::getCompilationUnit(folly::StringPiece, uint64_t)
mkdwarfs(+0x188273)[0x55cf3177d273]
mkdwarfs(+0x18daea)[0x55cf31782aea]
mkdwarfs(+0x187f52)[0x55cf3177cf52]
/usr/lib/libpthread.so.0(+0x1386f)[0x7f0ac1a9286f]
/usr/lib/libc.so.6(gsignal+0x142)[0x7f0ac133dd22]
/usr/lib/libc.so.6(abort+0x115)[0x7f0ac1327861]
mkdwarfs(+0x3805a)[0x55cf3162d05a]
mkdwarfs(+0x380f5)[0x55cf3162d0f5]
mkdwarfs(+0x38941)[0x55cf3162d941]
mkdwarfs(+0x19ce79)[0x55cf31791e79]
mkdwarfs(+0x18a53c)[0x55cf3177f53c]
mkdwarfs(+0x18b2f8)[0x55cf317802f8]
mkdwarfs(+0x18d49f)[0x55cf3178249f]
mkdwarfs(+0x18db2c)[0x55cf31782b2c]
mkdwarfs(+0x187ad3)[0x55cf3177cad3]
/usr/lib/libpthread.so.0(+0x1386f)[0x7f0ac1a9286f]
mkdwarfs(+0xfcfe7)[0x55cf316f1fe7]
mkdwarfs(+0xfdcea)[0x55cf316f2cea]
mkdwarfs(+0xcecb8)[0x55cf316c3cb8]
mkdwarfs(+0xe48df)[0x55cf316d98df]
/usr/lib/libstdc++.so.6(+0xd33c3)[0x7f0ac15bb3c3]
/usr/lib/libpthread.so.0(+0x9258)[0x7f0ac1a88258]
/usr/lib/libc.so.6(clone+0x42)[0x7f0ac13ff5e2]
Entered fatal signal handler recursively. We're in trouble.
(safe mode, symbolizer not available)
zsh: abort      sudo mkdwarfs -i /mnt -L 60G -o 

bionade24 commented 2 years ago

No problems after booting from a different device, so in this case mlock() wouldn't help, would it? Only SIGBUS handling would probably help.

mhx commented 2 years ago

I was too lazy to boot from a USB stick to mkdwarfs the device's root, so I mounted it at /mnt and started mkdwarfs.

Just to be sure I understand this correctly: you mounted the root partition of the currently running system to /mnt?

If this is the case, then most likely the SIGBUS was caused by a file that has been modified while mkdwarfs was running. I'd have to double-check the code, but I don't think there's any check for this at the moment. A size check might be sufficient to avoid the SIGBUS, but any other modification would potentially render the filesystem contents invalid.
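Just as a sketch of what such a check could look like (this is not actual mkdwarfs code; I've added an mtime comparison on top of the size check for illustration):

```cpp
// Sketch: snapshot size/mtime when a file is first scanned and verify
// them again right before its data is consumed. This catches files that
// were replaced or resized in the meantime, but not in-place changes.
#include <sys/stat.h>

struct FileSnapshot {
    off_t size;
    struct timespec mtime;
};

bool take_snapshot(const char* path, FileSnapshot* out) {
    struct stat st;
    if (stat(path, &st) != 0) return false;
    out->size = st.st_size;
    out->mtime = st.st_mtim;
    return true;
}

bool unchanged_since(const char* path, const FileSnapshot& snap) {
    struct stat st;
    return stat(path, &st) == 0 &&
           st.st_size == snap.size &&
           st.st_mtim.tv_sec == snap.mtime.tv_sec &&
           st.st_mtim.tv_nsec == snap.mtime.tv_nsec;
}
```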

Can you please clarify if this was what you did, or describe exactly what you did to cause the SIGBUS?

Thanks!

bionade24 commented 2 years ago

I did mount the rootfs partition of the running system at /mnt and started mkdwarfs on /mnt.

mhx commented 2 years ago

So yeah, that means the data read by mkdwarfs were potentially volatile (e.g. logfiles being written/rotated). Handling SIGBUS could prevent this, but as mentioned above, this would be quite hard to get working properly. What I can certainly do is perform an extra check when re-accessing a file to see if it has changed in the meantime. However, I'm quite certain this won't prevent all issues (I'm not entirely sure right now what happens if you truncate/delete an mmap'd file).

As mentioned earlier in the thread, long-term I'll probably implement an abstraction that lets you choose between mmap and regular file access. This would make it easier to handle file access errors, likely at the expense of reduced performance.
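Roughly, such an abstraction could look like this (illustrative only; the class names are made up and this is not the actual DwarFS design):

```cpp
// Illustrative sketch of an access abstraction: the same interface can
// be backed by pread() (errors come back as return values) or by an
// mmap'd region (faster, but I/O errors surface as SIGBUS).
#include <stdint.h>
#include <unistd.h>
#include <vector>

class FileAccess {
public:
    virtual ~FileAccess() = default;
    // Copy `len` bytes at `offset` into `out`; returns false on any error.
    virtual bool read(uint64_t offset, size_t len, std::vector<uint8_t>* out) = 0;
};

class PreadAccess : public FileAccess {
public:
    explicit PreadAccess(int fd) : fd_(fd) {}
    bool read(uint64_t offset, size_t len, std::vector<uint8_t>* out) override {
        out->resize(len);
        ssize_t n = pread(fd_, out->data(), len, static_cast<off_t>(offset));
        return n == static_cast<ssize_t>(len);  // short read or EIO -> false
    }
private:
    int fd_;
};

// An MmapAccess implementation would copy out of the mapping (or hand
// out pointers into it); the trade-off is exactly the one discussed
// above: errors become SIGBUS rather than a false return value.
```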

bionade24 commented 2 years ago

Couldn't you check whether the file is being accessed, like lsof does?

mhx commented 2 years ago

Couldn't you check whether the file is being accessed, like lsof does?

I probably could, but what then? And what if the file gets accessed/modified after mkdwarfs has already mmap'd it? Using the rsync approach (copy whatever is there), you might end up with corrupt/partial files in your filesystem. For rsync, this is probably fine, as these will likely get fixed during the next run. For mkdwarfs, you're stuck with a broken file.

It's safe to say that (at least currently) it's unsafe to use mkdwarfs on volatile data.

mhx commented 1 year ago

I'll close this for now. I still believe the issue is a corrupted source file system.