Closed aayushshah15 closed 9 months ago
I'd suggest taking another snapshot on the origin after a minute or two, then using zstream dump to see what an incremental stream from the snapshot you cloned to the new one reports as changed. Without more data, my guess would be some weird interaction where the filesystem has a dirty journal at the moment of the snapshot and clears it on mount.
Something like a zpool sync before the snapshot might be a hacky workaround for your use case at the moment. It depends on what's happening.
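For reference, a minimal sketch of the incremental check suggested above. The dataset and snapshot names (tank/base, golden, later) are hypothetical placeholders, and the DRY_RUN switch is only a convenience added here so the commands can be previewed without touching a pool.

```shell
#!/bin/sh
# Hypothetical names: tank/base is the origin zvol, "golden" is the snapshot
# the clone was created from, "later" is the new snapshot taken afterwards.
ORIGIN="${ORIGIN:-tank/base}"
CLONED_SNAP="${CLONED_SNAP:-golden}"
LATER_SNAP="${LATER_SNAP:-later}"

# Print a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Snapshot the origin again and dump the records of the incremental stream
# between the cloned snapshot and the new one.
dump_incremental() {
    run zfs snapshot "${ORIGIN}@${LATER_SNAP}" || return 1
    echo "+ zfs send -i ${ORIGIN}@${CLONED_SNAP} ${ORIGIN}@${LATER_SNAP} | zstream dump -v"
    [ "${DRY_RUN:-0}" = "1" ] && return 0
    zfs send -i "${ORIGIN}@${CLONED_SNAP}" "${ORIGIN}@${LATER_SNAP}" | zstream dump -v
}
```

Calling dump_incremental (or DRY_RUN=1 dump_incremental to preview) should list the records that differ between the two snapshots. If the origin really was untouched in between, the stream would be expected to contain few or no write records.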
Running zpool sync before the clone doesn't seem to help, and since we're using zvols, the output of zstream dump won't be consumable. My sense is that this is a common enough use case that we're likely holding something wrong, as opposed to hitting a real bug. Would appreciate any other pointers.
The point of suggesting zstream dump was that it's an easier way to explain how to "look at which parts of the zvol object changed, then go look at what the filesystem structures in those parts of the zvol contain".
You could also accomplish that by diffing very verbose zdb output, if you really wanted to, but that's going to be literal MB of output for any nontrivially sized zvol.
update: we're no longer seeing the inconsistency as long as we unmount the zvol before snapshotting it, so there's likely some (undocumented?) interaction here that was causing the stated behavior.
@aayushshah15 Since you put another file system (ext4) on top of the zvol, it likely has its own write caches, the contents of which are not yet visible to the zvol when you snapshot it. You should flush those caches before snapshotting, ideally by unmounting it. Otherwise, on the next mount from the snapshot you may see it in an inconsistent state, as if the system had crashed at that moment.
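A rough sketch of that ordering, for anyone landing here later. The mountpoint, zvol, and snapshot names are hypothetical. The unmount variant is what resolved the problem in this thread. The fsfreeze variant is an untested assumption on my part for cases where the filesystem has to stay mounted, and DRY_RUN is just a preview switch added for the example.

```shell
#!/bin/sh
# Hypothetical names: the ext4 filesystem on the zvol is mounted at /mnt/base.
MOUNTPOINT="${MOUNTPOINT:-/mnt/base}"
ZVOL="${ZVOL:-tank/base}"
SNAP="${SNAP:-golden}"

# Print a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Variant confirmed in this thread: unmount first so ext4 writes out its
# caches and journal, then snapshot the zvol.
snapshot_unmounted() {
    run umount "$MOUNTPOINT" || return 1
    run zfs snapshot "${ZVOL}@${SNAP}"
}

# Assumed alternative (not tested in this thread): freeze the mounted
# filesystem so it flushes to the zvol, snapshot, then always unfreeze.
snapshot_frozen() {
    run fsfreeze --freeze "$MOUNTPOINT" || return 1
    run zfs snapshot "${ZVOL}@${SNAP}"
    status=$?
    run fsfreeze --unfreeze "$MOUNTPOINT"
    return "$status"
}
```

Either function can be previewed with DRY_RUN=1 first. The unfreeze step runs even if the snapshot fails, so the mountpoint is not left frozen.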
That makes sense and lines up with what we're seeing, thanks @rincebrain and @amotin. I'll close this issue.
System information
Describe the problem you're observing
We're observing that a zvol snapshot is not consistent with its source zvol immediately after the snapshot was taken, without any modifications being made to the source zvol.
At a high level, we're creating clones of a base zvol (which contains an ext4 formatted ubuntu filesystem) to pass off to firecracker microVMs. We're seeing that sometimes the microVM detects that its dpkg database is in a corrupt state. It seems to point to the /var/lib/dpkg/info/format file being empty as the reason for this.
Manually inspecting the contents of our base zvol (which is unmounted immediately after it is hydrated with our ubuntu rootfs) confirms that /var/lib/dpkg/info/format is not empty, whereas the same file in a clone of the snapshot is empty.
Describe how to reproduce the problem
Here is a snippet from an ansible playbook that seems to reliably reproduce the issue
In this script, the first assertion (second step) succeeds but the second assertion (the last step) doesn't. Are we misunderstanding something here?
Include any warning/errors/backtraces from the system logs