tpwrules closed this issue 10 months ago.
Whoa, looks like I've been living under some rock for the last 20 years...
Fortunately, this should be trivial to change in a backwards-compatible way. The integer types used in the metadata definition only affect the C++ representation, not the actual binary format of the metadata, so I can change this to a 32-bit type without breaking compatibility with metadata from older images. The data structure used internally to collect UIDs and GIDs also uses 16-bit types, but that's an easy change as well.
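Why the in-memory type can be widened without touching the on-disk data can be sketched with a simple bit-packed column, assuming each value is serialized with a fixed bit width chosen at write time. `PackedColumn`, `pack` and `unpack` are hypothetical names for illustration only; they are not the Dwarfs API, and the real metadata encoding may differ in detail.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed column: every value occupies exactly `bits` bits
// in the serialized stream, independent of the C++ type used after decoding.
struct PackedColumn {
  unsigned bits = 1;            // bits per value in the serialized form
  std::size_t count = 0;        // number of packed values
  std::vector<uint64_t> words;  // packed bit stream
};

// Packs values using the smallest bit width that fits the largest value.
PackedColumn pack(const std::vector<uint32_t>& values) {
  uint32_t max = 0;
  for (uint32_t v : values) max = std::max(max, v);
  PackedColumn col;
  while (col.bits < 32 && (uint64_t{1} << col.bits) <= max) ++col.bits;
  col.count = values.size();
  col.words.assign((col.count * col.bits + 63) / 64, 0);
  for (std::size_t i = 0; i < col.count; ++i) {
    std::size_t pos = i * col.bits;
    std::size_t w = pos / 64, off = pos % 64;
    col.words[w] |= uint64_t{values[i]} << off;
    if (off + col.bits > 64) {  // value straddles two words
      col.words[w + 1] |= uint64_t{values[i]} >> (64 - off);
    }
  }
  return col;
}

// Decodes the same stream into any in-memory integer type T. Data whose
// values fit in 16 bits decodes identically into uint16_t or uint32_t.
template <typename T>
std::vector<T> unpack(const PackedColumn& col) {
  const uint64_t mask = (uint64_t{1} << col.bits) - 1;
  std::vector<T> out;
  out.reserve(col.count);
  for (std::size_t i = 0; i < col.count; ++i) {
    std::size_t pos = i * col.bits;
    std::size_t w = pos / 64, off = pos % 64;
    uint64_t v = col.words[w] >> off;
    if (off + col.bits > 64) {
      v |= col.words[w + 1] << (64 - off);
    }
    out.push_back(static_cast<T>(v & mask));
  }
  return out;
}
```

In this sketch the serialized stream only records how many bits each value needs, so switching the decoded type from `uint16_t` to `uint32_t` changes nothing for old data, while new data with IDs above 65535 simply gets a wider bit width.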
Thanks a lot for spotting this, I'll definitely fix this for the next major release.
Funnily enough, in the stat helper I've added for Windows compatibility, I've used 32-bit types for UIDs and GIDs. ¯\_(ツ)_/¯
Fix is now on the main branch.
Confirmed everything round trips properly now on ext4 and with FUSE 2. Thanks for the fix.
I wonder if it could be a problem if a dwarfs image now has more than 65536 unique UIDs or GIDs? Could that overflow the index numbers? Such a thing is pretty unlikely, but it would be good to at least print an error if that table fills up instead of wrapping around and silently corrupting the image.
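The failure mode described here can be sketched with a hypothetical table that deduplicates IDs and hands out a compact index per unique ID. `IdTable` and `index_of` are illustrative names, not Dwarfs internals. With a 16-bit index, indices 0..65535 cover exactly 65536 unique IDs; a plain cast would wrap the next one back to 0, so this sketch throws instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical deduplicating ID table. IDs are stored as 32-bit values;
// IndexT is the width of the per-inode index that refers into the table.
template <typename IndexT>
class IdTable {
 public:
  // Returns the index for `id`, adding it if it has not been seen yet.
  // Throws std::overflow_error rather than silently wrapping the index.
  IndexT index_of(uint32_t id) {
    if (auto it = index_.find(id); it != index_.end()) {
      return it->second;
    }
    constexpr auto kMax =
        static_cast<std::size_t>(std::numeric_limits<IndexT>::max());
    if (ids_.size() > kMax) {
      throw std::overflow_error("too many unique IDs: " +
                                std::to_string(ids_.size() + 1));
    }
    auto idx = static_cast<IndexT>(ids_.size());
    ids_.push_back(id);
    index_.emplace(id, idx);
    return idx;
  }

  // The unique IDs in insertion order (what would be written to the image).
  const std::vector<uint32_t>& ids() const { return ids_; }

 private:
  std::vector<uint32_t> ids_;
  std::unordered_map<uint32_t, IndexT> index_;
};
```

Instantiating the sketch with `IdTable<uint32_t>` removes the practical limit; keeping `uint16_t` still works for typical images but now fails loudly on the 65537th unique ID.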
> Confirmed everything round trips properly now on ext4 and with FUSE 2. Thanks for the fix.
Thanks for confirming.
> I wonder if it could be a problem if a dwarfs image now has more than 65536 unique UIDs or GIDs? Could that overflow the index numbers?
Yes, that could definitely happen.
> Such a thing is pretty unlikely, but it would be good to at least print an error if that table fills up instead of wrapping around and silently corrupting the image.
Agreed that it'd be unlikely, but the fix turns out to be even more trivial than the previous one that widened the UID/GID fields.
Nice catch, thanks!
Linux has supported 32-bit UIDs and GIDs on files for a long time now. IDs above 65535 are rare on ordinary systems, which seldom have tens of thousands of users or groups, but they have become common with container engines, which use them to keep host and container IDs separate. As a result, files (such as everything in a directory used as a container rootfs) can have UIDs and GIDs > 65535.
Dwarfs truncates these at some point, and unfortunately the metadata format appears to store them in only 16 bits. This means that backing up such a rootfs with dwarfs is lossy: the IDs come out ANDed with 65535, so only their low 16 bits survive.
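A quick illustration of the truncation: converting a `uint32_t` to `uint16_t` keeps the value modulo 2^16, which is the same as ANDing with 65535. The host base of 100000 is only an example (a common starting point for subordinate ID ranges in `/etc/subuid`), and `truncate_id` and `map_to_host` are hypothetical helpers, not Dwarfs code.

```cpp
#include <cstdint>

// Hypothetical helper: narrowing a 32-bit ID into a 16-bit field keeps only
// the low 16 bits (value modulo 2^16), i.e. the ID ANDed with 65535.
constexpr uint16_t truncate_id(uint32_t id) {
  return static_cast<uint16_t>(id);
}

// Hypothetical helper: an engine whose subordinate ID range starts at
// `host_base` maps container ID n to host ID host_base + n.
constexpr uint32_t map_to_host(uint32_t host_base, uint32_t container_id) {
  return host_base + container_id;
}

// Example: container root (ID 0) is owned by host UID 100000 on disk.
constexpr uint32_t host_uid = map_to_host(100000, 0);
static_assert(truncate_id(host_uid) == (host_uid & 0xFFFFu),
              "narrowing equals masking with 0xFFFF");
static_assert(truncate_id(host_uid) == 34464,
              "100000 comes back as 34464, neither 0 nor 100000");
```

So in this example, a file owned by the container's root user would come back from a 16-bit image owned by UID 34464.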
It's unclear whether other parts of the stack (e.g. FUSE 2/3, syscalls, overlayfs) might have trouble supporting 32-bit IDs, but the format at least needs to be able to store them, because they are a legitimate and common occurrence in certain circumstances.