kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

--alloc-start (-A) option missing in btrfs-progs #723

Open allhavebrainimplantsandmore opened 6 months ago

allhavebrainimplantsandmore commented 6 months ago

I'm on Fedora 39 with btrfs-progs 6.6.2, and the -A (--alloc-start) option is missing from mkfs.btrfs; it used to set an arbitrarily sized offset for a btrfs filesystem on a device. The default is also not zero, as I saw on Ubuntu's mkfs.btrfs man page. I don't know what's going on with this.

grumpey commented 6 months ago

https://github.com/kdave/btrfs-progs/commit/4bd94dba8aa2ca5c78991a412da7a882c9e28ff3

adam900710 commented 6 months ago

Using an offset has not been supported by the kernel or the user-space tools for quite a while.

Thus setting that value won't help; it will only screw up your fs.

allhavebrainimplantsandmore commented 6 months ago

> Using an offset has not been supported by the kernel or the user-space tools for quite a while.
>
> Thus setting that value won't help; it will only screw up your fs.

No, this might have security implications and this option must be restored asap.

kdave commented 6 months ago

Please explain how it's used for security. It was never meant for such a use case, so we don't have any prior knowledge/specification of that.

kdave commented 6 months ago

So, the option has been gone from the kernel since 4.13 (year 2017, commit https://github.com/torvalds/linux/commit/0d0c71b317207082856f40dbe8a2bac813f49677), and that kernel is out of LTSS support too. Implementing the functionality would effectively mean starting from scratch, and for that we need a use-case evaluation.

allhavebrainimplantsandmore commented 6 months ago

Having a non-zero offset that's filled with zeros on a LUKS-encrypted mapped device lets an outsider know part of your encrypted data with certainty and increases the decryption attack surface. You never want any portion of your encrypted device to be known at a fixed location, especially if that data is plain zeros, which seems to be the default for the btrfs offset. I'm on Fedora 39, and a LUKS device /dev/mapper/luks-xxx formatted with btrfs gets a giant zero-filled offset at the beginning of the encrypted /dev/mapper/luks-xxx. Mathematically it might be possible to glean a decryption vector for some encryption algorithms when you have enough zero-filled sectors at known positions.

adam900710 commented 6 months ago

> Mathematically it might be possible to glean a decryption vector for some encryption algorithms when you have enough zero-filled sectors at known positions.

Please prove this first. And to me, it looks more like a problem in the algorithm.

Secondly, if you set a superblock offset other than the default one, you're screwed anyway: no tool can read it.

Remember, every fs relies on a fixed offset to bootstrap its superblock.

If you just don't want the first 1M (excluding the superblock) filled with zeros, use the -K option at mkfs time.

kdave commented 6 months ago

So the argument is that the first megabyte has a known value and weakens encryption. I've verified that after filling the device with random data, creating the filesystem will indeed reset the contents back to zeros.

What we can do:

  1. add an option to mkfs to fill the first megabyte with random data unconditionally, or
  2. preserve any existing data and only overwrite the blocks mkfs actually needs

Option 2 requires the user to do the filling with another command (like dd), so option 1 is for convenience. Alternatively, the unused byte range under 1M can be reinitialized any time later (with a script, or with tool support if needed). Does this help for your use case, @allhavebrainimplantsandmore ?
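The manual route in option 2 could be sketched roughly like this, using a regular file as a stand-in for the block device (the function name is mine, not existing tooling; the offsets come from the btrfs on-disk format, where the primary superblock is 4KiB at offset 64KiB and must not be overwritten):

```python
import os

SB_OFFSET = 64 * 1024    # primary btrfs superblock offset (0x10000)
SB_SIZE = 4 * 1024       # superblock size
FILL_END = 1024 * 1024   # randomize everything below 1MiB

def randomize_reserved_area(path):
    """Fill the first 1MiB of `path` with random bytes,
    skipping the 4KiB superblock at 64KiB."""
    with open(path, "r+b") as f:
        # range [0, 64K): before the superblock
        f.seek(0)
        f.write(os.urandom(SB_OFFSET))
        # range [68K, 1M): after the superblock
        f.seek(SB_OFFSET + SB_SIZE)
        f.write(os.urandom(FILL_END - SB_OFFSET - SB_SIZE))
```

On a real device this corresponds to two dd if=/dev/urandom invocations with seek/count chosen to step around the superblock.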

allhavebrainimplantsandmore commented 6 months ago

> So the argument is that the first megabyte has a known value and weakens encryption. I've verified that after filling the device with random data, creating the filesystem will indeed reset the contents back to zeros.

Not JUST a known value but a very special case (mathematically) of zeros.

> What we can do:
>
>   1. add an option to mkfs to fill the first megabyte with random data unconditionally, or
>   2. preserve any existing data and only overwrite the blocks mkfs actually needs
>
> Option 2 requires the user to do the filling with another command (like dd), so option 1 is for convenience. Alternatively, the unused byte range under 1M can be reinitialized any time later (with a script, or with tool support if needed). Does this help for your use case, @allhavebrainimplantsandmore ?

Yes, this is along the lines of what is needed. Option 1 is not just for convenience; it is a requirement to make btrfs a security-aware project.

adam900710 commented 6 months ago

Firstly, you still haven't proved the threat exists.

Secondly, even if your assumption is correct, there is no way to prevent end users from creating all-zero or pattern-filled files, which would cause the same problem: a user can run fiemap to get the "physical" location of those files and then mount the same attack.

I'm never a fan of over-reacting to something that MAY be a problem and wasting tons of development effort on something unproven.

adam900710 commented 6 months ago

Furthermore, a very basic property of a modern encryption algorithm is that even if you know both the plaintext and the ciphertext (i.e. a known-plaintext attack), you should not be able to figure out the encryption key.

At least AES is able to withstand such attacks.

Thus I'm strongly skeptical of your idea.

Zygo commented 6 months ago

Some years ago I ran a private survey of users' filesystem contents, and the most common data block in the survey was the 4K all-zeroes block, appearing in as many as 5% of all data blocks--a ratio so high that dedupe implementations often have to handle them as a special case to avoid clogging the filesystem with excessive references to a single data block. Also, in btrfs metadata blocks, unused bytes in the 16K metadata pages are zero, so any mostly-empty page will contain all-zero 4K blocks in the middle.

It would not be unusual for a large filesystem to have tens of millions of all-zero blocks. The -A option might affect the location of fewer than 256 of those. btrfs could be modified to add whitening to reduce all-zero signals going into the encryption algorithm, but it's much easier to simply do that in the encryption layer below btrfs.

If your attacker is searching for a zero-filled encrypted block, they may need fewer than 100 random blocks from the device, and maybe as few as 20 random blocks, to find one all-zero block without knowing anything about the filesystem or the data on it. If your chosen algorithm is weak on zero-filled blocks, then you need to choose a better algorithm. There are several popular algorithms that are known bad for FDE because of this kind of issue, even if they are OK when used in other cryptosystems.
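Zygo's sample counts follow from simple sampling math: if roughly 5% of data blocks are all-zero, the chance that n uniformly sampled blocks contain at least one zero block is 1 − 0.95^n. A quick sketch (the 5% figure is the survey estimate quoted above, not a universal constant):

```python
def p_at_least_one_zero_block(n_samples, zero_fraction=0.05):
    """Probability that sampling n blocks uniformly at random
    yields at least one all-zero block, given the fraction of
    all-zero blocks on the filesystem."""
    return 1.0 - (1.0 - zero_fraction) ** n_samples

# With 5% zero blocks: 20 samples give roughly 64% odds,
# 100 samples better than 99%.
```

This is why the ~256 blocks behind the old -A offset barely matter: random sampling of the whole device finds a zero block almost immediately.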

kdave commented 5 months ago

Thanks for the insights. I don't see an easy way, from the filesystem's point of view, to prevent zero blocks from being stored in a form that statistical analysis cannot detect. The known cases:

  • the first 1M of the device (minus the superblock) is zeroed by mkfs
  • unused bytes in metadata blocks are zeros
  • applications fill files with zeros

We can possibly do something about the first two and fill the unused space with random data. For the application data the blocks would need to be salted somehow, but at this point nothing is done about that, as the fscrypt integration is still a work in progress.

Compression would not help much here, I think. It may make the sampling harder by adding a few more classes of repeated blocks distinguished by length, e.g. a 4K zero block will always compress to the same bytes, an 8K block to another fixed sequence, and so on.
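The determinism point can be demonstrated directly with zlib (one of the compression algorithms btrfs supports): identical zero-filled blocks compress to identical byte strings, so the repeated pattern survives compression, merely relabeled.

```python
import zlib

block_a = bytes(4096)   # one all-zero 4K block
block_b = bytes(4096)   # another, written "independently"

compressed_a = zlib.compress(block_a)
compressed_b = zlib.compress(block_b)

# Compression is deterministic: same input, same output. Every
# compressed 4K zero block is still recognizable by its bytes,
# and an 8K zero block just forms a second recognizable class.
assert compressed_a == compressed_b
assert zlib.compress(bytes(8192)) != compressed_a
```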

allhavebrainimplantsandmore commented 5 months ago
> • applications fill files with zeros

I don't think that's an issue you need to worry about as much, because the location of those zero blocks will most likely not be easy to predict. It'd be nice to have, but as a much lower-priority feature.

Implementing this would be a really major security hardening. I don't think there's much of a CPU hit from filling with random data, but there might be? Is that why you mention salting, @kdave?

kdave commented 5 months ago

Yes, the location of the blocks will be random, but taking samples from the storage device is possible and can be done at any scope. If we're assuming a strong attacker (potentially breaking AES given a zeroed block), statistical analysis is probably happening as well. I'd agree this is harder than taking the first 1M of the partition (which provides ~256 4K blocks), but IMO it's still a practical attack.

I don't have an estimate of the number of blocks to be filled with random data; the true random pool probably can't sustain it, so it would have to be some combination of a truly random seed and an XOF (extendable output function) providing the data.
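The seed-plus-XOF idea could look like the sketch below, using SHAKE-256 (a standard XOF) to stretch a small draw from the true random pool into a megabyte of fill data. This is only an illustration of the concept, not proposed btrfs code, and the function name is mine:

```python
import hashlib
import os

def xof_fill(seed: bytes, length: int) -> bytes:
    """Expand a short random seed into `length` bytes using
    SHAKE-256, an extendable-output function (XOF)."""
    return hashlib.shake_256(seed).digest(length)

seed = os.urandom(32)               # one small draw from the entropy pool
fill = xof_fill(seed, 1024 * 1024)  # 1MiB of unpredictable fill data
```

The expansion is deterministic per seed, so only 32 bytes of true randomness are consumed no matter how many blocks need filling.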

Randomizing the first 1M is easy and a one-time operation; it can also be done on an unmounted filesystem or even a mounted one.

allhavebrainimplantsandmore commented 5 months ago

> If we're assuming a strong attacker (potentially breaking AES given a zeroed block), statistical analysis is probably happening as well. I'd agree this is harder than taking the first 1M of the partition (which provides ~256 4K blocks), but IMO it's still a practical attack.

I agree.

> I don't have an estimate of the number of blocks to be filled with random data; the true random pool probably can't sustain it, so it would have to be some combination of a truly random seed and an XOF (extendable output function) providing the data.

> Randomizing the first 1M is easy and a one-time operation; it can also be done on an unmounted filesystem or even a mounted one.

Yasssss! Ideally this would exist as a default formatting option too.