Open lipnitsk opened 10 years ago
Plan is to eliminate use of stream mode entirely in Encfs 2.x (for new filesystems). No plan for Encfs 1.x
Do you already have a plan for what mode to use? CBC with ciphertext stealing seems to be a good option.
The other option would be to go with CTR for the whole file. With CTR, however, an attacker can flip single bits at will, so it would need to ship with a MAC enabled by default. If ecryptfs has MACs enabled by default (will check), we should probably enable them too, anyway.
CTR has the additional problem that the XOR of two ciphertext files copied at two different times is the XOR of the plaintexts. To fix that leak you'd need random per-block IVs.
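This keystream-reuse leak is easy to demonstrate. A minimal sketch in plain Python, with a random byte string standing in for an AES-CTR keystream (all names here are illustrative, not EncFS code):

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# CTR turns a block cipher into a stream cipher: C = P XOR keystream.
# If the same key+IV (hence the same keystream) encrypts two versions
# of a file, the keystreams cancel out in the XOR of the ciphertexts.
keystream = os.urandom(16)          # stand-in for AES-CTR output
p1 = b"secret version 1"
p2 = b"secret version 2"
c1, c2 = xor(p1, keystream), xor(p2, keystream)

assert xor(c1, c2) == xor(p1, p2)   # ciphertext XOR leaks plaintext XOR
```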
For Encfs2, I'm leaning towards GCM mode (as used in ZFS).
@vgough Salsa20+Poly1305 would also be a viable (and very fast) alternative, as outlined by Thomas Ptacek in his blog: http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/
Actually, I don't think large changes like that are necessary. Blockwise CBC works fine for everything but the last 16 bytes (the AES block size). By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.
Please don't invent a padding scheme; just pad with PKCS#7 like everyone else. :)
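For reference, PKCS#7 always appends 1 to 16 padding bytes, each byte holding the pad length. A minimal sketch (helper names are illustrative):

```python
def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
    # Pad length is encoded in every padding byte; padding is always
    # added, so even a block-aligned input grows by a full block.
    n = block - len(data) % block
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    # The plaintext length is only known after reading the last byte.
    n = padded[-1]
    if not 1 <= n <= len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-n]

assert pkcs7_unpad(pkcs7_pad(b"foo")) == b"foo"
assert len(pkcs7_pad(b"x" * 16)) == 32  # aligned input still grows
```

Note that `pkcs7_unpad` has to read the last ciphertext block, which is exactly the stat() concern raised below.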
Thanks for the pointer! However, PKCS#7 seems to require that you read the last bytes of the ciphertext to get the plaintext length. This is one additional seek for every stat(); we should really avoid that, as it kills rsync performance.
(It's probably more than one seek, because the filesystem has to parse its internal data structures first to locate the data.) So I think what we need is a "headerless" scheme, where you don't have to read any ciphertext to get the length. Unconditionally adding 16 zero bytes (or any fixed value) would do that:
```
pppppppppp 0000000000000000
           ^---- 16 bytes zero padding
^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 0000000000
                 ^--- 10 bytes of zeros
^------------------- 16 bytes encrypted data
```
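The point of the scheme is that the plaintext-to-ciphertext length mapping becomes a fixed offset, so a stat() on the ciphertext is enough. A sketch of the length arithmetic (function names are illustrative):

```python
PAD = 16  # unconditional zero padding appended before encryption

def ciphertext_size(plain_size: int) -> int:
    # every file grows by exactly PAD bytes
    return plain_size + PAD

def plaintext_size(cipher_size: int) -> int:
    # headerless: no ciphertext bytes need to be read for stat()
    return cipher_size - PAD

assert plaintext_size(ciphertext_size(10)) == 10  # the diagram above
assert plaintext_size(ciphertext_size(0)) == 0
```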
Isn't it a security issue if you know that the last bytes will be (padded with) zero bytes? Maybe random bytes would be better?
Nope, should be fine.
http://en.m.wikipedia.org/wiki/Known-plaintext_attack Modern ciphers such as Advanced Encryption Standard are not currently known to be susceptible to known-plaintext attacks.
While the current modes of modern ciphers available to encfs might not currently be susceptible to known-plaintext attacks, these types of attacks are typical for cryptanalysis and so this assumption could change after further years of research.
Additionally, encfs offers multiple cipher options. Is this statement true for all ciphers encfs makes available through OpenSSL?
If given two choices for this implementation, are there impacts in choosing one over the other?
Trying to predict how to modify ciphers based on what vulnerabilities might be discovered in the future quickly becomes a wild goose chase. I suspect if you submitted a PR that improved the padding without affecting backwards compat, it would fare better.
A random idea I just thought of: Encode file length (and other small useful metadata) in the encrypted filename. That would reduce the maximum filename length even more than it is now, so if that maximum is reached, substitute a hash of the filename and add the real file name to the end of the file data. That would encode metadata in the file contents only in the (rare) case where the filename is too long, so it wouldn't hurt rsync et al in the common case. And this would resolve the limited filename length problem as well.
In order to make lookups simple, it is preferable that encrypted filenames can be directly computed from plaintext filenames. That way a call to open("foo.txt") doesn't require a directory scan in order to find the encrypted file. Instead, we encrypt "foo.txt" and attempt to open the encrypted name.
Allowing hashed names, to extend allowable file lengths, doesn't hurt too badly since it could still be done without a directory traversal. Encoding metadata into filenames would thwart this, since I'm not aware of any portable way to do a prefix match or otherwise avoid walking the entire directory listing.
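The O(1) lookup property only needs the name mapping to be deterministic. An illustrative model, with a keyed HMAC as a stand-in for EncFS's name encryption (the key and helper names are assumptions; a real implementation needs a reversible name cipher so that directory listings can be decrypted):

```python
import base64
import hashlib
import hmac
import os

KEY = b"example-volume-key"  # hypothetical per-volume key

def encrypt_name(plain: str) -> str:
    # Deterministic: the same plaintext name always yields the same
    # encrypted name, so it can be computed without any readdir().
    mac = hmac.new(KEY, plain.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac[:16]).decode().rstrip("=")

def encrypted_path(root: str, plain: str) -> str:
    # open("foo.txt") turns into a direct open of the computed name,
    # with no scan of the encrypted directory.
    return os.path.join(root, encrypt_name(plain))

assert encrypt_name("foo.txt") == encrypt_name("foo.txt")
```

Encoding per-file metadata (like the length) into the name would break exactly this: the name could no longer be computed from "foo.txt" alone.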
Of course. I should have thought it through a bit longer.
No worries, I appreciate the ideas. I've wanted to do the same myself, just didn't figure out a way to make that work.
Is there any chance that there will - maybe ;-) - be a fix for the current version any time soon?
Is there no one who thinks they can make a quick fix?
Well, this is an incompatible format change, so there is no quick fix, I'm afraid.
Uhh. And what about a non-backwards-compatible version that is not 2.0?
However, it does not solve the problem, and is not enabled by default.
could you please clarify which commit introduced the fix and which option is used to work around this issue?
ping @vgough
> could you please clarify which commit introduced the fix and which option is used to work around this issue?
When you configure encfs in expert mode:

```
Add random bytes to each block header?
This adds a performance penalty, but ensures that blocks
have different authentication codes. Note that you can
have the same benefits by enabling per-file initialization
vectors, which does not come with as great of performance
penalty.

Select a number of bytes, from 0 (no random bytes) to 8:
```
However, as the audit stated, it does not solve the problem.
Any ideas on what data format shall be implemented?
A disk format based on GCM mode could also help to fix the issues related to MAC headers.
This is the most "important" EncFS security report.
> By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.
I like the idea @rfjakob. Rather simple, without changing all the existing work. It could be a transitional change before moving to a whole new format. Did you perhaps already work on a patch for this, or would you, please?
We would then kill two birds with one stone, as we would then be able to also close #10 👍
No, sorry, no plans of working on this. People who don't mind a format change can move to gocryptfs IMO.
I'll see if I can work on this later on :) I'm also wondering if such a modification would be reverse-write compatible.
> By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.
Actually, I think that padding with 15 bytes should be OK?
Right now I don't see why reverse-write would be a problem.
And yes, actually, 15 bytes should be enough.
I then tried to implement the `cipherBlockSize - 1` padding (15 bytes in the examples below), according to what you described above @rfjakob, adding 15 bytes at the end of each file (except 0-byte files).
Leading to:
```
pppppppppp 000000000000000
           ^---- 15 bytes zero padding
^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                 ^--- 9 bytes of zeros
^------------------- 16 bytes encrypted data
```
So, I have a working algorithm in the following situations:
But reverse write seems impossible to achieve correctly. Below are some complicated situations. Let's assume the block size is 4 KB.
I think I will go with OneAndZeroes padding of each block, with a `cipherBlockSize - 1` bytes padding for the last block.
We would then still be able to get the size of files without having to read the last block.
We would also be able to properly reverse-write, at a cost of one byte per `blockSize`.
I think it's worth it.
Any thoughts ?
Thx 👍
Last block:

```
pppppppppp 100000000000000
           ^---- 15 bytes OneAndZeroes padding
^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                 ^--- 9 bytes of zeros
^------------------- 16 bytes encrypted data
```

Other blocks:

```
ppppppppppppppp 1
                ^-------- 1 byte OneAndZeroes padding
^-------------------- 15 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc
^------------------- 16 bytes encrypted data
```
Let's look at the difficulties, I think this should all work:
> But we are not sure this is the last block to be written, so we are not sure we should crop...
Yes, we have to stat() the file to find out.
> But are we sure this is the last block? Perhaps the calling application will come with another write call to complete the 1020 bytes already received...
Again, we can stat() the file to determine if it is the last block. Forward mode has to do this as well, right?
Another note:
> Perhaps the calling application will come with another write call to complete the 1020 bytes already received...
This does not matter. In forward mode, the file has to be always consistent on disk. The user application may crash at any time and stop writing. But the data it has already written must be safe.
Thanks for your feedback @rfjakob 👍
I agree if the cipher file is fully available locally.
You may be in a situation where the cipher file would not be locally available, so you would not be able to `stat()` it (and so you would not be able to know if the block you have been asked to write is the last one of the file).
Think about for example downloading (or syncing, whatever the method used) some remote cipher files directly into a reverse-mounted EncFS.
Forward mode would not work either in this case, right?
It would, because in that case you are encrypting data, so you don't expect it to be a multiple of `cipherBlockSize`. If the block you are writing is at the end of the local (cipher) file, you assume this is the last block and compute a `cipherBlockSize - 1` bytes padding.
> I agree if the cipher file is fully available locally.
Can't we stat() the plaintext file instead?
Unfortunately this would not help.
Let's assume we receive a 4 KB (`blockSize`) cipher block.
According to the write call received, we have to write it at the end of the plaintext file. Perfect.
It could then be the last block of the plain file. But how can we be sure?
How can we then remove the last padding bytes that may exist?
Without padding every block as proposed above, I don't see how :|
If the write expanded the file, it must be the last block, and it must have padding (otherwise forward mode is buggy).
Not necessarily. Think about a cipher file being downloaded directly into a reverse-write EncFS (so that it is written decrypted directly to the local disk). Every block received and written will expand the plain file. But only the last one received (and written) will be the real last block of the plain file.
Then every block must have padding.
A 15-byte padding? Or a OneAndZeroes padding of each block, with a `cipherBlockSize - 1` bytes padding for the last block?
Yes, 15 bytes.
At that moment, it's the last block, right?
Look at these use cases:

```
Backup:  plain local -> EncFS reverse -> rsync to remote location
Restore: rsync from remote location -> EncFS reverse -> plain local
```
I'm not sure backup will need to insert a 15-byte padding after every block.
Interesting use case, but there are other problems:
plain local -> EncFS reverse -> rsync to remote location -> ciphertext
Now, let's assume the ciphertext contains 1000 bytes, and rsync happens to `write()` a chunk of data that ends at byte 1000. What does `EncFS reverse -> plain local` do?
// strange duplicate part of your message above deleted
Yes, I think this is the last tricky case. I already thought about this, and I think we need an additional internal buffer.
Let's take your example.
1000 % 16 = 8. We crop the last 8 bytes. We decode. We remove padding bytes if it looks like we can. We write the plain data at the end of the plain file. We return that we wrote 1000 bytes. As 1000 < 4096, we keep the 1000 bytes in an internal buffer, as we may receive the next bytes of the block.
If we receive a write request with the next 1000 bytes, we will not read the previous 1000 bytes of the block from the plain file (as we have cropped some bytes), but will take them from our internal buffer.
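The cropping arithmetic described above can be sketched as follows (a minimal model; the function name is illustrative):

```python
BS = 16  # AES block size

def crop_partial(chunk: bytes) -> tuple[bytes, bytes]:
    # Only whole cipher blocks can be decrypted; the trailing
    # remainder is cropped and kept in the internal buffer until
    # the next write arrives (or the file turns out to be complete).
    whole = len(chunk) - len(chunk) % BS
    return chunk[:whole], chunk[whole:]

whole, rest = crop_partial(b"x" * 1000)
assert len(whole) == 992   # decrypt and write these now
assert len(rest) == 8      # 1000 % 16 == 8, buffer these
```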
I was curious if that use case really works, so I did:
```
a/zero -> reverse -> b/eNZPWSyw0rxU7T37UwNN3,n9 ----> cp
d/zero -> reverse -> c/eNZPWSyw0rxU7T37UwNN3,n9 <---/
```
And it seems to work at first glance:

```
$ md5sum a/zero d/zero
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero
```
So that was easy, let's overwrite the middle of the file with itself:

```
dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123 seek=43 skip=43 count=1
```
Random garbage:
```
$ md5sum a/zero d/zero
2d56b031dc8683c233c016429084f870  a/zero
a22fc0525129c3eb2fe1af2e4bc9fd5d  d/zero
```
However, this (note the odd block size):
```
dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123
```
works, and I'm not sure why.
```
$ md5sum a/zero d/zero
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero
```
On decryption, we have to know whether it is the last block, because the last block is handled differently. Where do we get this information from?
From: https://defuse.ca/audits/encfs.htm
Exploitability: Unknown. Security Impact: High.
As reported in [1], EncFS uses a stream cipher mode to encrypt the last file block. The change log says that the ability to add random bytes to a block was added as a workaround for this issue. However, it does not solve the problem, and is not enabled by default.
EncFS needs to use a block mode to encrypt the last block.
EncFS's stream encryption is unorthodox:
This should be removed and replaced with something more standard. As far as I can see, this provides no useful security benefit, however, it is relied upon to prevent the attacks in [1]. This is security by obscurity.
Edit: [1] may be unavailable, so here it is from archive.org: