Open lipnitsk opened 10 years ago
Plan is to eliminate use of stream mode entirely in Encfs 2.x (for new filesystems). No plan for Encfs 1.x
Do you already have a plan for what mode to use? CBC with ciphertext stealing seems to be a good option.
The other option would be to go with CTR for the whole file. With CTR, however, an attacker can flip single bits at will, so it would need to ship with a MAC enabled by default. If ecryptfs has MACs enabled by default (will check), we should probably enable them too, anyway.
CTR has the additional problem that the XOR of two ciphertext files copied at two different times is the XOR of the plaintexts. To fix that leak you'd need random per-block IVs.
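This keystream-reuse leak is easy to demonstrate. A minimal sketch in plain Python, with a random byte string standing in for an AES-CTR keystream (all names here are illustrative, not EncFS code):

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# CTR turns a block cipher into a stream cipher: C = P XOR keystream.
# If the same key+IV (hence the same keystream) encrypts two versions
# of a file, the keystreams cancel out in the XOR of the ciphertexts.
keystream = os.urandom(16)          # stand-in for AES-CTR output
p1 = b"secret version 1"
p2 = b"secret version 2"
c1, c2 = xor(p1, keystream), xor(p2, keystream)

assert xor(c1, c2) == xor(p1, p2)   # ciphertext XOR leaks plaintext XOR
```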
For Encfs2, I'm leaning towards GCM mode (as used in ZFS).
@vgough Salsa20+Poly1305 would also be a viable (and very fast) alternative, as outlined by Thomas Ptacek in his blog: http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/
Actually, I don't think large changes like that are necessary. Blockwise CBC works fine for everything but the last 16 bytes (the AES block size). By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.
Please don't invent a padding scheme; just pad with PKCS#7 like everyone else. :)
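For reference, PKCS#7 always appends 1 to 16 padding bytes, each byte holding the pad length. A minimal sketch (helper names are illustrative):

```python
def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
    # Pad length is encoded in every padding byte; padding is always
    # added, so even a block-aligned input grows by a full block.
    n = block - len(data) % block
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    # The plaintext length is only known after reading the last byte.
    n = padded[-1]
    if not 1 <= n <= len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-n]

assert pkcs7_unpad(pkcs7_pad(b"foo")) == b"foo"
assert len(pkcs7_pad(b"x" * 16)) == 32  # aligned input still grows
```

Note that `pkcs7_unpad` has to read the last ciphertext block, which is exactly the stat() concern raised below.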
Thanks for the pointer! However, PKCS#7 seems to require that you read the last bytes of the ciphertext to get the plaintext length. This is one additional seek for every stat(); we should really avoid that, as it kills rsync performance.
(It's probably more than one seek, because the filesystem has to parse its internal data structures first to locate the data.) So I think what we need is a "headerless" scheme, where you don't have to read any ciphertext to get the length. Unconditionally adding 16 zero bytes (or any fixed value) would do that:
```
pppppppppp 0000000000000000
           ^---- 16 bytes zero padding
^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 0000000000
                 ^--- 10 bytes of zeros
^------------------- 16 bytes encrypted data
```
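The point of the scheme is that the plaintext-to-ciphertext length mapping becomes a fixed offset, so a stat() on the ciphertext is enough. A sketch of the length arithmetic (function names are illustrative):

```python
PAD = 16  # unconditional zero padding appended before encryption

def ciphertext_size(plain_size: int) -> int:
    # every file grows by exactly PAD bytes
    return plain_size + PAD

def plaintext_size(cipher_size: int) -> int:
    # headerless: no ciphertext bytes need to be read for stat()
    return cipher_size - PAD

assert plaintext_size(ciphertext_size(10)) == 10  # the diagram above
assert plaintext_size(ciphertext_size(0)) == 0
```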
Isn't it a security issue if you know that the last bytes will be (padded with) zero bytes? Maybe random bytes would be better?
Nope, should be fine.
http://en.m.wikipedia.org/wiki/Known-plaintext_attack Modern ciphers such as Advanced Encryption Standard are not currently known to be susceptible to known-plaintext attacks.
While the current modes of modern ciphers available to encfs might not currently be susceptible to known-plaintext attacks, these types of attacks are typical for cryptanalysis and so this assumption could change after further years of research.
Additionally, encfs offers multiple cipher options. Is this statement true for all ciphers encfs makes available through OpenSSL?
If given two choices for this implementation, are there impacts in choosing one over the other?
Trying to predict how to modify ciphers based on what vulnerabilities might be discovered in the future quickly becomes a wild goose chase. I suspect if you submitted a PR that improved the padding without affecting backwards compat, it would fare better.
A random idea I just thought of: Encode file length (and other small useful metadata) in the encrypted filename. That would reduce the maximum filename length even more than it is now, so if that maximum is reached, substitute a hash of the filename and add the real file name to the end of the file data. That would encode metadata in the file contents only in the (rare) case where the filename is too long, so it wouldn't hurt rsync et al in the common case. And this would resolve the limited filename length problem as well.
In order to make lookups simple, it is preferable that encrypted filenames can be directly computed from plaintext filenames. That way a call to open("foo.txt") doesn't require a directory scan in order to find the encrypted file. Instead, we encrypt "foo.txt" and attempt to open the encrypted name.
Allowing hashed names, to extend allowable file lengths, doesn't hurt too badly since it could still be done without a directory traversal. Encoding metadata into filenames would thwart this, since I'm not aware of any portable way to do a prefix match or otherwise avoid walking the entire directory listing.
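The O(1) lookup property only needs the name mapping to be deterministic. An illustrative model, with a keyed HMAC as a stand-in for EncFS's name encryption (the key and helper names are assumptions; a real implementation needs a reversible name cipher so that directory listings can be decrypted):

```python
import base64
import hashlib
import hmac
import os

KEY = b"example-volume-key"  # hypothetical per-volume key

def encrypt_name(plain: str) -> str:
    # Deterministic: the same plaintext name always yields the same
    # encrypted name, so it can be computed without any readdir().
    mac = hmac.new(KEY, plain.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac[:16]).decode().rstrip("=")

def encrypted_path(root: str, plain: str) -> str:
    # open("foo.txt") turns into a direct open of the computed name,
    # with no scan of the encrypted directory.
    return os.path.join(root, encrypt_name(plain))

assert encrypt_name("foo.txt") == encrypt_name("foo.txt")
```

Encoding per-file metadata (like the length) into the name would break exactly this: the name could no longer be computed from "foo.txt" alone.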
Of course. I should have thought it through a bit longer.
No worries, I appreciate the ideas. I've wanted to do the same myself, just didn't figure out a way to make that work.
Is there any chance that there will - maybe ;-) - be a fix for the current version any time soon?
Is there no one who thinks they can make a quick fix?
Well, this is an incompatible format change, so there is no quick fix, I'm afraid.
Uhh. And what about a non-backwards-compatible version that is not 2.0?
However, it does not solve the problem, and is not enabled by default.
could you please clarify which commit introduced the fix and which option is used to work around this issue?
ping @vgough
> could you please clarify which commit introduced the fix and which option is used to work around this issue?
When you configure encfs in expert mode:

```
Add random bytes to each block header?
This adds a performance penalty, but ensures that blocks
have different authentication codes. Note that you can
have the same benefits by enabling per-file initialization
vectors, which does not come with as great of performance
penalty.

Select a number of bytes, from 0 (no random bytes) to 8:
```
However, as the audit stated, it does not solve the problem.
Any ideas on what data format shall be implemented?
A disk format based on GCM mode could also help to fix the issues related to MAC headers.
This is the most "important" EncFS security report.
> By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.
I like the idea @rfjakob. Rather simple, without changing all the existing work. It could be a transitional change before moving to a whole new format. Did you perhaps already work on a patch for this, or would you, please?
We would then kill two birds with one stone, as we would then be able to also close #10 👍
No, sorry, no plans of working on this. People who don't mind a format change can move to gocryptfs IMO.
I'll see if I can work on this later on :) I'm also wondering if such a modification would be reverse-write compatible.
> By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.
Actually, I think that padding with 15 bytes should be OK?
Right now I don't see why reverse-write would be a problem.
And yes, actually, 15 bytes should be enough.
I then tried to implement the `cipherBlockSize - 1` padding (15 bytes in the examples below), according to what you described above @rfjakob, adding 15 bytes at the end of each file (except 0-byte files).
Leading to:
```
pppppppppp 000000000000000
           ^---- 15 bytes zero padding
^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                 ^--- 9 bytes of zeros
^------------------- 16 bytes encrypted data
```
So, I have a working algorithm in the following situations:
But reverse write seems impossible to achieve correctly. Below are some complicated situations. Let's assume the block size is 4 KB.
I think I will go with OneAndZeroes padding of each block, with a `cipherBlockSize - 1` bytes padding for the last block.
We would then still be able to get the size of files without having to read the last block.
We would also be able to properly reverse-write, at a cost of one byte per `blockSize`.
I think it's worth it.
Any thoughts ?
Thx 👍
Last block:

```
pppppppppp 100000000000000
           ^---- 15 bytes OneAndZeroes padding
^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                 ^--- 9 bytes of zeros
^------------------- 16 bytes encrypted data
```

Other blocks:

```
ppppppppppppppp 1
                ^-------- 1 byte OneAndZeroes padding
^-------------------- 15 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc
^------------------- 16 bytes encrypted data
```
Let's look at the difficulties, I think this should all work:
> But we are not sure this is the last block to be written, so we are not sure we should crop...
Yes, we have to stat() the file to find out.
> But are we sure this is the last block? Perhaps the calling application will come with another write call to complete the 1020 bytes already received...
Again, we can stat() the file to determine if it is the last block. Forward mode has to do this as well, right?
Another note:
> Perhaps the calling application will come with another write call to complete the 1020 bytes already received...
This does not matter. In forward mode, the file has to be always consistent on disk. The user application may crash at any time and stop writing. But the data it has already written must be safe.
Thanks for your feedback @rfjakob 👍
I agree if the cipher file is fully available locally.
You may be in a situation where the cipher file would not be locally available, so you would not be able to `stat()` it (and so you would not be able to know if the block you have been asked to write is the last one of the file).
Think about for example downloading (or syncing, whatever the method used) some remote cipher files directly into a reverse-mounted EncFS.
Forward mode would not work either in this case, right?
It would, because in that case you are encrypting data, so you don't expect it to be a multiple of `cipherBlockSize`. If the block you are writing is at the end of the local (cipher) file, you assume this is the last block and compute a `cipherBlockSize - 1` bytes padding.
> I agree if the cipher file is fully available locally.
Can't we stat() the plaintext file instead?
Unfortunately this would not help.
Let's assume we receive a 4 KB (`blockSize`) cipher block.
According to the write call received, we have to write it at the end of the plaintext file. Perfect.
It could then be the last block of the plain file. But how can we be sure?
How can we then remove the last padding bytes that may exist?
Without padding every block as proposed above, I don't see how :|
If the write expanded the file, it must be the last block, and it must have padding (otherwise forward mode is buggy).
Not necessarily. Think about a cipher file being downloaded directly into a reverse-write EncFS (so that it is written decrypted directly to the local disk). Every block received and written will expand the plain file. But only the last one received (and written) will be the real last block of the plain file.
Then every block must have padding.
A 15-byte padding? Or a OneAndZeroes padding of each block, with a `cipherBlockSize - 1` bytes padding for the last block?
Yes, 15 bytes.
At that moment, it's the last block, right?
Look at these use cases:

```
Backup:  plain local -> EncFS reverse -> rsync to remote location
Restore: rsync from remote location -> EncFS reverse -> plain local
```
I'm not sure backup will need to insert a 15-byte padding after every block.
Interesting use case, but there are other problems:
plain local -> EncFS reverse -> rsync to remote location -> ciphertext
Now, let's assume the ciphertext contains 1000 bytes, and rsync happens to `write()` a chunk of data that ends at byte 1000. What does `EncFS reverse -> plain local` do?
// strange duplicate part of your message above deleted
Yes, I think this is the last tricky case. I already thought about this, and I think we need an additional internal buffer.
Let's take your example.
1000 % 16 = 8. We crop the last 8 bytes. We decode. We remove padding bytes if it looks like we can. We write the plain data at the end of the plain file. We return that we wrote 1000 bytes. As 1000 < 4096, we keep the 1000 bytes in an internal buffer, as we may receive the next bytes of the block.
If we receive a write request with the next 1000 bytes, we will not read the previous 1000 bytes of the block from the plain file (as we have cropped some bytes), but will take them from our internal buffer.
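The cropping arithmetic described above can be sketched as follows (a minimal model; the function name is illustrative):

```python
BS = 16  # AES block size

def crop_partial(chunk: bytes) -> tuple[bytes, bytes]:
    # Only whole cipher blocks can be decrypted; the trailing
    # remainder is cropped and kept in the internal buffer until
    # the next write arrives (or the file turns out to be complete).
    whole = len(chunk) - len(chunk) % BS
    return chunk[:whole], chunk[whole:]

whole, rest = crop_partial(b"x" * 1000)
assert len(whole) == 992   # decrypt and write these now
assert len(rest) == 8      # 1000 % 16 == 8, buffer these
```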
I was curious if that use case really works, so I did:
```
a/zero -> reverse -> b/eNZPWSyw0rxU7T37UwNN3,n9 ----> cp
d/zero -> reverse -> c/eNZPWSyw0rxU7T37UwNN3,n9 <---/
```
And it seems to work at first glance:

```
$ md5sum a/zero d/zero
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero
```
So that was easy, let's overwrite the middle of the file with itself:

```
dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123 seek=43 skip=43 count=1
```
Random garbage:
```
$ md5sum a/zero d/zero
2d56b031dc8683c233c016429084f870  a/zero
a22fc0525129c3eb2fe1af2e4bc9fd5d  d/zero
```
However, this (note the odd block size):
```
dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123
```
works, and I'm not sure why.
```
$ md5sum a/zero d/zero
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero
```
On decryption, we have to know whether it is the last block, because the last block is handled differently. Where do we get this information from?
From: https://defuse.ca/audits/encfs.htm
Exploitability: Unknown. Security Impact: High.
As reported in [1], EncFS uses a stream cipher mode to encrypt the last file block. The change log says that the ability to add random bytes to a block was added as a workaround for this issue. However, it does not solve the problem, and is not enabled by default.
EncFS needs to use a block mode to encrypt the last block.
EncFS's stream encryption is unorthodox:
This should be removed and replaced with something more standard. As far as I can see, this provides no useful security benefit, however, it is relied upon to prevent the attacks in [1]. This is security by obscurity.
Edit: [1] may be unavailable, so here it is from archive.org: