unverbuggt commented 1 year ago

The plugin claims to "Encrypts text with AES-256".

There is comment saying: "# key must be 32 bytes for AES-256, so the password is hashed with md5 first", but MD5 hashes to 128bit (16 Bytes). We achieve the 32 bytes by hexdigest-ing (thus doubling the bytes).

We should use a hash function that gets us true 32 bytes, like SHA256.

Is th IV corrent with 16 bytes for AES-256?

On the other hand the block size is set to 32 bytes in CBC mode. This page says it's 16 Bytes for AES and there is "Crypto.Util.Padding to align the plaintext to the right boundary."

I haven't checked what crypto-js has to say, yet.

we really need to look into that...

unverbuggt commented 1 year ago

Found this. So the Block size is always 16 bytes, but we keep padding to 32 bytes and removing all padding characters.

What happens if the last character really was the padding character??? It probably gets stripped.

We should look into using automatic padding of PyCryptodome and crypto-js, if it is compatible.

CoinK0in commented 1 year ago

First, I'm not a cryptographer, so this answer may contain errors, although it's based on some experience (CTF) and research.

 There is comment saying: "# key must be 32 bytes for AES-256, so the password is hashed with md5 first",
 but MD5 hashes to 128bit (16 Bytes). We achieve the 32 bytes by hexdigest-ing (thus doubling the bytes).

 We should use a hash function that gets us true 32 bytes, like SHA256.

At first I didn't understand your true/false bytes story and was skeptical.

But after reflection and research you are right. MD5 does create a 32-character string as expected, but it limits the "charset" of this string to hexadecimal ([0-9a-f]), which has the effect of greatly reducing the entropy of a strong password.

This is not necessarily the case for weak passwords, since test has an entropy of ~5Bits, while the MD5 of 'test' 098f6bcd4621d373cade4e832627b4f6 has an entropy of ~130bits. But on strong password, ÿÊüp°DàæÜ½¤éÈÑñÀ\5Eè%|-#áÀ×ï5êå* having an entropy of ~400 bits, becomes with its MD5 9e5b473100090a53851764e69c3c590c with an entropy of 130 bits.

Using SHA256 to create 256-bit (32 byte) hexadecimal seems to be better but is not sufficient, since both MD5 and SHA256 are fast hashing algorithms, which do not protect against dictionary attacks. The best solution would therefore be to use slow hash functions to derive the key, such as PBKDF2, Argon2, Scrypt.

 On the other hand the block size is set to 32 bytes in CBC mode.
 This page says it's 16 Bytes for AES and there is "Crypto.Util.Padding to align the plaintext to the right boundary."

Yes, AES encryption always uses 128-bit fixed size blocks, whether for AES-128, AES-192, AES-256. What changes for these three versions is the length of the key used, which is respectively 128 bits (16 bytes), 192 bits (24 bytes), 256 bits (32 bytes).

And after reflection, it also seems that setting the block used in the padding to 32 bytes, will have the consequence of creating (in some cases) a block of 16 bytes of padding fully encrypted with the keys. Which if I understood correctly could lead to an "oracle padding" attack in certain cases. For this plugin this does not seem to be the case because the user does not control the generation of encrypted messages and therefore does not have a clear/encrypted version.

But as a matter of principle and to respect the RFCs, the "block_size" should be set to 16 bytes as expected.

Afterwards, to put things into perspective, despite the use of MD5 and an unconventional block size, the security of encrypted documents is not easily broken. We are talking about strong encryption here, but the basic one (md5) is also acceptable (my opinion)

I started to do a PoC with PBKDF2 to derive the key (Since it is included in the Crypto-JS lib), but for the moment I have problems my code doesn't seem to work.

unverbuggt commented 1 year ago

Hash function

The strength of encryption is mostly depending on the entropy of the password used. So the hash function just bloats the password to the desired keys size (256bits in our case), so using MD5 and hexlifing doesn't necessarily weaken security. But using 256 bits instead of 192 or 128 should make the cryptography stronger (I'm also no crypt expert).

But, as you were suggesting, we could improve security by making it harder to brute-force password by using a hashing function that takes more time to calculate (or uses more memory) and thus throttling the rate of passwords tried per second by some magnitudes.

Padding

For a test i do: self.__encrypt_text_aes__('This is a test.^^','TestPassword123') and receive (b'BXbbgiZ4CJ3NXruBLcuAAQ==', b'LM5x1Ytg+92YTBUOhN8JPV7db7V2CqixHaz7kgWwdEU=', b'^')

With "BXbbgiZ4CJ3NXruBLcuAAQ==" being the IV, which decodes to 16 bytes and "LM5x1Ytg+92YTBUOhN8JPV7db7V2CqixHaz7kgWwdEU=" being the cipher text which decodes to 32 bytes and last the padding character "^".

I intentionally used the padding character in the plain text to test if the removal of padding is working correctly.

Then I tried to decrypt in javascript with decrypt_content('TestPassword123','BXbbgiZ4CJ3NXruBLcuAAQ==','LM5x1Ytg+92YTBUOhN8JPV7db7V2CqixHaz7kgWwdEU=','^') and got "This is a test.", so the last two "^^" in the plain text got wrongfully removed.

We should use a padding function that already exists in pycryptodome and crypt-js. My understanding of pkcs7 from what I read here is that it adds the number of bytes missing until padding reached as repeated byte sequence.

unverbuggt commented 1 year ago

I've created a crypto-test branch where the hashing is changed to PBKDF2 and padding to Pkcs7.

So far I concluded:

Padding works just fine now.
But replacing the hashing function alone won't do the trick, and here is why:
- PBKDF2 has an iteration parameter which determines how much computing power is needed.
- It is advised to set it to 10e6 iterations (according to pycryptodome).
  - This adds a few more seconds to the MkDocs build time of my test configuration.
  - The browser however takes minutes to decrypt... this is not a great user experience (that's why I changed it to 10e4 for my machine).
    - on one hand, I think the processing power dedicated to javascript is throttled (no one wants to mine cryptocurrency for malicious websites).
    - on the other the PBKDF2 key generation is triggered for every single ciphertext (in my case the content, three encrypted_somethings and one search_index entry). This is done on every time a page is loaded.

Currently, the password is hashed with MD5 and then hexlified, so every password leads to exactly one key used for AES. So all content, encrypted_somethings, search_index entries are encrypted with the same key (but different IVs).

In order for this to be practical, I think we need a way to derive the salted key just once. I think of generating a random key to use with all to-encrypt plain texts of a page and encrypt this key with the key derived from PBKDF2 password and salt. This would also make it possible to define multiple passwords for one page and also creates the need for a password inventory of some sort.

I suggest these changes are too big for the 2.x branch and would lead some day to version 3.

unverbuggt / mkdocs-encryptcontent-plugin

AES-256 #47

Hash function

Padding