Open nskobelevs opened 2 years ago
Hey @nskobelevs, thanks for contributing to Blockcrypt.
The headers and messages are encoded in the way they are provided. This can cause an issue if the "real" message is the first one.
Can this be solved by making it obvious to user that secret 1 should not be hidden secret? This is how Blockcrypt is implemented in Superbacked hence why I didn’t give this use case much thought.
That said, I like the idea of shuffling… will spend some time thinking about your recommendation.
Can this be solved by making it obvious to user that secret 1 should not be hidden secret?
While that's an option I don't think it's a foolproof one. I'm approaching this from a general perspective specifically to Blockcrypt and it's spec. Imo a user should be able to treat the algorithm as a black box with no insights on how it works and still have plausible deniability in every situation.
It's possible a user won't read that warning or some other implementation might completely omit it - given we don't actually care about the order of the headers or the data shuffling seems the best way prevent this situation.
Something I've also realised just now: When encryption happens, each message it just appended in order to the data block. Even after shuffling this might be an issue because when we decrypt the header we know where the data the encypted message is - if this message doesn't start at 0 it can expose that there are other messages before it.
How about also:
While that's an option I don't think it's a foolproof one.
Do you agree that if secret 1 is not a hidden secret, plausible deniability is preserved?
Trying to keep block size as small as possible while adding plausible deniability by design… this is why block size is derived deterministically from size of secret 1 (using Blockcrypt’s public domain spec) .
I really like the idea of shuffling and moving padding around but not sure how effective it would be and how block size would be chosen? If not derived deterministically from secret 1 (which could be known by adversary forcing user to decipher block), wouldn’t block size reveal a lot about hidden secrets?
Blockcrypt is inspired by VeraCrypt which is also opinionated… a regular volume is created and a hidden volume is created within.
That said, super open to exploring less opinionated schemes as long as they are optimized for small block size so one can print block as QR code.
Do you agree that if secret 1 is not a hidden secret, plausible deniability is preserved?
I think it is but you're possibly still leaving room for error.
[...] and how block size would be chosen? If not derived deterministically from secret 1 (which could be known by adversary forcing user to decipher block), wouldn’t block size reveal a lot about hidden secrets?
You could still use the first secret passed in to get the blocksize it just wouldn't be encoded in the first header but yeah it still allows the attacker to make an educated guess of whether it's the real secret or not depending whether the message size would leave to that block size - although that is already an issue as mentioned above.
[...] as they are optimized for small block size so one can print block as QR code.
Really dig the concept of physical backups, great idea!
It seems currently the limit for secret 1 is around 960 chars after which the default block size results in JSON larger than the max QR code can accept. Not bad overall but they do get pretty big after a bit.
I wonder if it's possible to improve the storage efficiency using a different encoding? Maybe no JSON? QR codes have different capacities depending if it's arbitrary bytes, alphanumeric or numeric data Table (might be off a bit from my testing)
Actually, if the JSON is already using the binary QR code encoding, why stop at base64? I think I remember reading that basE91 is more efficient to base64.
I did a small test there and a 2648 base64 string ended up being 2441 in basE91.
Edit
Or even, if you don't care about human readability, can always just send the raw bytes instead of encoding them (the base64 string from above is 1984 raw bytes). Would likely have to abandon JSON at that stage though.
I really dig your mindset @nskobelevs… went down similar rabbit hole trying to find most efficient way to store data.
Ended up choosing Base64 (in the context of Superbacked) given it is supported out-of-the-box on Node.js (therefore on Electron) and marginal increase in capacity of other encodings did not significantly improve user experience.
In the context of Superbacked, I am working on a scheme where user could drag and drop file on app… file would then be encrypted using 256-bit key (which would be stored using Blockcrypt) and encrypted file would be uploaded to the Superbacked Cloud using file checksum (SHA-256) as key on AWS/S3 so Superbacked would not need to know anything about user given 256-bit key space size.
That feature would remove the data size limit entirely while offering a great user experience with all the benefits of paper backups and plausible deniability (if hidden files are also uploaded).
I just found this issue after watching your video about Blockcrypt on YouTube as well. I was also concerned about the plausible deniability, as the hidden secret (not sure if I'm mixing up terms here, I mean the real secret I want to hide within the encrypted block of data) has to be the last one (or at least it can't be the first one beginning at position 0 if I make sure not to reveal any additional secrets that have been written after the hidden secret), hence it could also be random padding, as an adversary would probably threaten me to brake my fingers until he/she gets a password that decrypts the data beginning at position 0.
Just thinking aloud here: Would it be possible to define the block of data in a circular form (a circle of data cut at a random position to be written into a block), meaning that if an entry point and the length of encrypted data is defined within the header, the first part of the encrypted data could be written at the end of the overall Blockcrypt data block and if it doesn't fit the rest is written to the beginning starting at position 0. In this case it would be possible to just choose a random entry point for the first data block (D1) defined in the first header (H1) instead of always beginning at position 0, and the other data blocks (if present) could be appended in sequence. If I'm not mistaken, there's no need to shuffle around the order of data, as all data surrounding a revealed data block could be just random padding.
Please tell me if I'm missing something.
This is a pretty interesting project and while I am no researcher, after stumbling on your video about it was I curious and had a look!
I think there's currently two main issues that may break your plausible deniability goal in some situations.
1. Sequential Headers
The headers and messages are encoded in the way they are provided. This can cause an issue if the "real" message is the first one.
When decoding we know which header was decoded - if we decode and it wasn't header one, we know there are other messages hidden by all the headers prior to the one we decoded.
Possible Solution: Shuffle around the order of data and headers so that no assumptions can be made.
2. Data Length
The data length is defined "first secret ciphertext buffer length * 2 rounded to nearest upper increment of 64 bytes". This can be an issue because we can reverse this default data length to get a range for the length of the first message.
The data IV is always 12 bytes and the auth tag is 16 bytes. Given a data length, we can guess the length
l
of the first message to be in the following range:0.5 * (d - 120) < l <= 0.5 * (d - 56)
So lets say we have block with a data length of 256 and an unknown number of messages. Given the data length is 256 we can assume the length of the first message will be in the range
68 < l <= 100
.If the block is decoded and we get a message shorter than 68 bytes then we can assume this wasn't the first message in the block. This would still be clear from issue 1 since we wouldn't decode header 1, but even if that is fixed, we know there is at least one other message hidden with the expected size constraint.
Possible Solution: Honestly, not overly sure about this one.