Related scientific articles

storojs72 commented 4 years ago

Can you please provide a links to scientific articles on searchable encryption topic that you studied and used for CipherSweet creating? What theoretic SSE scheme inspired the most?

paragonie-scott commented 4 years ago

CipherSweet didn't spawn from academia so you won't find much in the way of scientific literature describing its design. In fact, you could say that its design was a result of the nigh-universal failure of the searchable symmetric encryption scheme designers to address real-world cryptography concerns (especially order-revealing encryption and order-preserving encryption).

That being said, I do intend to write a paper this year to describe the scheme formally (and include security proofs).

Design Constraints

When CipherSweet was being designed, there were two main concerns:

Don't leak data (like OPE/ORE, as linked above).
No new cryptography primitives (i.e. no lattices, should work out of the box with OpenSSL and/or libsodium).

Background and History

The earliest suggestion for indexing encrypted data that we could find is a 2006 blog post by Raul Gaurcia, a Senior Security Program Manager for Microsoft. This shows up in Stack Overflow as a solution to this sort of problem.

Their approach was simply:

Encrypt in one column (e.g. AES)
HMAC the plaintext into an additional column

This naive design turns out to be secure against a naive attacker.

AES is industry standard for symmetric encryption. (Just please use an AEAD mode, not ECB.)
Cryptographically secure hash functions, such as SHA2 and BLAKE2, must be (at minimum) secure in the random oracle model. (They just aren't ideal for password hashes, you want something like Argon2 instead.)
The HMAC construction turns a secure hash function and turns it into a symmetric-key authenticator.
Consequently, HMAC-SHA2 under an unknown key turns out to be a good PRF (albeit not as performant as e.g. SipHash).

Consequently, if you're given a table of AES ciphertexts and HMAC hashes under two separate, unknown keys, the most you can really do is detect duplicate plaintexts.

Depending on your choice of cipher and hash function, you may discover other weaknesses become relevant if you use the same key in both contexts, but really, the "leaks duplicate rows" is a problem in and of itself. It's also very limited in its utility: You can only index literal plaintext values with their scheme.

Aside: Before CipherSweet was formally designed, we described approximately Raul Garcia's design for a Security B-Sides talk titled Building Defensible Solutions to Weird Problems. The source code and slides as well as a recording of that talk are both available online.

Although initially described as a "weird" problem, the demand for a solution turns out to be pretty high. A lot of companies want this problem solved, so it's not really fair to call it "weird" after all.

What Exactly is New?

The innovation of CipherSweet exists in two main components:

Recognizing that a truncated PRF can be turned into a secure Bloom filter.
- This is probably not the most revolutionary idea in computer science, but it wasn't inspired by any particular scientific paper, so it's difficult to suggest one to cite.
Offering functional indexes of the plaintexts instead of merely lookups of literal plaintext values of a single field.
- Corollary: Compound indexes were created to address situations where a SELECT query against a boolean field might be necessary for business logic (i.e. patients' HIV status in a medical database).

Those are the only conceptual departures (aside from wrapping this into easy-to-use PHP and Node.js libraries) from the rough design proposed by Raul Garcia's blog post (and informally described by various security engineers and cryptographers over the years).

Security Proof and Academic Paper When?

To be honest, it's not my highest priority right now.

Between Gossamer and two other projects that will be made public in October, as well as paid client work, I've got a full plate.

When I have more available bandwidth, however, it will get done.

I hope this helps!

storojs72 commented 4 years ago

Thank you for the explanation.

Actually, we were really inspired on the CipherSweet and used your aproach in our scheme (but in our design we use proxy). We have published the paper on IACR (https://eprint.iacr.org/2019/806.pdf) with the scheme description.

Now, I (as a paper co-author), understand that academia people wouldn't pay attention on any practical scheme if it is not formally described and has not mathematical proofs of security.

If you don't mind, I would like to contribute to the CipherSweet project with those formal scheme description and security proofs (I'm not very expierenced in math and formal crypto, but this task is very interesting for me).

Regards

storojs72 commented 4 years ago

@paragonie-scott, can you please look at this. Static Symmetric Searchable Encryption in SQL Databases.pdf

I took a formal notation from https://eprint.iacr.org/2006/210.pdf which is considered a strong work in academic searchable encryption

paragonie / ciphersweet