taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

Possible to create deterministic encryption? #161

Closed brancusi closed 11 months ago

brancusi commented 11 months ago

Hi, I have a semi-sensitive bit of data. Wondering if there is a way to do a deterministic encryption so that I can do a lookup in the DB. I know there is a tradeoff here for security. If you have other techniques that would be great as well.

Thank you for the fantastic library.

ptaoussanis commented 11 months ago

@brancusi Hi Aram!

> Thank you for the fantastic library.

You're very welcome, thanks for saying so :-)

> Wondering if there is a way to do a deterministic encryption so that I can do a lookup in the DB. I know there is a tradeoff here for security. If you have other techniques that would be great as well.

By "deterministic encryption", you mean that you'd like the same input data to produce the same encrypted output every time?

There's currently no built-in way to do this with Nippy, though in principle it might be possible if you work at a lower level (e.g. to use a fixed initialization vector during symmetric encryption).

But before considering options down that road, I'd suggest first stepping back and being clear on exactly what the ultimate objective is. What exactly do you mean by "do a lookup in the DB"?

There might be other easier and/or safer ways of achieving the same result - e.g. via a hash or HMAC, etc.
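To make the hash/HMAC suggestion concrete, here is a minimal sketch of a keyed "blind index" for exact-match lookups. It is written in Python with only the standard library rather than with Nippy, and the names `normalize_subject`, `blind_index` and `INDEX_KEY` are hypothetical, as is the choice of normalization. The assumption is that the token is stored in its own column next to the encrypted subject, and that the index key is kept separate from the encryption key.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical key for illustration only; a real key would come from a secret
# store and would be distinct from the key used to encrypt the emails.
INDEX_KEY = b"example-only-key-do-not-use-in-production"


def normalize_subject(subject: str) -> str:
    """Canonicalize case, Unicode form and whitespace before hashing."""
    text = unicodedata.normalize("NFKC", subject)
    return " ".join(text.casefold().split())


def blind_index(subject: str, index_key: bytes) -> str:
    """Return a deterministic keyed digest (HMAC-SHA256) of the subject."""
    message = normalize_subject(subject).encode("utf-8")
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


# The same subject always yields the same token, so the token can be used in
# an equality comparison (e.g. a WHERE clause) without decrypting anything.
stored_token = blind_index("Quarterly report", INDEX_KEY)
query_token = blind_index("  quarterly   REPORT ", INDEX_KEY)
partial_token = blind_index("Quarterly", INDEX_KEY)
```

Here `stored_token` and `query_token` are equal because of the normalization step, while `partial_token` is not: a keyed digest only supports whole-value matches. Identical subjects also produce identical tokens, so anyone who can read the column can see which rows share a subject.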

BTW just a heads-up that I'm intending to deprecate Nippy's encryption features in the near future in favour of a dedicated library that I'm hoping to release this month and that offers a superset of the crypto functionality currently in Nippy.

The documentation is already up, so you can see what that library will cover.

brancusi commented 11 months ago

Thank you for all these questions. Yes exactly.

I am storing emails in Postgres and I'm encrypting almost all parts of the message.

The issue I have is I need to search emails by subject and the idea was to encrypt and then pass that encrypted data to the where clause.

Any ideas?

Thanks for the heads up on deprecating encryption in Nippy. Tempel looks great. Will definitely switch over.

ptaoussanis commented 11 months ago

So you want to store emails with their content and metadata (including subject) encrypted, while still being able to search by subject?

> the idea was to encrypt and then pass that encrypted data to the where clause.

One problem here is that even if you have deterministic encryption, the search query would need to exactly match the original subject line. I'm not sure what your specific circumstances are, but I'm guessing that might not be feasible?

> Any ideas?

The first idea that comes to mind is to maintain your own metadata index that'll refer to email uuids.

Then when a user wants to do a search:

  1. Decrypt the metadata index
  2. Perform the search to find the matching uuid/s
  3. Fetch the corresponding row/s
  4. Decrypt those as necessary

Etc.
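The search flow above could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the approach's actual implementation: the index is assumed to be a JSON map of uuid to subject stored as one encrypted blob, and `add_to_index`, `search_index`, `stand_in_encrypt` and `stand_in_decrypt` are hypothetical names. The stand-in functions are base64 only so that the sketch runs without third-party packages; they provide no secrecy and would be replaced by real authenticated encryption.

```python
import base64
import json
from typing import Callable, List, Optional


def stand_in_encrypt(plaintext: bytes) -> bytes:
    # NOT encryption: placeholder so the sketch is runnable with the stdlib.
    return base64.b64encode(plaintext)


def stand_in_decrypt(blob: bytes) -> bytes:
    # Inverse of the placeholder above; swap in a real decrypt function.
    return base64.b64decode(blob)


def add_to_index(index_blob: Optional[bytes], email_uuid: str, subject: str,
                 encrypt: Callable[[bytes], bytes],
                 decrypt: Callable[[bytes], bytes]) -> bytes:
    """Decrypt the index, record uuid -> subject, and re-encrypt it."""
    index = json.loads(decrypt(index_blob)) if index_blob else {}
    index[email_uuid] = subject
    return encrypt(json.dumps(index).encode("utf-8"))


def search_index(index_blob: bytes, query: str,
                 decrypt: Callable[[bytes], bytes]) -> List[str]:
    """Steps 1 and 2: decrypt the index and return the matching uuids."""
    index = json.loads(decrypt(index_blob))
    needle = query.casefold()
    return [uuid for uuid, subject in index.items()
            if needle in subject.casefold()]


blob = add_to_index(None, "uuid-1", "Quarterly report",
                    stand_in_encrypt, stand_in_decrypt)
blob = add_to_index(blob, "uuid-2", "Lunch on Friday",
                    stand_in_encrypt, stand_in_decrypt)
matches = search_index(blob, "report", stand_in_decrypt)
# Steps 3 and 4 would then fetch the rows for `matches` and decrypt them.
```

Because the search runs on the decrypted index in memory, it is not limited to exact matches; the substring test here could be replaced by any matching logic.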

Though details would depend on exactly what you're trying to do, what constraints you have, etc.

For example, if you're dealing with a lot of emails, maintaining a single metadata index may become a prohibitive bottleneck - in which case you might want to shard the index, e.g. by date.
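One way sharding by date might look, assuming one index per calendar month: each shard would be encrypted and stored separately, so a search over a date range only has to decrypt the shards it touches. The names `shard_key`, `build_shards` and `shards_for_range` are hypothetical, and the monthly granularity is an arbitrary choice for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def shard_key(received: date) -> str:
    """Name the shard an email belongs to, e.g. '2024-03'."""
    return f"{received.year:04d}-{received.month:02d}"


def build_shards(
    entries: Iterable[Tuple[str, str, date]]
) -> Dict[str, Dict[str, str]]:
    """Group (uuid, subject, received date) entries into per-month indexes."""
    shards: Dict[str, Dict[str, str]] = defaultdict(dict)
    for email_uuid, subject, received in entries:
        shards[shard_key(received)][email_uuid] = subject
    return dict(shards)


def shards_for_range(start: date, end: date) -> List[str]:
    """List the shard names covering every month from start to end inclusive."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


shards = build_shards([
    ("uuid-1", "Quarterly report", date(2023, 12, 4)),
    ("uuid-2", "Lunch on Friday", date(2024, 1, 9)),
    ("uuid-3", "Invoice attached", date(2024, 1, 20)),
])
needed = shards_for_range(date(2023, 11, 15), date(2024, 1, 31))
```

A search without a date range would still need to visit every shard, so this mainly helps when queries can be bounded in time.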

Again, details would depend on your situation so this would need some design.

Hope that's of some help?

brancusi commented 11 months ago

Thank you, this makes great sense. As for subject search needing to be exact, yes it's a good point and I was definitely thinking how I could get around this, but I could get away with an exact match for now.

I like the index approach. I think that's how I will implement it.

Thanks again for all the feedback.

ptaoussanis commented 11 months ago

You're very welcome, best of luck!