Open chibenwa opened 2 months ago
About key derivation
Some code from Chat GPT (not test)
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.SecureRandom;
import java.util.Base64;
public class KeyDerivationUtil {
// Derive a key from the master key and blobId
public static String deriveKey(String masterKey, String blobId) throws Exception {
char[] masterKeyChars = masterKey.toCharArray();
byte[] salt = blobId.getBytes(); // BlobId is used as the salt
// Using PBKDF2 with HMAC-SHA-256
PBEKeySpec spec = new PBEKeySpec(masterKeyChars, salt, 65536, 256); // 65536 iterations, 256-bit key
SecretKeyFactory skf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
byte[] derivedKey = skf.generateSecret(spec).getEncoded();
// Encode the derived key to Base64 for use in requests
return Base64.getEncoder().encodeToString(derivedKey);
}
}
then upload:
PutObjectRequest putRequest = PutObjectRequest.builder()
.bucket(bucketName)
.key(objectKey)
.serverSideEncryptionCustomerAlgorithm("AES256") // Use AES256 for SSE-C
.serverSideEncryptionCustomerKey(derivedKey) // Provide the derived key for encryption
.build();
download:
GetObjectRequest getRequest = GetObjectRequest.builder()
.bucket(bucketName)
.key(objectKey)
.serverSideEncryptionCustomerAlgorithm("AES256") // Use AES256 for SSE-C
.serverSideEncryptionCustomerKey(derivedKey) // Provide the derived key for decryption
.build();
Note: blobId
is not a only input for salt
It can be combine: domain
(tenant) + blobId
when multi-tenancy.mode=ssec
https://github.com/linagora/james-project/issues/5263
Sure using the blobId as salt within PBKDF2WithHmacSHA256 is gold but what's the cost of it?
Ideally we shall have JMH micro benchmarks on key derivation, and in terms of perfs it shall be well inferior to ms
Note: blobId is not a only input for salt It can be combine: domain (tenant) + blobId when multi-tenancy.mode=ssec https://github.com/linagora/james-project/issues/5263
Yep yep yep!
Confusion in the team regarding encryption.s3.sse.c.master.key.salt=XXXXXXXXXX
Different opinions here. Is it used all the time?
Some of us think if key derivation disabled not used but only when key derivation enabled (salt + blobId for example)
but others think it is used : salt for all blobs if no key derivation, salt + blobId (or sth else) if enabled.
What's the correct way here?
but others think it is used : salt for all blobs if no key derivation, salt + blobId (or sth else) if enabled.
IMO that one.
But i think we shall experiment with key derivation further... (JMH benchmarks...)
Ideally:
IMO experiment a bit and see what can be done on this topic...
When I tried to lab SSEC with S3Minio, I got an error:
software.amazon.awssdk.services.s3.model.S3Exception: Requests specifying Server Side Encryption with Customer provided keys must be made over a secure connection.
It require "SSL" connection,
It require "SSL" connection,
Does our S3 client support SSL mode?
You really think ovh exposes their S3 endpoints in http and not https? Come on guys... Check again all the s3 endpoints in our deployments and you will see everything is in https
I know ssl is used for prod, My mean raise stuck with test container (S3minio container)
Try to add a nginx locally maybe for ssl?
EDIT: or I'm sure we should be able to setup ssl for minio
I know ssl is used for prod, My mean raise stuck with test container (S3minio container)
If ssl is required, then setup ssl for it
https://github.com/linagora/socat-client-server to easily to the job...
Sure using the blobId as salt within PBKDF2WithHmacSHA256 is gold but what's the cost of it?
Bellow is my Benchmark test the generate deriveKey
by using jmh-core
ALGORITHM: PBKDF2WithHmacSHA256
ITERATION_COUNT: 65536
KEY_LENGTH: 256
Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 103.055 ± 0.710 ops/s
2. Decrease ITERATION_COUNT 65536 -> 10000
The `ops/s` is significantly better.
ALGORITHM: PBKDF2WithHmacSHA256 ITERATION_COUNT: 10000 KEY_LENGTH: 256
Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 671.076 ± 14.413 ops/s
3. Decrease `KEY_LENGTH` 256->128
The `ops/s` has not changed significantly.
ALGORITHM: PBKDF2WithHmacSHA256 ITERATION_COUNT: 65536 KEY_LENGTH: 128 Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 102.869 ± 0.274 ops/s
4. Change `ALGORITHM` PBKDF2WithHmacSHA256 -> PBKDF2WithHmacSHA512
the `ops/s` is significantly slower.
ALGORITHM: PBKDF2WithHmacSHA512 ITERATION_COUNT: 65536 KEY_LENGTH: 256
Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 27.564 ± 0.292 ops/s
// Code lab: https://gist.github.com/vttranlina/c856f0f1ef5ac8c66708396cd7754f19
One sample result
[DeriveKeyBenchmark_PBKDF2WithHmacSHA256_65536_256.txt](https://github.com/user-attachments/files/17128056/DeriveKeyBenchmark_PBKDF2WithHmacSHA256_65536_256.txt)
Thanks for putting the JMH benchmarks together.
I discussed with Xavier and we think that derivation with PBKDF2WithHmacSHA256 is ok IF we use a cache (Guava loading cache) so that AES key is computed only once per domain.
Commenting on the related ticket BTW...
Why
Protect data in the object store: if data is read then no data leak appears thanks to encryption.
Today James applies himself AES for a single private key.
Limitation:
Proposal
Enabling SSE-C
Allow to use https://help.ovhcloud.com/csm/en-public-cloud-storage-s3-encrypt-objects-sse-c?id=kb_article_view&sysparm_article=KB0047314 (client provided keys)
(Meaning we can either do AES james side OR do SSE-C OR do nothing)
Configuration
blob.properties
:Within
blob-s3
maven project, reuse dependecyblob-aes
and usePBKDF2StreamingAeadFactory
to derive the new key.Integration with the S3 driver is straight forward:
Definition of done:
blob-eas
data with sse-c (are the two inter-operable? Likely not but worth trying!(and document the put policy as well!)
Option for key derivation
Allows using an encryption key per blob (derived from the blobId) hence making each encryption key unique: if I break one key, I decrypt only one object, not all.
Configuration:
Investigate base way to derivate the key:
Grooming will identify best way to derivate the key
Key derivation will be performed on each blob so this operation NEEDS to be cheap.
Notes
Preliminary to a stronger isolation policy where tenant get encoded in the key derivation salt. This would be handy for Postgres multi-tenancy.