linagora / james-project

Mirror of Apache James Project
Apache License 2.0
72 stars 62 forks source link

[EPIC] S3: support SSE-C #5262

Open chibenwa opened 2 months ago

chibenwa commented 2 months ago

Why

Protect data in the object store: if data is read then no data leak appears thanks to encryption.

Today James applies himself AES for a single private key.

Limitation:

Proposal

Enabling SSE-C

Allow to use https://help.ovhcloud.com/csm/en-public-cloud-storage-s3-encrypt-objects-sse-c?id=kb_article_view&sysparm_article=KB0047314 (client provided keys)

(Meaning we can either do AES james side OR do SSE-C OR do nothing)

Configuration blob.properties:

encryption.s3.sse.c.enable=true
encryption.s3.sse.c.master.key.algorithm=AES256
encryption.s3.sse.c.master.key.salt=XXXXXXXXXX
encryption.s3.sse.c.master.key.password=XXXXXXXXXX

Within blob-s3 maven project, reuse dependecy blob-aes and use PBKDF2StreamingAeadFactory to derive the new key.

Integration with the S3 driver is straight forward:

Screenshot from 2024-08-30 11-19-42

Definition of done:

{
    "Version": "2012-10-17",
    "Id": "PutObjectPolicy",
    "Statement": [
        {
            "Sid": "RestrictSSECObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
            "Condition": {
                "Null": {
                    "s3:x-amz-server-side-encryption-customer-algorithm": "false"
                }
            }
        }
    ]
}   

(and document the put policy as well!)

Option for key derivation

Allows using an encryption key per blob (derived from the blobId) hence making each encryption key unique: if I break one key, I decrypt only one object, not all.

Configuration:

encryption.s3.sse.c.key.derivation.enabled=true

Investigate base way to derivate the key:

Notes

Preliminary to a stronger isolation policy where tenant get encoded in the key derivation salt. This would be handy for Postgres multi-tenancy.

vttranlina commented 2 months ago

About key derivation

Some code from Chat GPT (not test)

import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.SecureRandom;
import java.util.Base64;

public class KeyDerivationUtil {

    // Derive a key from the master key and blobId
    public static String deriveKey(String masterKey, String blobId) throws Exception {
        char[] masterKeyChars = masterKey.toCharArray();
        byte[] salt = blobId.getBytes(); // BlobId is used as the salt

        // Using PBKDF2 with HMAC-SHA-256
        PBEKeySpec spec = new PBEKeySpec(masterKeyChars, salt, 65536, 256); // 65536 iterations, 256-bit key
        SecretKeyFactory skf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] derivedKey = skf.generateSecret(spec).getEncoded();

        // Encode the derived key to Base64 for use in requests
        return Base64.getEncoder().encodeToString(derivedKey);
    }
}

then upload:

 PutObjectRequest putRequest = PutObjectRequest.builder()
                .bucket(bucketName)
                .key(objectKey)
                .serverSideEncryptionCustomerAlgorithm("AES256")  // Use AES256 for SSE-C
                .serverSideEncryptionCustomerKey(derivedKey)      // Provide the derived key for encryption
                .build();

download:

 GetObjectRequest getRequest = GetObjectRequest.builder()
                .bucket(bucketName)
                .key(objectKey)
                .serverSideEncryptionCustomerAlgorithm("AES256")  // Use AES256 for SSE-C
                .serverSideEncryptionCustomerKey(derivedKey)      // Provide the derived key for decryption
                .build();
vttranlina commented 2 months ago

Note: blobId is not a only input for salt It can be combine: domain (tenant) + blobId when multi-tenancy.mode=ssec https://github.com/linagora/james-project/issues/5263

chibenwa commented 2 months ago

Sure using the blobId as salt within PBKDF2WithHmacSHA256 is gold but what's the cost of it?

Ideally we shall have JMH micro benchmarks on key derivation, and in terms of perfs it shall be well inferior to ms

Note: blobId is not a only input for salt It can be combine: domain (tenant) + blobId when multi-tenancy.mode=ssec https://github.com/linagora/james-project/issues/5263

Yep yep yep!

Arsnael commented 2 months ago

Confusion in the team regarding encryption.s3.sse.c.master.key.salt=XXXXXXXXXX

Different opinions here. Is it used all the time?

Some of us think if key derivation disabled not used but only when key derivation enabled (salt + blobId for example)

but others think it is used : salt for all blobs if no key derivation, salt + blobId (or sth else) if enabled.

What's the correct way here?

chibenwa commented 2 months ago

but others think it is used : salt for all blobs if no key derivation, salt + blobId (or sth else) if enabled.

IMO that one.

But i think we shall experiment with key derivation further... (JMH benchmarks...)

Ideally:

IMO experiment a bit and see what can be done on this topic...

Arsnael commented 2 months ago

Task list (feel free to comment):

vttranlina commented 2 months ago

When I tried to lab SSEC with S3Minio, I got an error:

software.amazon.awssdk.services.s3.model.S3Exception: Requests specifying Server Side Encryption with Customer provided keys must be made over a secure connection. 

It require "SSL" connection,

quantranhong1999 commented 2 months ago

It require "SSL" connection,

Does our S3 client support SSL mode?

Arsnael commented 2 months ago

You really think ovh exposes their S3 endpoints in http and not https? Come on guys... Check again all the s3 endpoints in our deployments and you will see everything is in https

vttranlina commented 2 months ago

I know ssl is used for prod, My mean raise stuck with test container (S3minio container)

Arsnael commented 2 months ago

Try to add a nginx locally maybe for ssl?

EDIT: or I'm sure we should be able to setup ssl for minio

hungphan227 commented 2 months ago

I know ssl is used for prod, My mean raise stuck with test container (S3minio container)

If ssl is required, then setup ssl for it

hungphan227 commented 2 months ago

https://min.io/docs/minio/linux/operations/network-encryption.html

chibenwa commented 2 months ago

https://github.com/linagora/socat-client-server to easily to the job...

vttranlina commented 2 months ago

Sure using the blobId as salt within PBKDF2WithHmacSHA256 is gold but what's the cost of it?

Bellow is my Benchmark test the generate deriveKey by using jmh-core

  1. 
    ALGORITHM: PBKDF2WithHmacSHA256
    ITERATION_COUNT: 65536
    KEY_LENGTH: 256

Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 103.055 ± 0.710 ops/s


2. Decrease ITERATION_COUNT 65536 -> 10000
The `ops/s` is significantly better.

ALGORITHM: PBKDF2WithHmacSHA256 ITERATION_COUNT: 10000 KEY_LENGTH: 256

Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 671.076 ± 14.413 ops/s


3. Decrease `KEY_LENGTH` 256->128
The `ops/s` has not changed significantly.

ALGORITHM: PBKDF2WithHmacSHA256 ITERATION_COUNT: 65536 KEY_LENGTH: 128 Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 102.869 ± 0.274 ops/s


4. Change `ALGORITHM` PBKDF2WithHmacSHA256 -> PBKDF2WithHmacSHA512 
the `ops/s` is significantly slower.

ALGORITHM: PBKDF2WithHmacSHA512 ITERATION_COUNT: 65536 KEY_LENGTH: 256

Benchmark Mode Cnt Score Error Units DeriveKeyBenchmark.testDeriveKey thrpt 5 27.564 ± 0.292 ops/s



// Code lab: https://gist.github.com/vttranlina/c856f0f1ef5ac8c66708396cd7754f19

One sample result 
[DeriveKeyBenchmark_PBKDF2WithHmacSHA256_65536_256.txt](https://github.com/user-attachments/files/17128056/DeriveKeyBenchmark_PBKDF2WithHmacSHA256_65536_256.txt)
chibenwa commented 2 months ago

Thanks for putting the JMH benchmarks together.

I discussed with Xavier and we think that derivation with PBKDF2WithHmacSHA256 is ok IF we use a cache (Guava loading cache) so that AES key is computed only once per domain.

Commenting on the related ticket BTW...