milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
31k stars 2.95k forks

[Feature]: Encryption of data at rest #33810

Open marksl opened 5 months ago

marksl commented 5 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

According to the recent paper "Text Embeddings Reveal (Almost) As Much As Text" (https://arxiv.org/pdf/2310.06816) and the write-up at https://thegradient.pub/text-embedding-inversion/, most of the original text can be recovered from some text embeddings. Use cases that involve sensitive data may therefore require encrypting vector data at rest.

Describe the solution you'd like.

Is it feasible to add AES symmetric-key encryption at rest to Milvus? Specifically, this would cover data stored in the Object Store, the Query Node, and possibly the Index Node.

A single key per database would be a good start.

If possible, it would be ideal to have tenant-specific keys. These keys could be mapped to a database, collection, partition, or partition key, depending on which multi-tenancy strategy from https://milvus.io/docs/multi_tenancy.md is used. Given the dynamic nature of a partition-key-based approach, a default encryption key might be required.
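To make the key-mapping idea concrete, here is a minimal sketch of how a key might be resolved for a given database, collection, or partition, falling back to a default key. This is not an existing Milvus API; `KeyResolver`, `assign`, `resolve`, and the key IDs are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical scope: (database, collection, partition); None means "any".
Scope = Tuple[str, Optional[str], Optional[str]]


@dataclass
class KeyResolver:
    """Maps a tenant scope to a key ID, with a default key as fallback."""

    default_key_id: str
    mapping: Dict[Scope, str] = field(default_factory=dict)

    def assign(self, key_id: str, database: str,
               collection: Optional[str] = None,
               partition: Optional[str] = None) -> None:
        # Register a key ID for a database, collection, or partition scope.
        self.mapping[(database, collection, partition)] = key_id

    def resolve(self, database: str,
                collection: Optional[str] = None,
                partition: Optional[str] = None) -> str:
        # Try the most specific scope first, then widen step by step.
        candidates = [
            (database, collection, partition),
            (database, collection, None),
            (database, None, None),
        ]
        for scope in candidates:
            if scope in self.mapping:
                return self.mapping[scope]
        # Nothing matched (e.g. a partition created on the fly): use the default.
        return self.default_key_id


resolver = KeyResolver(default_key_id="default-key")
resolver.assign("tenant-a-key", "db_a")
resolver.assign("tenant-b-docs-key", "db_b", "docs")
```

The resolver returns key IDs rather than key material, so the keys themselves could live in a separate store such as a KMS or HSM.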

Generally, I think it would work well if the keys could be injected at application startup. On Kubernetes, they could be injected using Kubernetes Secrets. Some mechanism for rotating keys would be nice, but I haven't thought that far ahead.
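As a rough sketch of key injection at startup: a Kubernetes Secret can be exposed to a pod either as an environment variable or as a mounted file, so a loader could check both. The variable names `MILVUS_ENCRYPTION_KEY` and `MILVUS_ENCRYPTION_KEY_FILE` are made up for this example and are not real Milvus settings. The key is assumed to be base64-encoded and 32 bytes long (AES-256).

```python
import base64
import os
from pathlib import Path
from typing import Mapping

AES_256_KEY_BYTES = 32


def load_key(env: Mapping[str, str] = os.environ,
             env_var: str = "MILVUS_ENCRYPTION_KEY",         # hypothetical name
             file_var: str = "MILVUS_ENCRYPTION_KEY_FILE",   # hypothetical name
             ) -> bytes:
    """Load a base64-encoded AES-256 key from a mounted file or an env var."""
    file_path = env.get(file_var)
    if file_path:
        # Secret mounted as a file (e.g. a Kubernetes Secret volume).
        encoded = Path(file_path).read_text().strip()
    elif env_var in env:
        # Secret exposed directly as an environment variable.
        encoded = env[env_var].strip()
    else:
        raise RuntimeError(f"neither {file_var} nor {env_var} is set")

    key = base64.b64decode(encoded, validate=True)
    if len(key) != AES_256_KEY_BYTES:
        raise ValueError(f"expected a {AES_256_KEY_BYTES}-byte key, got {len(key)}")
    return key
```

Failing fast on a missing or malformed key at startup avoids silently writing unencrypted data. Rotation is not covered here.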

Describe an alternate solution.

Direct integration with Hardware Security Modules.

Anything else? (Additional Context)

There was some discussion of encryption at rest here -> https://github.com/milvus-io/milvus/discussions/29326

xiaofan-luan commented 5 months ago

Milvus already supports TLS.

I guess that is what you are looking for: https://milvus.io/docs/tls.md#Set-up-a-Milvus-server-with-TLS

marksl commented 5 months ago

That is not quite what we are looking for; that nicely covers encryption in transit. I'm looking for something more similar to Transparent Data Encryption (TDE) in Oracle, SQL Server, MySQL, CockroachDB, and Postgres/pgcrypto.

xiaofan-luan commented 5 months ago

So you mean encrypting data in Kafka and S3? The easiest way for now is to enable S3 and Kafka encryption, or to use Zilliz Cloud as an alternative.

marksl commented 4 months ago

Yes! I mean encrypting the data in Kafka and S3. Pulsar end-to-end encryption (https://pulsar.apache.org/docs/3.3.x/security-encryption/) could possibly work for us if we can inject keys somehow. S3 SSE-C could also work if there's a way to inject keys.
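For context on what "injecting keys" would mean for SSE-C: S3 expects the customer key on each request as three headers (the algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the key). A minimal sketch of building those headers follows. The helper name `sse_c_headers` is made up, and whether Milvus can attach such headers to its object-store requests is an open question here, not something this example shows.

```python
import base64
import hashlib
from typing import Dict


def sse_c_headers(key: bytes) -> Dict[str, str]:
    """Build the S3 SSE-C request headers for a 256-bit customer key."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key).decode("ascii")
    # S3 uses the MD5 digest of the raw key as an integrity check.
    key_md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": key_md5_b64,
    }
```

The same headers have to accompany reads of an object as well as the write, and S3 only accepts SSE-C requests over HTTPS, so the client needs the key for every access.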

Are there any details on "Storage Hook Support for Bring Your Own Key (BYOK) encryption", listed under Milvus 2.5.0 at https://milvus.io/docs/roadmap.md? That sounds like what I'm looking to do.

xiaofan-luan commented 4 months ago

I think the Salesforce team has been working on this feature for a while. I will check with them on their progress.