milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
28.14k stars 2.71k forks

[Feature]: Encryption of data at rest #33810

Open marksl opened 3 weeks ago

marksl commented 3 weeks ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

According to the recent paper "Text Embeddings Reveal (Almost) As Much As Text" (https://arxiv.org/pdf/2310.06816) and the write-up at https://thegradient.pub/text-embedding-inversion/, it is possible to recover most of the original text from some text embeddings. Use cases that involve sensitive data may therefore require encrypting vector data at rest.

Describe the solution you'd like.

Is it feasible to add AES symmetric-key encryption at rest to Milvus? Specifically adding encryption at rest for data stored in the Object Store, Query Node and possibly Index Node?
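To make the request concrete, here is a minimal sketch of AES-256-GCM applied to a single vector payload. It is not how Milvus works today: it assumes the third-party Python `cryptography` package, and the helper names `encrypt_vector` and `decrypt_vector` are hypothetical. A real implementation would sit inside the storage layer so that segments and index files are encrypted transparently.

```python
import os
import struct
from typing import List, Sequence

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the size recommended for GCM


def encrypt_vector(key: bytes, vector: Sequence[float], context: bytes = b"") -> bytes:
    """Serialize a vector as little-endian float32 and encrypt it with AES-GCM.

    `context` is authenticated but not encrypted (for example, a collection name).
    """
    plaintext = struct.pack(f"<{len(vector)}f", *vector)
    nonce = os.urandom(NONCE_LEN)  # must never repeat for the same key
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, context)
    return nonce + ciphertext  # store the nonce alongside the ciphertext


def decrypt_vector(key: bytes, blob: bytes, context: bytes = b"") -> List[float]:
    """Reverse encrypt_vector; raises InvalidTag if blob or context was altered."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    plaintext = AESGCM(key).decrypt(nonce, ciphertext, context)
    return list(struct.unpack(f"<{len(plaintext) // 4}f", plaintext))
```

Encrypting vectors on the client like this would prevent similarity search on them, which is why the encryption would need to happen inside Milvus, below the query layer.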

A single key per database would be a good start.

If possible, it would be ideal to have tenant-specific keys. Ideally these keys could be mapped to a database, collection, partition, or partition key, depending on which multi-tenancy strategy from https://milvus.io/docs/multi_tenancy.md is in use. Given the dynamic nature of a partition-key based approach, a default encryption key would probably be required.
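One possible shape for that mapping is a lookup that tries the most specific scope first and falls back to a default key. The sketch below is purely illustrative: `KeyResolver`, its methods, and the key IDs are hypothetical names, not an existing Milvus API.

```python
from typing import Dict, Optional, Tuple

# A scope is (database, collection, partition); None means "any".
Scope = Tuple[str, Optional[str], Optional[str]]


class KeyResolver:
    """Hypothetical lookup from a tenant scope to an encryption key ID."""

    def __init__(self, default_key_id: str) -> None:
        self._default = default_key_id
        self._keys: Dict[Scope, str] = {}

    def bind(self, key_id: str, database: str,
             collection: Optional[str] = None,
             partition: Optional[str] = None) -> None:
        """Associate a key ID with a database, collection, or partition."""
        self._keys[(database, collection, partition)] = key_id

    def resolve(self, database: str,
                collection: Optional[str] = None,
                partition: Optional[str] = None) -> str:
        """Return the most specific key ID bound to the scope, else the default."""
        candidates = [
            (database, collection, partition),
            (database, collection, None),
            (database, None, None),
        ]
        for scope in candidates:
            if scope in self._keys:
                return self._keys[scope]
        return self._default


resolver = KeyResolver(default_key_id="default-key")
resolver.bind("db-key", "tenants")
resolver.bind("coll-key", "tenants", "acme_docs")
print(resolver.resolve("tenants", "acme_docs", "p1"))  # coll-key
print(resolver.resolve("unknown_db"))                  # default-key
```

With a partition-key strategy, values that have no explicit binding would simply resolve to the enclosing collection's key or the default key.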

Generally, I think it would work well if the keys could be injected on application startup. For Kubernetes these could be injected using Kubernetes Secrets. Some mechanism for rotating keys would be nice. I haven't thought that far ahead.
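As a rough sketch of startup injection: a Kubernetes Secret can be exposed to a pod as environment variables, and the process could read base64-encoded keys from them when it boots. The `MILVUS_ENC_KEY_` prefix and the `load_keys` function are assumptions made up for this example; Milvus has no such setting today.

```python
import base64
import os
from typing import Dict, Mapping

ENV_PREFIX = "MILVUS_ENC_KEY_"  # hypothetical naming convention, not a Milvus setting


def load_keys(env: Mapping[str, str] = os.environ) -> Dict[str, bytes]:
    """Collect base64-encoded AES keys from variables like MILVUS_ENC_KEY_V1."""
    keys: Dict[str, bytes] = {}
    for name, value in env.items():
        if not name.startswith(ENV_PREFIX):
            continue
        key = base64.b64decode(value, validate=True)
        if len(key) not in (16, 24, 32):  # valid AES key sizes in bytes
            raise ValueError(f"{name}: expected a 16-, 24-, or 32-byte key")
        # Use the suffix after the prefix as the key ID, e.g. "v1".
        keys[name[len(ENV_PREFIX):].lower()] = key
    if not keys:
        raise RuntimeError("no encryption keys found in the environment")
    return keys
```

Keeping a version in the variable name is one way to leave room for rotation: new writes could use the newest key while older data stays readable under the key ID recorded with it.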

Describe an alternate solution.

Direct integration with Hardware Security Modules.

Anything else? (Additional Context)

There was some discussion of encryption at rest here -> https://github.com/milvus-io/milvus/discussions/29326

xiaofan-luan commented 3 weeks ago

Milvus already supports TLS.

I guess that is what you are looking for: https://milvus.io/docs/tls.md#Set-up-a-Milvus-server-with-TLS

marksl commented 2 weeks ago

That is not quite what we are looking for; TLS nicely covers encryption in transit. I'm looking for something closer to Transparent Data Encryption (TDE) in Oracle, SQL Server, MySQL, CockroachDB, and Postgres/pgcrypto.

xiaofan-luan commented 2 weeks ago

So you mean encrypting the data on Kafka and S3? The easiest way for now is to enable S3 and Kafka encryption, or to use Zilliz Cloud as an alternative.