jon-chuang closed this issue 4 years ago
This is certainly an important point in the solution space for federated learning (see, for instance, http://papers.nips.cc/paper/7984-cpsgd-communication-efficient-and-differentially-private-distributed-sgd.pdf): such a scheme would enable training much larger models from the encrypted data of many users than federated learning would presumably allow (naively, the communication cost is several GB per user per weight update, though there seem to be efforts to cut this down; I don't know the SOTA).
The problem, of course, is that the server would have to hold N key-switching keys (KSKs) in memory, or else perform communication with disk. So the approach is better suited to the scenario where the number of users is still small, due to the need to store the KSKs in memory.
If data is accumulated and processed in a batched fashion, this could still be of great use. Of course, the batches here would not be real minibatches; rather, they are minibatches of clusters of data, each following its own distribution (corresponding to a single data provider). However, communication could naively still be better than in the federated learning case, since moving KSKs from disk to device memory could be cheaper than moving gradients over the internet.
KSK sizes grow as l^2 in the number of levels l (due to circuit depth), while gradient sizes grow as l. So there should be a nice tradeoff somewhere in the middle. Hopefully this tradeoff favours KSK sizes for the model sizes we care about.
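To make the tradeoff concrete, here is a toy cost model (entirely my own assumptions, not from any paper): suppose KSK transfer costs ~l^2 units and gradient transfer costs ~c·l units, where c is a model-size factor. The crossover depth is then simply l = c.

```python
# Toy cost model (assumed, for illustration only):
# KSK size ~ l^2 in the number of levels l; gradient size ~ c * l,
# where c is a hypothetical model-size factor.

def ksk_cost(l):
    # key-switching-key size grows quadratically in the levels
    return l ** 2

def grad_cost(l, model_factor=50.0):
    # gradient size grows linearly in the levels, scaled by model size
    return model_factor * l

# first depth at which shipping KSKs becomes more expensive than gradients
crossover = next(l for l in range(1, 10_000)
                 if ksk_cost(l) > grad_cost(l))
print(crossover)  # -> 51 with model_factor=50
```

So under this (very crude) model, KSK communication wins whenever the required multiplicative depth stays below the model-size factor, which is plausible for the shallow circuits a single gradient step needs.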
It would seem, however, that such a thing could be conceivable on devices with remote attestation, such as the probably oncoming generation of remotely attestable accelerators. Then again, this may also defeat the purpose of HE itself, given its computational overhead. (See: FPGA: https://eprint.iacr.org/2019/405.pdf, protocol: https://saardrimer.com/sd410/papers/remoteupdates.pdf, GPU: https://www.microsoft.com/en-us/research/uploads/prod/2018/09/Graviton-Trusted-Execution-Environments-on-GPUs.pdf)
For instance, such a scheme would be conceivable if, after a fixed sequence of arithmetic operations, the pseudorandom one-time pads s·a summed to something small (of the same order as the error e). For instance, if we are taking a sum across n ciphertexts, we require n·(s·a) ≈ e. It remains to be proven that under such constraints we recover the original security properties of RLWE.
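To make the pad-summation point concrete, here is a toy plain-LWE (not RLWE) sketch with insecure, made-up parameters: n users encrypt under one key, the server sums the ciphertexts, and decryption of the sum is off by exactly the accumulated noise sum(e_i). Any constraint of the form n·(s·a) ≈ e would have to live at that same magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
q, dim, n_users = 2**16, 8, 5        # toy parameters, not secure

s = rng.integers(0, q, dim)          # shared secret key

def encrypt(m):
    a = rng.integers(0, q, dim)      # public randomness (the "pad" source)
    e = int(rng.integers(-2, 3))     # small noise in [-2, 2]
    b = (a @ s + m + e) % q          # b = <a, s> + m + e  (mod q)
    return a, b

# each "user" encrypts a small message; the server adds the ciphertexts
msgs = rng.integers(0, 10, n_users)
cts = [encrypt(int(m)) for m in msgs]
a_sum = sum(a for a, _ in cts) % q
b_sum = sum(b for _, b in cts) % q

# decryption of the sum: subtract the summed one-time pad <a_sum, s>
dec = (b_sum - a_sum @ s) % q
dec = int(dec) if dec < q // 2 else int(dec) - q   # centre mod q
print(dec, int(msgs.sum()))   # equal up to the accumulated noise sum(e_i)
```

With 5 users and noise in [-2, 2], the decrypted sum is within 10 of the true sum, which is exactly the "e-sized" slack the hypothetical constraint on the pads would be allowed to occupy.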
Perhaps @haochenuw might be able to comment
Presumably this is not a big deal, as it is just another linear constraint in the LWE problem, but I am not so sure; I need to recall the types of attacks that exist.
But it could be worse: for instance, n·(s·a) = 0 for coefficients with index larger than k means n divides those coefficients. If n is large, that leaves very few values of those one-time-pad coefficients to search over (I don't know whether this affects the security of the secret key), which is very bad. But if n is small, we may not get differential privacy.
I wonder if there is a way to combine multi-party computation with HE to achieve what I am thinking about: for instance, to sum gradients while still encrypted, and only be able to decrypt once all parties' gradients have been summed. This would increase differential privacy guarantees.
This may involve somehow sharding a secret key across multiple parties.
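The simplest version of such sharding is additive secret sharing of the key: s = s1 + s2 (mod q), so that each party can publish only its partial pad <a, s_i>, and decryption succeeds only when all partial pads are combined. A toy plain-LWE sketch (noise omitted for clarity; real threshold-HE schemes add "smudging" noise to the partial decryptions, which I am glossing over):

```python
import numpy as np

rng = np.random.default_rng(1)
q, dim = 2**16, 8                       # toy parameters, not secure

s = rng.integers(0, q, dim)             # full secret key
# additively share s between two parties: s = s1 + s2 (mod q)
s1 = rng.integers(0, q, dim)
s2 = (s - s1) % q

# an LWE encryption of m under the full key s (noise omitted for clarity)
m = 7
a = rng.integers(0, q, dim)
b = (a @ s + m) % q

# each party publishes only its partial pad <a, s_i>; neither learns s
p1 = (a @ s1) % q
p2 = (a @ s2) % q

# decryption succeeds only when BOTH partial pads are combined
dec = (b - p1 - p2) % q
print(int(dec))  # -> 7; with only p1 or p2 the result stays masked
```

The design point relevant here: neither party ever holds s, and neither partial pad alone reveals anything about m, so "decrypt only after all gradients are summed" reduces to a protocol rule about when parties release their partial pads.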
I also need to work out what the equivalent constraints (linear or nonlinear) would be after passing through multiplication, relinearisation, or rescaling.
I also don't really know how to sample from a distribution with constraints.
It appears what I was looking for is a property satisfied by garbled circuits; I wonder if there is any meaningful relationship to what I want.
Actually, it seems that one gets this for free by publicly publishing the decryption ct_1 · s + e, with a noise e.
Since this decrypts nothing but the final result and nothing of the intermediate results, one could in principle use this to compute gradients homomorphically and then perform the decryption directly on the server. If the client also sends an encrypted noise term (to achieve differential privacy) that must be added to the gradient before decryption, the server has no choice but to add this noise in order to decrypt correctly.
Note, however, that this requires the server to send half of the ciphertext back to the client, which is just a step away from sending the full ciphertext (ct_0, ct_1). So it is not a very good result. Furthermore, sending the noise also costs communication. Actually, this is wrong, since one will not know what computation the server has actually performed; hence one may as well be decrypting the original ciphertext with a zero ciphertext added to it.
One wonders whether the decryption can be performed in a secure manner (not revealing the secret key), with high assurance, without communicating with the client, and without relying on trust in a TEE or on bootstrapping. (The answer is yes, and I am writing a paper to show this.)
Actually, computational decryption is impossible, primarily because the randomness introduced in the form of ct_1 is required to achieve IND-CPA security: sampling the same ct_1 twice with high enough probability would let an attacker distinguish two message vectors m and m' (assuming m << q) by subtracting pairs of ciphertexts and checking whether the result is short, an algorithm that runs in time n^2, where n is the average number of ct_1 sampling collisions.
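To illustrate the collision attack: if the same randomness a is reused across two encryptions, subtracting the ciphertexts cancels the one-time pad <a, s> and leaves (m1 - m2) + (e1 - e2), which is short, whereas with fresh randomness the difference stays masked by <a3 - a4, s>. A toy plain-LWE sketch with made-up, insecure parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
q, dim = 2**16, 8                        # toy parameters, not secure
s = rng.integers(0, q, dim)

def encrypt(m, a=None):
    if a is None:
        a = rng.integers(0, q, dim)      # fresh randomness
    e = int(rng.integers(-2, 3))         # small noise in [-2, 2]
    return a, (a @ s + m + e) % q

def centred(x):
    x = int(x) % q
    return x - q if x >= q // 2 else x   # representative in (-q/2, q/2]

# reused randomness: the SAME a in both ciphertexts
a = rng.integers(0, q, dim)
_, b1 = encrypt(100, a)
_, b2 = encrypt(103, a)
diff_reused = centred(b1 - b2)           # = (m1 - m2) + (e1 - e2): short

# fresh randomness: the difference stays masked by <a3 - a4, s>
a3, b3 = encrypt(100)
a4, b4 = encrypt(103)
diff_fresh = centred(b3 - b4)            # large with high probability

print(abs(diff_reused), abs(diff_fresh))
```

The "shortness test" `abs(diff_reused) <= bound` is the whole distinguisher; the attacker never needs s, only a collision in ct_1.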
Furthermore, the computations involving ct_1 also cannot be performed in advance, because ciphertext multiplication entangles ct_1 with the value of the message.
Hence RLWE-based HE is unavoidably two-party interactive. There is no way to avoid the communication cost of, for instance, gradients when performing gradient descent, and this could be truly considerable for large models (hundreds of GB).
Hence, methods to securely decrypt ciphertexts while they are in the cloud, ensuring that decryption is not premature, have some argument for being developed.
Is it conceivable that there exists an HE scheme in which a computation can be decrypted by the server only after a predesignated series of operations has been performed? Meaning to say: somehow, the operations themselves become infused with a public key, such that only their precise alignment makes the result decryptable under that public key as a plaintext. CPA security must be maintained for every operation prior to the final one.
Under the current paradigm we are interested in CPA security, but what if we relax it to a differentially private notion of security, one that also has an element of verifiability, in the form of verifying that the server has performed the correct operations? One would then reveal precisely the information one would like to reveal.
For instance, this would be useful for training a deep learning model entirely in the cloud on data from multiple parties, without the communication overhead associated with multiparty computation: the gradients can be averaged, masked with noise, and then decrypted directly on the server. These gradients can be combined with gradients from other data sources to update a central learner.
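The algebra of that pipeline (sum encrypted gradients, fold in an encrypted DP noise term, and only then release a decryption) can be sketched with a toy additive-only LWE-style scheme. Everything below is illustrative and insecure (noise omitted from the encryptions, made-up parameters, and the decryption step stands in for whatever secure server-side release mechanism the open question above asks for):

```python
import numpy as np

rng = np.random.default_rng(3)
q, dim, SCALE = 2**20, 8, 1000          # toy parameters; SCALE = fixed point
s = rng.integers(0, q, dim)

def encrypt(v):
    # one LWE sample per coordinate of v (encryption noise omitted)
    a = rng.integers(0, q, (len(v), dim))
    return a, (a @ s + v) % q

def add(ct1, ct2):
    # additive homomorphism: add components coordinate-wise mod q
    return (ct1[0] + ct2[0]) % q, (ct1[1] + ct2[1]) % q

def decrypt(ct):
    a, b = ct
    x = (b - a @ s) % q
    return np.where(x >= q // 2, x - q, x)   # centre mod q

# three data providers encrypt fixed-point gradients
grads = [np.array([0.5, -0.25, 0.75]),
         np.array([0.1,  0.2, -0.3]),
         np.array([-0.2, 0.05, 0.15])]
cts = [encrypt((g * SCALE).astype(int)) for g in grads]

# a client also contributes an ENCRYPTED DP noise vector that the server
# must fold in before release (toy integer noise, not calibrated DP noise)
dp_noise = np.array([3, -2, 5])
ct_noise = encrypt(dp_noise)

ct_sum = cts[0]
for ct in cts[1:]:
    ct_sum = add(ct_sum, ct)
ct_sum = add(ct_sum, ct_noise)

released = decrypt(ct_sum) / SCALE
print(released)   # sum of gradients + noise/SCALE; raw gradients never appear
```

The point of the sketch is only the information flow: nothing decryptable exists until all parties' contributions and the noise term have been folded into one ciphertext, so the released value is exactly the noised aggregate and nothing finer-grained.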