google / heir

A compiler for homomorphic encryption
https://heir.dev/
Apache License 2.0

[BGV]: add lazy relinearization #599

Open j2kun opened 6 months ago

j2kun commented 6 months ago

Ideally this is done in a way that the optimization pass can be shared between all relinearizing ops across all schemes. In particular, we should implement a RelinearizeLike trait/interface, and implement the optimization pass against the interface.

ahmedshakill commented 3 months ago

Hi @j2kun, I would like to work on this. Could you shed some light on how I might approach it? I have gathered some basic knowledge about bgv, openfhe, and their conversion dialects. About laziness: do we want to delay the relinearization computation (and also find the scope in which to do so)? How do you introduce laziness into a computation in a pass? There's an idea of lazy loading in mlir::BytecodeReader, but I got lost and couldn't extract anything helpful from there.

j2kun commented 3 months ago

Sure! The basic idea is that an arith.muli lowers to the ops bgv.mul and bgv.relin. The relin is necessary to return the key basis from (1, s, s^2) to (1, s) after the bgv.mul, but sometimes the relin op can be delayed if the output of the mul is combined with other values in that same key basis (1, s, s^2).
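Concretely (this is standard BGV arithmetic, not anything HEIR-specific): two ciphertexts that each decrypt as a linear polynomial in the secret key $s$ multiply to one that decrypts as a quadratic, which is why the key basis grows from $(1, s)$ to $(1, s, s^2)$:

$$(c_0 + c_1 s)\,(c_0' + c_1' s) \;=\; c_0 c_0' + (c_0 c_1' + c_1 c_0')\, s + c_1 c_1'\, s^2 .$$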

For example, if you multiply two pairs of values and then add their results, you can do it with one relin op after the addition, rather than two relin ops, one after each mul.

The implementation of this feature would, I believe, require some relatively straightforward tablegen patterns. Look for opportunities to push a relin op further back, by matching on ops whose return values have a ciphertext type with the same key basis attribute. If you'd like, we could work through some examples in detail in this issue thread.

AlexanderViand-Intel commented 3 months ago

While looking at bgv.mul, I noticed it (incorrectly) wouldn't allow sequential multiplications without relinearization in between. With that fixed (currently PR #733), I'd suggest the simplest way to realize this is to proceed as follows

j2kun commented 3 months ago

I didn't realize that relinearizing after repeated multiplications is a thing people do. I assume that:

AlexanderViand-Intel commented 3 months ago

I didn't realize that relinearizing after repeated multiplications is a thing people do.

It's not really something people do outside of very specific use cases. I think quite a few of the common libraries don't even implement "general" relinearization, because it'd blow up the evaluation key size so much. EDIT: I do think we should be able to model it in the IR, though, and not just because it makes lazy relin easier.

Coming back to the pass: my suggestion isn't necessarily to emit final programs with repeated multiplications (i.e., the OpenFHE pipeline would always contain --lazy-relin). It's just that, with MLIR, it seems a lot easier to take a "not yet relin-managed" program and introduce only the needed operations than to remove the redundant ones.

j2kun commented 3 months ago

I do worry a little bit that the key basis in the attributes would explode for some mul-heavy IRs, but I'm down to try it.

Another approach would be to take any IR, using relins or not, and re-relinearize-ify it. That would be the most general approach, but couldn't be done with a simple set of patterns. I bet we could formulate it as an ILP if it gets to that.

ahmedshakill commented 3 months ago

If you'd like, we could work through some examples in detail in this issue thread.

Yes, working through some examples would be helpful for me.

I was thinking about these patterns:

  • mul(mul1, _) and mul(_, mul1) rewrite to mul(relin(mul1), _) and mul(_, relin(mul1)), respectively

Do we have a way to reach something like mul3(_, relin(mul2(_, relin(mul1))))?

AlexanderViand-Intel commented 3 months ago

Do we have a way to reach something like mul3(_, relin(mul2(_, relin(mul1))))?

I think so: if we had mul3(_, mul2(_, mul1(_,_))) to begin with, the RHS pattern would match on mul2 to give us mul3(_, mul2(_, relin(mul1(_,_)))) and also on mul3 to then give us mul3(_, relin(mul2(_, relin(mul1(_,_))))). Which is (I think?) what we want, right? This also works if the pattern is first applied on mul3 and then on mul2, and it should also work if both operands are multiplications.
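Written in the bgv IR used elsewhere in this thread, that end state would look roughly like the sketch below (SSA names are invented for illustration; %3 would still need its own relin before being returned or combined with a (1, s) value):

%1  = bgv.mul %a, %b
%1r = bgv.relinearize %1
%2  = bgv.mul %c, %1r
%2r = bgv.relinearize %2
%3  = bgv.mul %d, %2r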

off-topic question: is it better to add a third pattern mul(mul1, mul2) for this case, or to just rely on the sequential application of the LHS/RHS patterns?

ahmedshakill commented 3 months ago

mul3(_, relin(mul2(relin(lmul1(x,y)), relin(rmul1(y,x)))))

x and y are not operations.

Here I am considering the case where both operands of mul2 are multiplications. Inside mul2, the left and right operands would have the same key basis, I guess. So at this stage, could we optimize the above to mul3(_, relin(mul2(lmul1(x,y), rmul1(y,x))))
such that we relin only after mul2?

AlexanderViand-Intel commented 3 months ago

So at this stage, could we optimize the above to mul3(_, relin(mul2(lmul1(x,y), rmul1(y,x)))) such that we relin only after mul2?

I don't think that'd be valid (assuming we never want to exceed ctxt-degree three), as the output of mul2 would already be of degree five.
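Spelling that out (notation just for illustration): each unrelinearized operand of mul2 decrypts as a quadratic in $s$, so their product is a quartic, i.e. a five-component ciphertext:

$$(c_0 + c_1 s + c_2 s^2)\,(c_0' + c_1' s + c_2' s^2) \;=\; \sum_{k=0}^{4} \Big(\sum_{i+j=k} c_i c_j'\Big) s^k .$$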

j2kun commented 3 months ago

I think perhaps the simplest initial test case would be

%x = bgv.mul %0, %1
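// %x has key basis (1, s, s^2) until the relinearization on the next line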
%x_relin = bgv.relinearize %x
%y = bgv.mul %3, %4
%y_relin = bgv.relinearize %y
%z = bgv.add %x_relin, %y_relin

Which should map to

%x = bgv.mul %0, %1
%y = bgv.mul %3, %4
%z = bgv.add %x, %y
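// %x, %y, and %z all stay in key basis (1, s, s^2); a single relin restores (1, s)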
%z_relin = bgv.relinearize %z

So long as %x_relin and %y_relin have no other uses. This example could be extended to more muls and a reducing add.

j2kun commented 3 months ago

Looking back on https://www.jeremykun.com/2023/11/15/mlir-a-global-optimization-and-dataflow-analysis/#an-ilp-optimization-pass, I'm not seeing any real obstacle to adapting that ILP to this problem. Basically, after each BGV op in the IR, you add an "InsertRelin" variable (instead of "InsertReduceNoise") and a "KeyBasisDegree" variable instead of a "NoiseAt" variable. The constraints are simpler because KeyBasisDegree is additive in each mul and unchanged by other ops, and we can constrain the KeyBasisDegree of each starting SSA value and any decrypted/returned SSA values to be 1. There would be no upper bound on the maximum allowable key basis degree, though in practice we may want to make this configurable. The objective would be to minimize the sum of the InsertRelin variable values.
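To make that concrete, here is a rough sketch of the formulation (the variable names and the big-M linearization are one way to write it down, not necessarily how the HEIR pass would end up being implemented). Let $d_v \ge 1$ be the KeyBasisDegree of SSA value $v$ (fresh ciphertexts have $d_v = 1$, i.e. key basis $(1, s)$), let $r_v \in \{0, 1\}$ be the InsertRelin decision for the op producing $v$, let $D_v$ be the "raw" degree of that op's result ($D_v = d_a + d_b$ for a mul of $a$ and $b$; $D_v = d_a$ for ops that leave the basis unchanged, with a small extra linearization needed if a binary op's operands may have different degrees), and let $M$ be a sufficiently large constant:

$$D_v - M r_v \;\le\; d_v \;\le\; D_v + M r_v, \qquad 1 - M(1 - r_v) \;\le\; d_v \;\le\; 1 + M(1 - r_v),$$

with $d_v = 1$ fixed for block arguments/fresh encryptions and for every returned or decrypted value, and objective $\min \sum_v r_v$. If $r_v = 0$ the first pair of inequalities pins $d_v$ to $D_v$; if $r_v = 1$ the second pair pins it to 1.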

@ahmedshakill what do you think, are you up for implementing an ILP? The existing implementation is here

ahmedshakill commented 3 months ago

Yeah, I am up for it. The task is interesting and the resources (blogs and discussions) are really good. I would hate to lose this opportunity. As a newbie, though, I would need a bit more time than usual.

AlexanderViand-Intel commented 3 months ago

Looking back on https://www.jeremykun.com/2023/11/15/mlir-a-global-optimization-and-dataflow-analysis/#an-ilp-optimization-pass, I'm not seeing any real obstacle to adapting that ILP to this problem.

Mh, wouldn't you need to keep all the noise-related stuff in the problem, still? Otherwise, you could end up with a trivial solution that lets the degree explode throughout the computation and only relinearizes at the very end, which would be both very slow and absolutely catastrophic for noise growth.

Maybe enforcing that no relin is "higher degree" than 3->2 is sufficient to make it work? I vaguely recall papers looking into things more optimal than the basic lazy relin, but I'm not sure if they showed sufficient performance gains to make it worthwhile.

j2kun commented 3 months ago

Otherwise, you could end up with a trivial solution that lets the degree explode throughout the computation and only relinearizes at the very end, which would be both very slow and absolutely catastrophic for noise growth.

That is true. You could easily force it to have max degree 3, or add a cost to each relin based on the input degree.
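In the notation of the ILP sketch above, those two options would be an extra constraint $d_v \le 2$ for every $v$ (i.e., never exceed key basis $(1, s, s^2)$), or a weighted objective such as $\min \sum_v D_v\, r_v$; the weighting here is purely illustrative, not a tuned cost model.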

I didn't think relin affected noise growth though. Can't you modulus switch in a higher key basis?

j2kun commented 3 months ago

@ahmedshakill Any progress or help needed on this topic? Feel free to post a draft PR and I can take a look and offer some suggestions.

ahmedshakill commented 3 months ago

I was taking a bit of time. I'll update you on my progress and post a PR as soon as possible.

ahmedshakill commented 3 months ago

Hi @j2kun, it took me a while to respond. I have created a PR. In its current state it holds an initial template for the pass. Please have a look and let me know your thoughts.

AlexanderViand-Intel commented 3 months ago

Btw, I realized that my "simple DRR pattern"-based approach probably wouldn't work, as it wouldn't correctly handle something like:

// Assume arg0, arg1, arg2, arg3 are "appropriately typed" function parameters
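// Without any relins, %0 and then %1 carry key basis (1, s, s^2); since %1 is
// produced by an add rather than a mul, a "mul of a mul" pattern never fires on %2.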
%0 = bgv.mul(%arg0, %arg1)
%1 = bgv.add(%0, %arg2)
%2 = bgv.mul(%1, %arg3)

I.e., if there is any kind of non-mul operation between the multiplications, the patterns won't catch it. However, @Maokami's new Mult-Depth Analysis should be the answer to these cases: #760

Even if you're going for a more ILP/optimization-based approach, I'd expect the mult-depth analysis to come in handy :)

ahmedshakill commented 2 months ago

what do you think, are you up for implementing an ILP?

I think I now understand the weight of the above line. Here is the current progress status:

  • The lazy relin pass is visible in heir-opt.
  • Looked into the Google OR-Tools tutorials on LP and MIP.
  • Need an in-depth look at Jeremy's ILP blog post.
  • Translate the minimization problem into an ILP.
  • Introduce the trait/interface.

j2kun commented 2 months ago

I can certainly help answer any questions if you have them. Feel free to chime in on Discord as well (see the link at https://heir.dev/community/) for less formal discussions
