CKKS is much faster than BFV (or BGV) when performing matrix multiplication?

DylanWangWQF commented 3 years ago

When I test matrix multiplication based on CKKS, it's much faster than BFV (or BGV). Because CKKS generate approximate results, right? So if we want the exact results, CKKS is not suitable, as described in README.md. But can we optimize the performance of rotation? Since in my matrix multiplication test, a certain number of rotations are necessary.

WeiDaiWD commented 3 years ago

Because CKKS generate approximate results, right?

That has nothing to do with the reason why matrix multiplication in CKKS is faster than that in BFV. The reason is CKKS ciphertext multiplication algorithm is faster than BFV.

But can we optimize the performance of rotation?

Rotations in CKKS and BFV have the same performance.

fionser commented 3 years ago

If both matrix, vector are ciphertext, then CKKS is faster because the multiplication of CKKS is faster than BFV. This is algorithmic and we can do little with it.

But if one of operand is plaintext. Thing can be different.

Basically, SEAL-BFV keeps the plaintext (polynomial) in its coefficient form while the SEAL-CKKS uses the ntt form.

As a result, the plaintext-ciphertext multiplication for BFV would be slower than CKKS (because BFV's plaintext needed to be converted to its NTT form which is costly). On the other hand, CKKS just shift this work load to the encoding phase. If you put the "encoding" phase and evaluation together, I think CKKS/BFV would give the same performance.

This is an implementation decision.

DylanWangWQF commented 3 years ago

@WeiDaiWD @fionser Thank you for your reply! And sorry, I just test these operations by employing BGV in HElib and HEAAN until now. Have not tested BFV. But the comparison of performance should be similar to SEAL? From the test results, it seems that the performance would be quite different if we use the larger parameters.

These are the test code and results: HEAAN test:

    //Params: logN = 13; logp = 35; logq = 55.
    Context context(/*logN=*/13,/*logQ =*/55);
    SecretKey secretKey(/*logN=*/13, /*h=*/64);
    Scheme scheme(secretKey, context);
    scheme.addLeftRotKeys(secretKey);
    scheme.addRightRotKeys(secretKey);

    complex<double>* cmsg1 = new complex<double>[256];
    complex<double>* cmsg2 = new complex<double>[256];
    for (int i = 0; i < 256; i++) {
        double dtemp;
        conv(dtemp, (i % 2));
        cmsg1[i].real(dtemp);
        cmsg2[i].real(dtemp);
    }

    // Encryption
    Ciphertext ctxt1;
    Ciphertext ctxt2;
    auto start= chrono::steady_clock::now();

    ctxt1 = scheme.encrypt(cmsg1, /*slots=*/256, /*logp=*/35, /*logq=*/55);
    ctxt2 = scheme.encrypt(cmsg1, /*slots=*/256, /*logp=*/35, /*logq=*/55);

    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    double timeElapsed = chrono::duration <double, milli> (diff).count()/1000.0;
    cout << "Encryption time= " << timeElapsed/2 << " s" << endl;

// Rotation
    Ciphertext temp;
    start = chrono::steady_clock::now();

    temp = scheme.rightRotate(ctxt1, 7 );

    end = std::chrono::steady_clock::now();
    diff = end - start;
    timeElapsed = chrono::duration <double, milli> (diff).count()/1000.0;
    cout << "Rotation time= " << timeElapsed << " s" << endl;

// Multiplication
    start = chrono::steady_clock::now();

    scheme.multAndEqual(ctxt1, ctxt2);
    scheme.reScaleByAndEqual(ctxt1, logp);

    end = std::chrono::steady_clock::now();
    diff = end - start;
    timeElapsed = chrono::duration <double, milli> (diff).count()/1000.0;
    cout << "Multiplication time= " << timeElapsed << " s" << endl;

BGV test:

// m: Cyclotomic polynomial - defines phi(m);
// p: Plaintext prime modulus
// r: Hensel lifting (default = 1)
// bits: Number of bits of the ciphertext modulus chain (Level)
// c: Number of columns of Key-Switching matrix (default = 2 or 3)

Params param(/*m=*/20480, /*p=*/127, /*r=*/1, /*bits=*/200, /*c=*/2); //slots = 256;

/*ea.size()=slots=256*/
vector<long> cmsg1(ea.size());
    vector<long> cmsg2(ea.size());
    for (int i = 0; i < ea.size(); i++) {
        cmsg1[i] = i % 2;
        cmsg2[i] = i % 2;
    }

// Encryption
    auto start= chrono::steady_clock::now();

    Ctxt ctxt1(publicKey);
    ea.encrypt(ctxt1, publicKey, cmsg1);
    Ctxt ctxt2(publicKey);
    ea.encrypt(ctxt2, publicKey, cmsg2);

    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    double timeElapsed = chrono::duration <double, milli> (diff).count()/1000.0;
    cout << "Encryption time= " << timeElapsed/2 << " s" << endl;

//Rotation
    start= chrono::steady_clock::now();

    rotate(ctxt1, 7);

    end = std::chrono::steady_clock::now();
    diff = end - start;
    timeElapsed = chrono::duration <double, milli> (diff).count()/1000.0;
    cout << "Rotation time= " << timeElapsed << " s" << endl;

//Multiplication
    start= chrono::steady_clock::now();

    ctxt1.multiplyBy(ctxt2);

    end = std::chrono::steady_clock::now();
    diff = end - start;
    timeElapsed = chrono::duration <double, milli> (diff).count()/1000.0;
    cout << "Rotation time= " << timeElapsed << " s" << endl;

HEAAN test results (run many times):

Encryption time= 0.0162935 s (supposed to contain the encode time )
------------------
Rotation time= 0.0448905 s
------------------
Multiplication time= 0.0220362 s

BGV test results:

Encryption time= 0.0269653 s
------------------
Rotation time= 0.374534 s
------------------
Multiplication time= 0.164159 s
------------------

fionser commented 3 years ago

I have no idea why you did not at least align the parameter in HElib (m > 20k) but HEAAN (N = 2^13). Please clarity your question (concerns) a more little bit

DylanWangWQF commented 3 years ago

@fionser Sorry, I'm not familiar with the parameters and detailed implementations in HEAAN, that's a bad test.

Since I want to get the exact number of slots (such as 256, 1024 and 4096), I choose these parameters for HElib. For /*m=*/20480, /*p=*/127, /*r=*/1, /*bits=*/200, /*c=*/2, I use the program to find the corresponding parameters m=20480 and p=127, then I get slots=256.

Then in HEAAN, we can directly set the exact number of slots.

BeStrongok commented 7 months ago

If both matrix, vector are ciphertext, then CKKS is faster because the multiplication of CKKS is faster than BFV. This is algorithmic and we can do little with it.

But if one of operand is plaintext. Thing can be different.

Basically, SEAL-BFV keeps the plaintext (polynomial) in its coefficient form while the SEAL-BFV uses the ntt form.

As a result, the plaintext-ciphertext multiplication for BFV would be slower than CKKS (because BFV's plaintext needed to be converted to its NTT form which is costly). On the other hand, CKKS just shift this work load to the encoding phase. If you put the "encoding" phase and evaluation together, I think CKKS/BFV would give the same performance.

This is an implementation decision.

Hey, @fionser Is there a typo here? I think "SEAL-BFV keeps the plaintext (polynomial) in its coefficient form while the SEAL-CKKS uses the ntt form" is exactly what you mean.

microsoft / SEAL

CKKS is much faster than BFV (or BGV) when performing matrix multiplication? #331