Question: Thread safety of encryption and encoding

isentropic commented 12 months ago

Here is a minimal example that triggers this panic: "panic: runtime error: slice bounds out of range [:192] with length 128"

package main

import (
    "github.com/tuneinsight/lattigo/v5/core/rlwe"
    "github.com/tuneinsight/lattigo/v5/he/hefloat"
    "math/rand"
    "runtime"
    "sync"
)

func encrypt_vector(values []float64,
    ecd *hefloat.Encoder, enc *rlwe.Encryptor, params hefloat.Parameters) *rlwe.Ciphertext {
    var err error
    pt := hefloat.NewPlaintext(params, params.MaxLevel())
    if err = ecd.Encode(values, pt); err != nil {
        panic(err)
    }
    var ct *rlwe.Ciphertext

    if ct, err = enc.EncryptNew(pt); err != nil {
        panic(err) // the runtime panic is raised inside the EncryptNew call above
    }

    return ct
}

func main() {
    var err error
    var params hefloat.Parameters

    if params, err = hefloat.NewParametersFromLiteral(
        hefloat.ParametersLiteral{
            LogN:            13,            // log2(ring degree)
            LogQ:            []int{60, 60}, // log2(primes Q) (ciphertext modulus)
            LogP:            []int{61},     // log2(primes P) (auxiliary modulus)
            LogDefaultScale: 50,            // log2(scale)
        }); err != nil {
        panic(err)
    }

    kgen := rlwe.NewKeyGenerator(params)
    sk := kgen.GenSecretKeyNew()
    ecd := hefloat.NewEncoder(params)
    enc := rlwe.NewEncryptor(params, sk)

    runtime.GOMAXPROCS(100)
    var wg sync.WaitGroup

    r := rand.New(rand.NewSource(0))
    values := make([]float64, params.MaxSlots())
    for i := range values {
        values[i] = 2*r.Float64() - 1 // uniform in [-1, 1]
    }

    rows := 500 // with small values such as 5 the panic does not occur
    cols := 400

    ciphers := make([][]*rlwe.Ciphertext, rows)
    for i := range ciphers {
        ciphers[i] = make([]*rlwe.Ciphertext, cols)
    }
    for i := 0; i < rows; i++ {
        for j := 0; j < cols; j++ {
            wg.Add(1)

            go func(i, j int) {
                defer wg.Done()
                ct := encrypt_vector(values, ecd, enc, params)
                ciphers[i][j] = ct
            }(i, j)
        }
    }

    wg.Wait()
}

I was attempting to perform encoding and encryption in parallel using goroutines. Apparently encoding and/or encryption is not thread safe. Is there any information about thread safety for other operations? With a small number of goroutines and small rows/cols, no race condition shows up, but with larger arrays, as in this example, it does. Here is the stack trace:

goroutine 15061 [running]:
golang.org/x/crypto/blake2b.(*digest).finalize(0x140003da500?, 0x140003da6b8?)
        /Users/username/go/pkg/mod/golang.org/x/crypto@v0.16.0/blake2b/blake2b.go:254 +0x190
golang.org/x/crypto/blake2b.(*xof).Read(0x140003da500, {0x14000156000, 0x14047282a78?, 0x2000})
        /Users/username/go/pkg/mod/golang.org/x/crypto@v0.16.0/blake2b/blake2x.go:148 +0x270
github.com/tuneinsight/lattigo/v5/utils/sampling.(*KeyedPRNG).Read(0x1400011c810?, {0x14000156000?, 0x14074c73590?, 0x2?})
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/utils/sampling/prng.go:59 +0x2c
github.com/tuneinsight/lattigo/v5/ring.(*UniformSampler).read(0x14005aa24c0, {{0x14028aaaa20?, 0x140001241b0?, 0x1?}}, 0x10435c400)
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/ring/sampler_uniform.go:85 +0x23c
github.com/tuneinsight/lattigo/v5/ring.(*UniformSampler).Read(...)
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/ring/sampler_uniform.go:37
github.com/tuneinsight/lattigo/v5/ring/ringqp.UniformSampler.Read({0x14005aa24c0?, 0x0?}, {{{0x14028aaaa20, 0x2, 0x2}}, {{0x0, 0x0, 0x0}}})
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/ring/ringqp/samplers.go:48 +0x58
github.com/tuneinsight/lattigo/v5/core/rlwe.Encryptor.encryptZeroSk({{0xd, {0x14000128ef0, 0x2, 0x2}, {0x14000128ee8, 0x1, 0x1}, {{0x10435e930, 0x14000128940}, 0x400999999999999a, ...}, ...}, ...}, ...)
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/core/rlwe/encryptor.go:352 +0x104
github.com/tuneinsight/lattigo/v5/core/rlwe.Encryptor.EncryptZero({{0xd, {0x14000128ef0, 0x2, 0x2}, {0x14000128ee8, 0x1, 0x1}, {{0x10435e930, 0x14000128940}, 0x400999999999999a, ...}, ...}, ...}, ...)
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/core/rlwe/encryptor.go:179 +0x78
github.com/tuneinsight/lattigo/v5/core/rlwe.Encryptor.Encrypt({{0xd, {0x14000128ef0, 0x2, 0x2}, {0x14000128ee8, 0x1, 0x1}, {{0x10435e930, 0x14000128940}, 0x400999999999999a, ...}, ...}, ...}, ...)
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/core/rlwe/encryptor.go:143 +0x158
github.com/tuneinsight/lattigo/v5/core/rlwe.Encryptor.EncryptNew({{0xd, {0x14000128ef0, 0x2, 0x2}, {0x14000128ee8, 0x1, 0x1}, {{0x10435e930, 0x14000128940}, 0x400999999999999a, ...}, ...}, ...}, ...)
        /Users/username/go/pkg/mod/github.com/tuneinsight/lattigo/v5@v5.0.2/core/rlwe/encryptor.go:165 +0x130
main.encrypt_vector({0x140000f8000, 0x1000, 0x1000}, 0x14000070160, _, {{{0xd, {0x14000128ef0, 0x2, 0x2}, {0x14000128ee8, ...}, ...}, ...}})
        /Users/username/Projects/panama-convolution/go-lattigo/testing/test-parallel-cipher/parallel-cipher.go:20 +0x184
main.main.func1(0x25, 0x3e)
        /Users/username/Projects/panama-convolution/go-lattigo/testing/test-parallel-cipher/parallel-cipher.go:67 +0xa4
created by main.main in goroutine 1
        /Users/username/Projects/panama-convolution/go-lattigo/testing/test-parallel-cipher/parallel-cipher.go:65 +0x4e4
exit status 2

Pro7ech commented 12 months ago

Hi @isentropic, could you provide more context around the application that you are trying to build?

Lattigo is not thread safe. To use an encryptor concurrently, you must create a shallow copy for each goroutine, which can be done with encryptor.ShallowCopy().
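The one-copy-per-goroutine pattern can be sketched with the standard library alone. The block below is a minimal illustration, not Lattigo code: scratchEncryptor, its toy XOR "encryption", and encryptAll are hypothetical stand-ins for an object that shares read-only key material but owns a mutable scratch buffer. Only the ShallowCopy name mirrors the method mentioned above. A fixed pool of workers is used instead of one goroutine per item, so the number of copies stays small.

```go
package main

import (
	"fmt"
	"sync"
)

// scratchEncryptor is a hypothetical stand-in for a stateful object such as an
// encryptor: read-only key material plus a scratch buffer overwritten by every
// call, so a single instance must not be shared between goroutines.
type scratchEncryptor struct {
	key     []byte // read-only, safe to share between copies
	scratch []byte // mutable, must stay private to one goroutine
}

func newScratchEncryptor(key []byte) *scratchEncryptor {
	return &scratchEncryptor{key: key, scratch: make([]byte, len(key))}
}

// ShallowCopy shares the read-only key but allocates a fresh scratch buffer.
func (e *scratchEncryptor) ShallowCopy() *scratchEncryptor {
	return &scratchEncryptor{key: e.key, scratch: make([]byte, len(e.key))}
}

// Encrypt XORs msg with the key via the scratch buffer (toy logic only).
func (e *scratchEncryptor) Encrypt(msg byte) []byte {
	for i, k := range e.key {
		e.scratch[i] = k ^ msg
	}
	out := make([]byte, len(e.scratch))
	copy(out, e.scratch)
	return out
}

// encryptAll runs n jobs on a fixed number of workers; each worker owns its
// own shallow copy, and each job writes to a distinct index of results.
func encryptAll(base *scratchEncryptor, n, workers int) [][]byte {
	results := make([][]byte, n)
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(enc *scratchEncryptor) {
			defer wg.Done()
			for i := range jobs {
				results[i] = enc.Encrypt(byte(i))
			}
		}(base.ShallowCopy()) // the copy is made before the goroutine starts
	}
	for i := 0; i < n; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	base := newScratchEncryptor([]byte{1, 2, 3, 4})
	results := encryptAll(base, 1000, 8)
	fmt.Println(len(results), results[5]) // prints "1000 [4 7 6 1]"
}
```

Applied to the reproduction, the same shape would mean each worker goroutine calls ShallowCopy() once on the shared encryptor and then reuses its private copy for every item it processes.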

isentropic commented 12 months ago

Hello, I'm trying to build a CNN based on CKKS for training (not inference). I started this journey with pyseal, only to realize that to make the whole thing faster I have to use threads, since the whole operation is parallel. So I could not use Python. I then tried Julia and the Seal.jl bindings, only to realize that SEAL inadvertently leaks memory, and sooner or later I run into memory issues. That is why I've come to try my luck with Lattigo and see if it's faster. I also noticed that while Lattigo is a little faster for a single cipher-cipher multiplication in isolation, SEAL outperforms Lattigo when I build a CNN. Maybe it's because of SEAL's weird memory allocation/reuse procedure? That part is also annoying, in that memory leaks are unavoidable when SEAL is used through C bindings.

I also wonder about the thread safety of other operations such as addition and multiplication. Are they all thread safe?

Pro7ech commented 12 months ago

It's the same behavior for all structs: Encoder, Encryptor, Decryptor, Evaluator, ... If you want to use them concurrently, you need to create shallow copies.

Lattigo will outperform SEAL if memory is correctly managed (minimizing the number of allocations) and the auxiliary prime (LogP) is correctly parameterized. The slowdown you experience might be caused by the garbage collector (which, by the way, gets rid of memory leaks): it is triggered often if you constantly allocate memory for ciphertexts/plaintexts, e.g. by calling MulRelinNew instead of MulRelin.
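The allocation difference between the two call styles can be shown with a small standard-library sketch. Everything here is a hypothetical stand-in: poly plays the role of a ciphertext, mulNew imitates a method that allocates its result on every call, and mulInto imitates one that writes into a destination supplied by the caller. It does not use the Lattigo API; it only counts heap allocations with testing.AllocsPerRun.

```go
package main

import (
	"fmt"
	"testing"
)

// poly is a hypothetical stand-in for a ciphertext: one large coefficient slice.
type poly []uint64

// sink keeps results reachable so the allocation in mulNew is not optimized away.
var sink poly

// mulNew mirrors the "...New" style: it allocates the result on every call.
func mulNew(a, b poly) poly {
	out := make(poly, len(a))
	mulInto(out, a, b)
	return out
}

// mulInto mirrors the in-place style: the caller supplies the destination.
func mulInto(out, a, b poly) {
	for i := range a {
		out[i] = a[i] * b[i]
	}
}

// countAllocs reports the average heap allocations per call for both styles.
func countAllocs(n int) (withNew, inPlace float64) {
	a, b, out := make(poly, n), make(poly, n), make(poly, n)
	withNew = testing.AllocsPerRun(100, func() { sink = mulNew(a, b) })
	inPlace = testing.AllocsPerRun(100, func() { mulInto(out, a, b) })
	return withNew, inPlace
}

func main() {
	withNew, inPlace := countAllocs(1 << 13)
	fmt.Printf("allocations per call: New-style=%.0f, in-place=%.0f\n", withNew, inPlace)
}
```

In a tight loop over many ciphertexts, the allocating style hands the garbage collector one large buffer per operation, while the in-place style reuses a destination allocated once up front.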

May I know where you are from? Is this for a PhD project?

isentropic commented 12 months ago

I work in the research department of a startup, and we are trying to test this idea of a CNN with HE. Currently I'm just trying to do a single layer, encoded in the batch direction. Hence there is no need for smart packing and rotations, as the batch dimension is naturally independent.

Now that I've experimented more with SEAL, Lattigo, and multithreading, I've come to realize something I should have realized long ago. FHE is fundamentally memory bound: it creates and moves a lot of memory, and for a CNN, allocating new memory is unavoidable (GPUs struggle with this even for ordinary cleartext networks). It does not matter that I have N cores; as long as memory throughput and RAM stay constant, I will never see a sizable speedup. What do you think about this?
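To put rough numbers on the memory argument, here is a back-of-the-envelope sketch for the parameters of the reproduction at the top of the thread. It assumes each ciphertext is a degree-1 RLWE ciphertext at the maximum level (two polynomials, one residue per Q prime) with coefficients stored as 64-bit words, and it ignores metadata and slice headers. The helper ciphertextBytes is hypothetical and not part of any library.

```go
package main

import "fmt"

// ciphertextBytes estimates the raw coefficient storage of one RLWE ciphertext:
// (degree+1) polynomials, each holding `limbs` residues of 2^logN 64-bit words.
// This is an estimate under the stated assumptions, not a library-reported size.
func ciphertextBytes(logN, limbs, degree int) int64 {
	return int64(degree+1) * int64(limbs) * (int64(1) << logN) * 8
}

func main() {
	// LogN = 13, two Q primes, degree-1 ciphertext.
	perCt := ciphertextBytes(13, 2, 1)
	// rows * cols ciphertexts, as in the reproduction.
	total := perCt * 500 * 400
	fmt.Printf("%d KiB per ciphertext, %.1f GiB in total\n", perCt>>10, float64(total)/(1<<30))
	// prints "256 KiB per ciphertext, 48.8 GiB in total"
}
```

Under these assumptions, the 500 x 400 grid alone holds roughly 49 GiB of ciphertext data, which gives a sense of how quickly memory capacity and bandwidth can dominate over core count.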

SEAL has been faster because of its weird memory allocation, but that's a double-edged sword, as it creates memory leaks. I've tried to be smart about memory usage by pre-allocating ciphertexts before doing any computation, but the memory bottleneck is bound to show up sooner or later.

I have to say I've enjoyed Go and Lattigo, especially the fact that the code is much more readable than SEAL's, and we will probably go with Lattigo further down the line. I just wish Lattigo had Python bindings of some sort for easier, interactive use in research. Many thanks for your responses and for this library, it is brilliant.

Pro7ech commented 12 months ago

Your remark on memory bandwidth is very close to what was said in this paper: Does Fully Homomorphic Encryption Need Compute Acceleration?

Pro7ech commented 12 months ago

I'm closing this issue as the mentioned outcome is an intended behavior of the library. If you want to know more about the library or have any other question, you can contact us at lattigo@tuneinsight.com.

tuneinsight / lattigo

Question: Thread safety of encryption and encoding #422