Closed: ArseniyKholod closed this issue 2 months ago.
Unfortunately, I don't see a possibility to configure OpenFHE to work with the allocator from GLib, so the only way to remove the created C++ objects is to run `GC.gc()` explicitly for a full scan of the memory heap and, consequently, remove the pointers to these objects from the old region.

The next idea is to forget about GC and just call `reset` for all `shared_ptr`s within the wrapped C++ functions.
Thanks for the detailed analysis, and thank you to @vchuravy for his input!
@ArseniyKholod Did you try this yet:

> The next idea is to forget about GC and just call `reset` for all `shared_ptr`s within the wrapped C++ functions.

And if yes, to what avail?
Simply calling, for example, `ciphertext.reset()` does not help. The OpenFHE library keeps some static objects with saved copies of the `shared_ptr`s, which would all have to be `reset()` as well. I haven't found all of them yet; it is a pretty complex class hierarchy, so I'm still investigating. It looks like a lot of manual coding to deallocate every `shared_ptr`, but if it works, it is probably better than manually calling the GC in aggressive mode, which is not efficient...
For posterity's sake, I was referring to `jl_malloc`/`jl_free` and not `g_malloc`.
One could potentially use something like https://github.com/JuliaLang/julia/pull/55390 to directly manipulate the "apparent heap size". Or run Julia with `--heap-size-hint=4G` to encourage Julia to run full GC earlier.
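A minimal sketch of the second suggestion (the `4G` value is just the example quoted above; the right threshold is workload-dependent, and the `JLOptions` field below is an implementation detail that may differ across Julia versions):

```julia
# Launch Julia with a soft cap on the managed heap:
#   julia --heap-size-hint=4G my_fhe_script.jl
# Julia then triggers full collections earlier as the heap approaches the hint.

# The active hint can be inspected at runtime (in bytes; 0 means unset):
Base.JLOptions().heap_size_hint
```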
> The normal mode of GC works in a way that it looks at recent objects and not at old ones.

Julia's GC is generational: there is an "old" and a "young" generation. During an "incremental" collection, Julia only visits "young" objects; every so often, Julia runs a "full" collection. Those are more expensive, since they are a full heap traversal.
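Both collection modes can also be triggered explicitly through the standard API:

```julia
GC.gc(false)  # incremental collection: only visits the young generation
GC.gc(true)   # full collection: traverses the entire heap (the default)
```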
> For posterity's sake

@vchuravy, thank you for correcting my misunderstanding and for the new idea with `--heap-size-hint`, perhaps this is the easiest way!
I found a possible solution. So what is the problem? The GC is generational and does not remove the pointers to the large C++ objects we use, for example an 8 B pointer to a `Ciphertext` that weighs around 60 MB for a secure simulation. So not deleting 8 B leads to a memory leak of 60 MB.

The idea is to manage all the pointers ourselves. All objects like `Ciphertext`, `CryptoContext`, and so on are managed as `shared_ptr`s. I would like to manually `reset` all of them.
TODO:

Define `reset` for each type in openfhe-julia, e.g. for ciphertext:
mod.method("reset",
[](std::shared_ptr<lbcrypto::CiphertextImpl<lbcrypto::DCRTPoly>>& ptr)
{ptr.reset();});
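From the Julia side, such a wrapped method would be called like this (a sketch, mirroring how `OpenFHE.reset` is used further below):

```julia
# Release the underlying C++ ciphertext immediately instead of waiting for GC:
OpenFHE.reset(ciphertext)
```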
We must manually delete all `shared_ptr`s, including so-called rvalues like `a+b`, where `a` and `b` are ciphertexts. Add a new field `rvalue` to each structure in SecureArithmetic.jl to detect temporary objects, e.g. `SecureVector`:
```julia
mutable struct SecureVector{CryptoBackendT <: AbstractCryptoBackend, DataT}
    data::DataT
    length::Int
    capacity::Int
    context::SecureContext{CryptoBackendT}
    rvalue::Bool

    function SecureVector(data, length, capacity, context::SecureContext{CryptoBackendT},
                          rvalue = false) where CryptoBackendT
        new{CryptoBackendT, typeof(data)}(data, length, capacity, context, rvalue)
    end
end
```
All operators should check whether they act on lvalues or rvalues. E.g., `a+b+c` first computes `a+b`; this is an rvalue that is then used in `(a+b)+c`, so while performing the sum with `c`, the rvalue `a+b` must be reset. Redefine all arithmetic operations accordingly, e.g. `multiply`:
```julia
function multiply(sv::SecureVector{<:OpenFHEBackend}, scalar::Real)
    cc = get_crypto_context(sv)
    ciphertext = OpenFHE.EvalMult(cc[], sv.data[], scalar)
    secure_vector = SecureVector(ciphertext, length(sv), capacity(sv), sv.context, true)
    # If the input was itself a temporary, free its ciphertext right away
    if sv.rvalue
        OpenFHE.reset(sv.data)
    end
    secure_vector
end
```
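A sketch of how the `rvalue` flag plays out in a chained expression (illustrative only):

```julia
v1 = multiply(sv, 2.0)  # v1 wraps a fresh ciphertext and has rvalue == true
v2 = multiply(v1, 3.0)  # computing v2 resets v1's ciphertext immediately
```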
To handle assignments like `a = b+c`, define a new operator `===` (the name can be changed, of course):
```julia
function ===(sv1::SecureVector, sv2::SecureVector)
    OpenFHE.reset(sv1.data)  # free the ciphertext currently held by sv1
    sv1.data = sv2.data      # take over the right-hand side's data in place
    sv1.length = sv2.length
    sv1.capacity = sv2.capacity
    sv1.rvalue = false
    return sv1
end
```
So if `a` is already a ciphertext, use `a === b+c` instead of `a = b+c`.
Functions that take a `shared_ptr` as an argument make CxxWrap.jl create a new instance of this `shared_ptr` somehow intrinsically, one that we cannot reset. So rewrap all such functions so that they do not take a `shared_ptr` as an argument, e.g.:
mod.method("EvalMult",
[](lbcrypto::CryptoContextImpl<lbcrypto::DCRTPoly>& cc, lbcrypto::CiphertextImpl<lbcrypto::DCRTPoly>& ciph,double c)
{return cc.EvalMult(std::make_shared<lbcrypto::CiphertextImpl<lbcrypto::DCRTPoly>>(ciph), c);});
instead of
wrapped.method("EvalMult",
static_cast<lbcrypto::Ciphertext<lbcrypto::DCRTPoly>
(WrappedT::*)(lbcrypto::ConstCiphertext<lbcrypto::DCRTPoly>,
double) const>(&WrappedT::EvalMult));
This requires many changes, but the problem is also serious. For example, in a ciphertext operation like `a = (a+b)*c + d`, we lose at least 3 pointers to `CiphertextImpl` objects, with a total weight of around 180 MB for secure simulations. And what if we do it in a loop?...
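A worked sketch of where those three temporaries come from, and how the proposed scheme would reclaim them (illustrative only):

```julia
# a = (a + b) * c + d decomposes into three temporaries:
t1 = a + b    # temporary 1 (~60 MB ciphertext)
t2 = t1 * c   # temporary 2; with the rvalue scheme, t1 is reset here
t3 = t2 + d   # temporary 3; t2 is reset here
a === t3      # a's old ciphertext is reset, and a takes over t3's data
```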
There are two problems with this solution:

1) It requires a large amount of work, but it seems to be necessary: calling `GC.gc()` is not what the user probably wants, not to mention the performance of a full GC scan... Nevertheless, I should be able to manage all the changes within a few weeks.
2) Using `===` instead of `=` is not intuitive, but Julia does not give access to redefine `=`... (And since Julia's builtin `===` cannot be extended with new methods, the operator would in any case need a different name.)
Thanks a lot for the detailed analysis. If I understand correctly, it boils down to the memory being held by `std::shared_ptr` objects on the C++ side that Julia's GC does not release in time. Did I get it right thus far?
Then, my next question is whether manually calling `GC.gc()` would fix this memory issue as well. For example, would putting `GC.gc()` at the end of each loop iteration (or also at intermediate positions) effectively free up the memory again? Or is there another issue that I don't see?

Also, what is the computational cost of a call to `GC.gc()` compared to, say, an FHE multiplication? Maybe it is not too expensive?
Yes, you are right.

Calling `GC.gc()` resolves this issue. I call it in every loop iteration for the `OpenFHEBackend`:
```julia
if u.context.backend isa OpenFHEBackend
    GC.gc()
end
```
A full GC call costs about as much as a single ciphertext-scalar multiplication, but is ~10 times faster than a ciphertext-ciphertext multiplication at 128-bit security.

On Rocinante, a single GC call takes ~0.08 s, so for 100 simulation steps it adds ~8 s, while the simulation itself takes >~1000 s. So yeah, the performance of `GC.gc()` is not a big issue at 128-bit security; it is just, IMHO, not typical for Julia to release memory manually.

When running simulations with smaller ring dimensions, like 2^12, the run time of `GC.gc()` becomes significant (~10-20%), but that is a non-production setting without a sufficient security level.
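A quick way to reproduce such a comparison (a sketch; `cc`, `c1`, and `c2` are assumed to be an existing crypto context and two ciphertexts, accessed in the dereferenced style used elsewhere in this thread):

```julia
t_gc   = @elapsed GC.gc()                             # full collection
t_mult = @elapsed OpenFHE.EvalMult(cc[], c1[], c2[])  # ciphertext-ciphertext multiply
println("full GC: $t_gc s, ciphertext multiply: $t_mult s")
```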
Note: the `GC.gc()` call resolves only the described memory leak. To remove the heavy-weight (~several GB) pre-computed evaluation keys during a single Julia REPL run (needed if several applications with different `CryptoContext`s run in a row without exiting the REPL in between), I created an additional PR: https://github.com/hpsc-lab/openfhe-julia/pull/61
Hm. Then I think it would be better for now to add a (prominent) note to the README and/or docs of OpenFHE.jl and SecureArithmetic.jl to inform users about this issue, and that they need to take care of it manually for now.
Alternatively: @vchuravy Is it possible to manually tell the GC that an object is "heavy", such that it is evicted as early as possible from GC?
> Hm. Then I think it would be better for now to add a (prominent) note to the README and/or docs of OpenFHE.jl and SecureArithmetic.jl to inform users about this issue, and that they need to take care of it manually for now.
I added to the README docs an explanation of how to deal with the mentioned memory issues: https://github.com/hpsc-lab/OpenFHE.jl/pull/64 and https://github.com/hpsc-lab/SecureArithmetic.jl/pull/44
Thanks. This can be closed then?
I realized that our heavy-weight functions, like `circshift`, accumulate memory. In these functions, when a C++ object is allocated, it is saved as a pointer in Julia. These objects are not freed during the application run without explicit `GC.gc()` calls, which leads to memory overflow.

Today at the HPSC meeting I spoke with Valentin. He explained the reason, thank him very much! I will try to reproduce it here. I could have misunderstood something, so take this only as a starting point.

1) When Julia asks to allocate a C++ object, a pointer is created and then the C++ object is allocated.
2) The normal mode of GC works in a way that it looks at recent objects and not at old ones.
3) Our objects for secure FHE are massive, so in the memory heap the lightweight pointer will be considered an old object ("hidden behind the massive C++ object") and will not be deallocated by the GC in normal mode. The GC also cannot deallocate the C++ object, because it is not Julia's; the massive C++ object stays in memory until the pointer is deallocated, and only then can CxxWrap.jl finalize it. Moreover, the GC does not see that memory is almost filled up by C++ objects; it thinks we still have plenty of it, so it does not enter aggressive mode automatically. So the GC does not deallocate the pointer, and memory overflow happens.
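A minimal sketch of this leak pattern (the loop body is illustrative; the call style follows the rest of this thread):

```julia
for step in 1:100
    # Each iteration creates a tiny Julia-side wrapper (~bytes) that owns a
    # massive C++ ciphertext (~60 MB). The GC only sees the tiny wrapper, so it
    # feels no allocation pressure, while resident memory grows every iteration.
    ciphertext = OpenFHE.EvalMult(cc[], ciphertext[], 2.0)
end
```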
Solutions:

1) The solution that I use: explicitly call `GC.gc()` several times to enter aggressive mode, so the GC will look not only at recent objects but also at old ones. Calling the GC yourself is not good Julia practice, so it should be avoided.
2) Valentin said that if it is possible to configure the OpenFHE C++ library with a custom allocator, we can choose `g_malloc`; with this one, Julia's GC will be able to see that C++ objects fill up the memory, and it will be able to enter aggressive mode by itself.

So I will start to explore `g_malloc` and the possibility of choosing it for OpenFHE C++.
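A sketch of solution 1 as it would appear in user code (the repeat count is empirical; a first pass can queue finalizers whose memory only a subsequent pass reclaims):

```julia
# GC.gc() defaults to a full collection; repeating it also gives queued
# finalizers (e.g. for wrapped C++ objects) a chance to run and be reclaimed.
for _ in 1:3
    GC.gc()
end
```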