Closed: lzjluzijie closed this 7 months ago
I made the test pass with interleaved mode and the "field" u128. I didn't find a convenient prime field implementation, so I kept using u128; I think it should be similar.
As for the current code, I split getLevel into intermediate levels and the last level, along with some other changes. The code is still WIP and very messy, not ready for review yet; I will do some cleanup after everything is done.
@ladnir Could you take a look at the current PPRF implementation and the subfield test if you have time (mainly whether my tests make sense)?
For the coding part, I felt it would take some time to change LinearCode or QuasiCyclicCode to work with any template field. ~I think I can try QuasiCyclicCode with block replaced by u128, since they are the same size, to verify the idea first?~
Do not use QuasiCyclicCode. We can just use ExConv. When you are ready I can give some pointers on how.
As mentioned in the comments, it's best to think of F, G as arbitrary types that you can assume nothing about. Whenever you need some functionality from them, you should get it through a function call.
I would suggest changing the type-level template to be some FieldTraits which has subtypes FieldTraits::F and FieldTraits::G. Whenever you want some functionality, you can call the function FieldTraits::functionality(...).
Thank you for your review. For ExConv, should I read https://eprint.iacr.org/2023/882?
Yeah, it likely would be helpful. You don't need to understand the proof, just the basic idea; so the intro and the start of Section 5.
I wonder if we should define some virtual class Field that supports operator+, operator*, and fromBlock? I am not an expert in C++. I would implement this "interface" style in other languages, but is this a good choice in C++ (might it be slower than templates)?
virtual would be the natural way of doing it, but it has very bad performance.
The cost of a function call is orders of magnitude higher than, say, a +. C++ often allows you to avoid this cost by having the compiler inline the function: it essentially copy-pastes the function body into the call site, so there is no function call. However, this cannot work for virtual functions, because the compiler does not know which "virtual" instance is currently being used.
You can still achieve this same type of "virtual" behavior with templates. In fact, templates are basically virtual functions where you are additionally telling the compiler exactly which type is being used.
So concretely, my suggestion is the following:
The caller will be responsible for defining a type traits class
```cpp
struct TypeTrait
{
    using F = u128;
    using G = u8;

    F sampleF(PRNG& prng);
    F fromBlock(const block& blk);
    F getPowerOfTwo(u64 power);
    ...
};
```
Each of the subfield classes will take a TypeTrait as a template parameter. You can then define everything else in terms of that. TypeTrait will essentially form the "virtual" class that other languages use. However, the caller can define their TypeTrait to have these functions be inline, and therefore the overhead of the function calls will disappear.
The caller is responsible for correctly implementing this "interface". There is no "base class" to inherit from. The caller is just expected to know what function to implement based on the documentation. If a function is missing, then the code doesn't compile.
make sense?
An example of this pattern can be found here: https://github.com/Visa-Research/volepsi/blob/main/volePSI/PxUtil.h#L550 In this case I called it Helper instead of TypeTraits, and it has some other abstractions, but it's the same basic idea. It was used to allow the caller to call the OKVS for VOLE-PSI with basically any type, as long as they provided the Helper. In particular, they could even call it when the "element", in our case F, was some variable-length vector.
Update: I fixed some bugs in PPRF, and the u128 silent VOLE is working if I comment out the final coding part. The coding part is still not working, and I will continue working on it tomorrow. The code seems a little complicated, ~but I think it might work by changing some ^ to +~. No need to review the code at this time, but you could take a look at ExConvCode_encode_u128_test to see whether my test makes sense, if you have time. Thank you.
I realized that the ExConvCode_encode_basic_test test contains code.dualEncode2<block, u8>(cc1, cc2). Could you explain why there is a dualEncode2 (for performance)? It looks like that code is already well templated, but my current test does not work. Does it make sense to change ^ to +?
Btw, the ExConvCode_encode_basic_test gets a segmentation fault on my end when the build mode is DEBUG, but is OK in RELEASE mode. I will look into it later.
That's right, you basically just need to use plus. Encode2 is for performance; it just encodes twice. You can just call the single-encode version twice if you want.
I got the silent VOLE test passing with F=G=u128. The ExConvCode is indeed well templated and I only changed a few ^ to +. I will test it with other fields later.
As for making this more generic, I wonder why the current block does not support operator*, and operator+ seems to add the left and right halves independently as u64. Shall we change it to behave like GF(2^128) elements, or define some class F128 : block and overwrite the operators?
block is defined that way because its primary purpose is to be an abstraction over the __m128i built-in type. This type models special CPU instructions (aka intrinsics). It does not have 128-bit addition or multiplication, so block doesn't either. There is 2x 64-bit addition, as you mention. Maybe I shouldn't have overloaded operator+ since it's a bit confusing. But I did, so oh well, ha.
As suggested above, we shouldn't even use operator+ or operator^. Instead there should be a type-traits object that handles all the operations for you; you simply pass in the arguments. But this can happen at the end, once the basic version is working.
So we should do something like
```cpp
struct TypeTrait
{
    using F = u128;
    using G = u8;
    ...
    static inline F Add(const F& a, const F& b);
    ...
};
```
and I was suggesting
```cpp
struct F128 : block {
    inline F128 operator+(const F128& rhs) const {
        ...
    }
};

struct TypeTrait
{
    using F = F128;
    using G = F128;
    ...
};
```
(not using virtual). Is there a performance difference?
It seems there is not much difference, but I agree it's safer to use a TypeTrait with explicit Add functions.
Yes, it will be about 10x faster. Maybe in some cases the compiler might be smart enough to remove this overhead, but it's best not to rely on the compiler that way.
In simple examples the compiler does this thing called devirtualization, which basically can inline virtual functions. But there's no guarantee it will happen. However, if you use inline functions, you are basically guaranteed there's no cost; the function call will disappear.
I see, I will change the code. And for ExConv, should I also modify it to use inline member functions, not operator+ or ^?
One more question, please: in my second example, F128 inherits from block, but the operator function is also inline without virtual. Is this kind of inline different from the member-function inline in the first example? If so, is it because F128 inherits from block?
Ok, I took a closer look at your example. Inheritance != virtual. You can inherit all you want; the functions can still be inline, and they behave the same as if you wrote new versions of them. What has a performance overhead is when you use the virtual keyword on a function in a base class and then override it in a derived class. Specifically, consider the example
```cpp
struct base {
    virtual void foo() = 0;
};
struct derived0 : base {
    void foo() override { std::cout << "derived0" << std::endl; }
};
struct derived1 : base {
    void foo() override { std::cout << "derived1" << std::endl; }
};
```
When someone has a base* v; and they call v->foo();, what happens is that at the beginning of derived0 and derived1 there is a hidden pointer that points to something called a v-table. This v-table looks like:
```cpp
struct vTableForDerived0 {
    void (*baseFooFunctionPointer)(void* derived0);
};
struct vTableForDerived1 {
    void (*baseFooFunctionPointer)(void* derived1);
};
```
and the derived classes actually look like
```cpp
struct derived0 : base {
    void* pointerToVTable;
    void fooImpl() { std::cout << "derived0" << std::endl; }
};
struct derived1 : base {
    void* pointerToVTable;
    void fooImpl() { std::cout << "derived1" << std::endl; }
};
```
Now the v->foo(); call gets transformed into

```cpp
auto* vTable = v->pointerToVTable;
vTable->baseFooFunctionPointer(v); // this calls derived0::fooImpl() or derived1::fooImpl()
```
So now it might be a bit clearer why virtual is a bit of a performance hit. You have to chase the pointer to the vtable, and then you have to call baseFooFunctionPointer. So say that instead of printing something we were just doing a + on a u64. Then you are doing all this, plus the function call, which itself has to do a bunch of stuff (flush variables out of registers, update the program counter, make space for a new stack frame, execute the call instruction), and then you do a simple +.
In some cases the compiler is smart and can look at your program and say "hey, even though v is a base*, I know it's actually a derived0, and therefore I can just call derived0::fooImpl. I will even inline it". However, if there are many possible derived types and the compiler can't see which type is being used, then it can't do this devirtualization.
In your example above you don't use virtual and therefore none of this stuff happens. They are just normal function calls. As you can see in your assembly, these get inlined (no function calls made).
Correct, you can define add like you did above and use that wherever you want to do a plus. Additionally, I would suggest not requiring that the TypeTrait object only have static functions. So these subfield protocols will take the normal F, G values as input, plus an instance of TypeTraits. You use this instance to do all the additions and everything else.
Consider the case where you want to allow the modulus to be many different values. If add is static, then this modulus must live somewhere within the F type, e.g.

```cpp
struct F {
    u64 val;
    u64 modulus;
};
```
So now all Fs have to be twice as big. However, if TypeTraits is an actual object, you can simply stick the modulus in it. Now F is only a u64 in the example above.
The alternative would be for F to have the modulus hardcoded into the "type". But this is a bit less flexible. Although maybe it's good enough, haha.
Thank you very much for your explanation. I understand that virtual is slow. Now we have two approaches.
The first is to use operators with field elements (no virtual), for example:
```cpp
struct F128 {
    block b;
    inline F128 operator*(const F128& rhs) const {
        block r = b.gf128Mul(rhs.b);
        return F128{r};
    }
};

struct TypeTrait
{
    using F = F128;
    using G = F128;
    ...
    // no explicit Add needed.
};
```
The second approach:
```cpp
struct TypeTrait
{
    using F = block;
    using G = block;
    ...
    inline F Add(const F& a, const F& b);
    ...
};
// And protocols use member functions like Add, not operators.
```
If my understanding is correct, both approaches should have similar performance, given there is no virtual and inlines are used. I think one advantage of the first approach is that it needs less code: if we want to use a primitive type for these protocols, we don't need to write the Add functions required by the second approach. However, the second approach might be safer. In the first case, users might have some unsafe pointer operation that converts an F* to a block* and uses the wrong operator function. But I think inside libOTe we can make sure this never happens, so it should be fine?
As for the modulus, I suggest that we make F templated on the modulus; I think this is more convenient (in the first approach).
So I personally prefer the first approach. Again, I don't know much about C++; is there any issue with it? Thank you.
Sure, you can do the first if you want. I might change it later, though. Regarding the user, you can give a default version of TypeTrait that assumes F, G are normal types with +, etc.
Makes sense. So you think the second is better? I can do that directly.
I'd prefer the second, because you are separating the data (F, G) from the operations (add, sample, etc.). You can use the same datatype (e.g. block) for both a 128-bit prime field and GF128. But it's fine either way.
Makes sense. I will do more tests with larger fields and then do it.
I got some issue with genSilentBaseOts. If I configure it with SilentBaseType::Base and run BaseOT, it works normally. However, if I use SilentBaseType::BaseExtend with NoisyVOLE requiring 64, not 128, OTs, the assertion in libOTe/TwoChooseOne/SoftSpokenOT/SoftSpokenMalOtExt.cpp:56 fails.
I'm not familiar with SoftSpokenOT, and I wonder if there is any special reason to use it rather than something simpler like IKNP?
I think we can do silent VOLE over general commutative rings, and the current PPRF and ExConv code should support it. As for noisy VOLE, the current method seems not to work.
I have an idea: for G=u32 and F=u32^k, we can do k u32-u32 VOLEs, one per element of the sender's type-F delta, with the receiver using the same y array each time. The complexity should be similar, since we are still iterating over every bit of delta once.
For semi-honest, IKNP would be fine. For simplicity I use SoftSpoken; it sends a bit less data as well.
Where is it failing exactly? I suspect that if SoftSpoken is failing, then IKNP would too; they have the same API. Hard to say, though.
For silent VOLE that sounds right. For noisy VOLE, you're saying G=gf(32) and F=gf(k*32) doesn't work? The issue isn't clear to me.
The assertion at SoftSpokenMalOtExt.cpp:56 fails. I am saying G is uint32 and F is an array of uint32, not fields.
So the receiver has a in G^n, and the sender has X in F.
The receiver computes vi = a * F(2^i). The user can define * however they want, more or less. The receiver acts as OT sender with messages

```
mi0 = Ri
mi1 = Ri + vi
```

where Ri in F^n is random. The sender, with choice bit Xi, learns

```
mi_{Xi} = Ri + Xi * F(2^i) * a
```

When you sum the received messages you get

```
B = sum_i mi_{Xi}
  = sum_i (Ri + Xi * F(2^i) * a)
  = sum_i Ri + (sum_i Xi * F(2^i)) * a
  = C + X * a
```

where C = sum_i Ri, using that sum_i Xi * 2^i = X.
You want to define G as the Z_{2^32} group and F as the vector space over it. This is fine; it should work.
For SoftSpoken, I'm not sure; that doesn't seem right. What command/test produces the error? I can run it.
Sorry, was very busy; now back to this.
The error occurs with SET(CMAKE_BUILD_TYPE Debug), and the code works fine in Release mode. I guess it might be some (stack) memory not properly initialized (I saw similar behavior before), but I didn't figure out the issue. You can test it with -u 123 on commit 91de36a. I will do more testing with larger "fields" and clean up the code later.
Yeah, change the code however makes sense. The instantiation thing was to improve compile times, but I don't think it makes sense anymore. I can look into SoftSpoken.
There are already lots of changes in this PR. I think I should put that noisy VOLE optimization in a separate PR.
Also, I noticed that many functions in AES.h, like finalEnc and roundEnc, are not force-inlined, and are indeed not inlined when I test (at least in DEBUG mode). I can write another PR for this and fix the header issues in cryptoTools.
Do not use debug mode to make decisions about what is being inlined; you'd need to use release. In debug builds the compiler often prefers not to inline, because it can make debugging harder.
Sure, multiple PRs is good if that's what you prefer. Either way is fine with me.
Makes sense, but is there any reason to use the normal inline, not OC_FORCEINLINE, in AES.h?
I added an F128 type which is essentially a block, but with the proper +, -, and * in GF(2^128). I added a test for it in ExConv. But there are lots of places that need to change from block to F128. What should I do?
So Lance Roy is the one who added force inline. Not totally sure if there was a real need for it. inline tells the compiler "please inline this", but the compiler can still decline, while force inline tells the compiler to always do it. Typically, if it's a good idea to inline, then the compiler will do the right thing with just inline. Sometimes not, though.
If you are going to use force inline, you should have a benchmark that shows it actually helps. In some cases it might hurt (the code becomes bigger).
For ExConv, it's not clear to me what the issue is. In general you should replace block with F. But if I recall correctly, it's already templated, so I'm confused about the question.
OK, that makes sense.
I am saying that there are other places, such as the VOLE tests, that need block replaced with F. Maybe do that in another PR as well?
Leave the existing tests. Make new ones and adapt them lightly. So for the block ones, + and - should both be mapped to XOR.
I added a new namespace osuCrypto::Subfield and put all the new copies in that namespace, so the changes won't affect existing tests. The code is working now and you can take a look. Malicious security is not implemented yet, and the noisy VOLE optimization I mentioned will be another PR after this one.
~I think you didn't commit your tests~
```
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_ExConvCode_encode_test(class osuCrypto::CLP const &)" (?Subfield_ExConvCode_encode_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_Tools_Pprf_test(class osuCrypto::CLP const &)" (?Subfield_Tools_Pprf_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_Noisy_Vole_test(class osuCrypto::CLP const &)" (?Subfield_Noisy_Vole_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_Silent_Vole_test(class osuCrypto::CLP const &)" (?Subfield_Silent_Vole_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\frontend\frontend_libOTe.exe : fatal error LNK1120: 4 unresolved externals
ninja: build stopped: subcommand failed.
```
nevermind.
Pushed some changes to my subfield branch. Below are my comments:

- You were calling _mm_load_ps for types that aren't block. This can fail if the thing being loaded isn't 16-byte aligned. I think in release builds this code gets optimized away in these cases (the result isn't used) and so doesn't crash, but on debug Linux builds I'm getting a crash. I fixed this by adding an if statement to guard the call.
- F128 is not portable. Like I mentioned before, you cannot use __int128; it is not standard C++. You need to use standard C++, or optionally enable platform-specific features behind #ifdef macros.
- Don't use include paths like "cryptoTools/cryptoTools/..." or "out/...". These will not work correctly when you install libOTe.
- Make as few assumptions as possible about the F, G types. Ideally, we assume nothing apart from the functions that TypeTrait provides. Avoid things like using sizeof(F), setting and reading bits using BitIterator, or assuming F, G have member functions for +, *.
- You can give a default TypeTrait<F, G> that just calls your preferred methods, assuming that F, G are POD types and have +, * members. But in general, for example, it would be nice if the impl also worked if, say, F, G are just pointers to their underlying values. Or, in the case of block, we want + to map to ^.
- Eventually we want to replace the existing ExConv, Vole with your new one. So things should be general and efficient enough to handle both cases with no additional overhead. For the general case, we can "hide" the impls for QuasiCyclic and the other codes. That way, we only need a general impl for ExConv.
- In SubfieldExConvConfigure(...), the keyword static (for a "free function") is not what you should use when you give the implementation in the header. Instead, use inline. Both work, but static has other meanings. For class member functions, static means something different, and you are using it correctly.

I see, I will change the code to use functions from TypeTrait instead of using F, G directly. What is your configuration for indentation?
4 spaces; no single-line if statements (can't place breakpoints).
I made changes to the TypeTraits and it works fine, except for the encoder. I'm not sure if you want the encoder to take the full TypeTrait of VOLE (which contains both types) as a template parameter, since we may want to use the encoder independently.
Also, can you take a look at the SoftSpoken issue I mentioned above? It still errors in DEBUG mode (but is fine in release mode).
The encoder can take it as a template. Later we can revisit the decision if needed.
I tested soft spoken on my machine and it worked... What compiler are you using?
I changed the ExConv to use TypeTrait.
I am using gcc (Debian 12.2.0-14) 12.2.0. Can you try -u 131 in DEBUG mode on my new commit?
OK, the issue was just with the assert; the code is fine.
Do all the unit tests pass for you? There seems to be an issue with silent VOLE; see the CI above.
Yes, it's an error with dualEncode2, fixing it now.
It's quite strange. It seems commit 5132392 changed the result of accumulate for blocks. I tried to fix it for an hour but was not successful; I will try again later.
OK, I figured it out. There is some issue with the block-optimized accumulate. I commented out all the std::is_same<block, T>::value code and it's working now. I'm not sure what the optimization is doing, and b[0].get<i32> seems strange to me. I guess we can further improve this part of the code?
Implements #122