osu-crypto / libOTe

A fast, portable, and easy to use Oblivious Transfer Library

Subfield VOLE #127

Closed lzjluzijie closed 7 months ago

lzjluzijie commented 11 months ago

Implements #122

lzjluzijie commented 11 months ago

I made the test pass with interleaved mode and the "field" u128. I didn't find a convenient prime field implementation, so I kept using u128; I think it should be similar.

As for the current code, I split getLevel into the intermediate levels and the last level, and made some other changes as well. The code is still WIP and very messy, and not ready for review yet. I will do some cleanup after everything is done.

@ladnir Could you take a look at the current PPRF implementation and the subfield test if you have time (mainly whether my tests make sense)?

For the coding part, I felt it would take some time to change LinearCode or QuasiCyclicCode to work with an arbitrary template field. ~I think I can try QuasiCyclicCode with block replaced by u128, since they are the same size, to verify the idea first?~

ladnir commented 11 months ago

Do not use QuasiCyclicCode. We can just use ExConv. When you are ready I can give some pointers on how.

ladnir commented 11 months ago

As mentioned in the comments, it's best to think of F, G as arbitrary types that you can assume nothing about. Whenever you need some functionality from them, you should get it through a function call.

I would suggest changing the type-level template to be some FieldTraits which has subtypes FieldTraits::F, FieldTraits::G. Whenever you want some functionality, you call the function FieldTraits::functionality(...).
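For example, a minimal sketch of that pattern (the names and the u64/u8 instantiation are just illustrative, not the final API):

#include <cstdint>

// Illustrative sketch only: the protocol never touches F or G directly,
// everything goes through the traits type it is templated on.
struct ExampleFieldTraits
{
    using F = std::uint64_t;   // the "large" field element (u128 in this PR)
    using G = std::uint8_t;    // the subfield element

    static F add(const F& a, const F& b) { return a + b; }
    static F mul(const F& a, const F& b) { return a * b; }
    static F embed(const G& g) { return F(g); }   // lift a G element into F
};

// a protocol routine written against the traits:
template<typename Traits>
typename Traits::F axpb(typename Traits::G a, typename Traits::F x, typename Traits::F b)
{
    return Traits::add(Traits::mul(Traits::embed(a), x), b);
}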

lzjluzijie commented 11 months ago

Thank you for your review. For ExConv, should I read https://eprint.iacr.org/2023/882?

ladnir commented 11 months ago

Yeah, that would likely be helpful. You don't need to understand the proof, just the basic idea. So the intro and the start of Section 5.

lzjluzijie commented 11 months ago

I wonder if we should define some virtual class Field that supports operator+, operator*, and fromBlock? I am not an expert in C++. I would implement this in an "interface" style in other languages, but is this a good choice in C++ (it might be slower than templates)?

ladnir commented 11 months ago

virtual would be the natural way of doing it, but it has very bad performance.

The cost of a function call is orders of magnitude higher than, say, a +. C++ often lets you avoid this cost by having the compiler inline the function, essentially copy-pasting the function body into the call site so there is no function call. However, this can not work for virtual functions because the compiler does not know which "virtual" instance is currently being used.

You can still achieve this same type of "virtual" behavior with templates. In fact, templates are basically virtual functions where you are additionally telling the compiler exactly which type is being used.
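As a toy illustration (not libOTe code) of the difference:

#include <cstdint>

// virtual dispatch: the call goes through a vtable pointer and usually cannot be inlined.
struct AdderBase
{
    virtual std::uint64_t add(std::uint64_t a, std::uint64_t b) const = 0;
    virtual ~AdderBase() = default;
};
struct VirtualAdder : AdderBase
{
    std::uint64_t add(std::uint64_t a, std::uint64_t b) const override { return a + b; }
};
std::uint64_t sumVirtual(const AdderBase& adder, std::uint64_t a, std::uint64_t b)
{
    return adder.add(a, b);   // indirect call through the vtable
}

// template "dispatch": the concrete type is known at compile time, so the call inlines away.
struct InlineAdder
{
    std::uint64_t add(std::uint64_t a, std::uint64_t b) const { return a + b; }
};
template<typename Adder>
std::uint64_t sumTemplate(const Adder& adder, std::uint64_t a, std::uint64_t b)
{
    return adder.add(a, b);   // direct call, normally reduced to a single add instruction
}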

So concretely, my suggestion is the following:

make sense?

An example of this pattern can be found here: https://github.com/Visa-Research/volepsi/blob/main/volePSI/PxUtil.h#L550 In this case I called it Helper instead of TypeTraits and it has some other abstractions, but it's the same basic idea. It was used to allow the caller to call the OKVS for VOLE-PSI with basically any type as long as they provided the Helper. In particular, they could even call it when the "element", in our case F, was some variable-length vector.

lzjluzijie commented 11 months ago

Update: I fixed some bugs in PPRF, and the u128 silent VOLE is working if I comment out the final coding part. The coding part is still not working, and I will continue working on it tomorrow. The code seems a little complicated, ~but I think it might work by changing some ^ to +~. No need to review code at this time, but if you have time you could take a look at ExConvCode_encode_u128_test to see whether my test makes sense. Thank you.

I realized that in the ExConvCode_encode_basic_test test there is code.dualEncode2<block, u8>(cc1, cc2);. Could you explain why there is a dualEncode2 (for performance)? It looks like that code is already well templated, but my current test does not work. Does it make sense to change ^ to +?

Btw, ExConvCode_encode_basic_test got a segmentation fault on my end when the build mode is DEBUG, but was OK in RELEASE mode. I will look into it later.

ladnir commented 11 months ago

That's right, you basically just need to use plus. Encode2 is for performance; it just encodes twice. You can just call the single encode twice if you want.

lzjluzijie commented 11 months ago

I got the silent VOLE test to pass with F=G=u128. The ExConvCode is indeed well templated and I only had to change a few ^ to +. I will test it with other fields later.

As for making this more generic, I wonder why the current block does not support operator*, and why operator+ seems to add the left and right halves independently as u64. Shall we change it to behave like a GF(2^128) element, or define some class F128 : block and override the operators?

ladnir commented 11 months ago

block is defined that way because its primary purpose is to be an abstraction over the __m128i built-in type. This type models special CPU instructions (aka intrinsics). It does not have 128-bit addition or multiplication, so block doesn't either. There is 2x 64-bit addition, as you mention. Maybe I shouldn't have overloaded operator+ since it's a bit confusing. But I did, so oh well ha.

ladnir commented 11 months ago

As suggested above, we shouldn't even use operator+ or operator^. Instead there should be a type traits object that handles all the operations for you; you simply pass in the arguments. But this can happen at the end, once the basic version is working.

lzjluzijie commented 11 months ago

So we should do something like

struct TypeTrait
{
    using F = u128;
    using G = u8;
    ...
    static inline F Add(const F& a, const F& b)
    ...
};

and I was suggesting


struct F128: block {
  inline F128 operator+(const F128& rhs) const {
    ...
  }
};

struct TypeTrait
{
    using F = F128;
    using G = F128;
    ...
};

(not using virtual). Is there a performance difference?

It seems there is not much difference, but I agree it's safer to use a TypeTrait with explicit Add functions.

ladnir commented 11 months ago

Yes, it will be about 10x faster. Maybe in some cases the compiler might be smart enough to remove this overhead, but it's best not to rely on the compiler that way.

ladnir commented 11 months ago

In simple examples the compiler does this thing called devirtualization, which basically can inline virtual functions. But there's no guarantee that it will happen. However, if you use inline functions, you are basically guaranteed there's no cost; the function will disappear.

lzjluzijie commented 11 months ago

I see, I will change the code. And for ExConv, should I also modify it to use inline member functions, not operator+ or ^?

lzjluzijie commented 11 months ago

One more question please: in my second example, F128 inherits from block, but the operator function is also inline without virtual. Is this kind of inline different from the member-function inline in the first example? If so, is it because F128 inherits from block?

ladnir commented 11 months ago

OK, I took a closer look at your example. Inheritance != virtual. You can inherit all you want; the functions can still be inline and they would behave the same as if you wrote new versions of them. What has a performance overhead is when you use the virtual keyword on a function and then override it in some derived class. Specifically, consider the example

struct base {
    virtual void foo() = 0;
};
struct derived0 : base {
    void foo() override { std::cout << "derived0" << std::endl; }
};
struct derived1 : base {
    void foo() override { std::cout << "derived1" << std::endl; }
};

When someone has a base* v; and they call v->foo();, what happens is that at the beginning of derived0 and derived1 there is a hidden pointer that points to something called a v-table. This v-table looks like:

struct vTableForDerived0{
      void(* baseFooFunctionPointer)(void* derived0);
};
struct vTableForDerived1{
      void(* baseFooFunctionPointer)(void* derived1);
};

and the derived classes actually look like

struct derived0 : base {
    void* pointerToVTable;
    void fooImpl() { std::cout << "derived0" << std::endl; }
};
struct derived1 : base {
    void* pointerToVTable;
    void fooImpl() { std::cout << "derived1" << std::endl; }
};

Now the v->foo(); call gets transformed into

auto* vTable = v->pointerToVTable;
vTable->baseFooFunctionPointer(v); // this calls derived0::fooImpl() or derived1::fooImpl()

So now it might be a bit clearer why virtual is a bit of a performance hit. You have to chase the pointer to the vtable and then you have to call baseFooFunctionPointer. So say that instead of printing something we were just doing a + on a u64. Then you are doing all this, plus the function call, which itself has to do a bunch of stuff (flush variables out of registers, update the program counter, make space for a new stack frame, execute the call instruction), and then you do a simple +.

In some cases the compiler is smart and can look at your program and say "hey, even though v is a base*, I know it's actually derived0 and therefore I can just call derived0::fooImpl. I will even inline it". However, if there are many possible derived types and the compiler can't see which type is being used, then it can't do this devirtualization.

In your example above you don't use virtual and therefore none of this stuff happens. They are just normal function calls. As you can see in your assembly, these get inlined (no function calls made).


Correct, you can define add like you did above and use it wherever you want to do a plus. Additionally, I would suggest not requiring that the TypeTrait object only have static functions. So these subfield protocols will take the normal F, G values as input along with an instance of TypeTraits. You use this instance to do all the additions and everything else.

Consider the case where you want to allow the modulus to be many different values. If add is static, then this modulus must be somewhere within the F type. e.g.

struct F {
  u64 val;
  u64 modulus;
};

So now all Fs have to be twice as big. However, if TypeTraits is an actual object, you can simply stick the modulus in it. Now F is only a u64 in the example above.

The alternative would be for F to have the modulus hardcoded into the "type". But this is a bit less flexible. Although maybe it's good enough haha.
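For reference, a rough sketch of the instance-based option (with made-up Z_p arithmetic, not the PR's actual types):

#include <cstdint>

// Rough sketch: the traits *instance* carries the modulus, so F stays a plain u64.
// Assumes modulus < 2^63 so the intermediate sums below do not overflow.
struct ZpTraits
{
    using F = std::uint64_t;
    using G = std::uint64_t;

    std::uint64_t modulus;   // state lives in the traits object, not inside every F

    F add(const F& a, const F& b) const { return (a % modulus + b % modulus) % modulus; }

    F mul(const F& a, const F& b) const
    {
        // naive double-and-add so we never need a 128-bit intermediate
        F r = 0, x = a % modulus, y = b;
        while (y)
        {
            if (y & 1) r = (r + x) % modulus;
            x = (x + x) % modulus;
            y >>= 1;
        }
        return r;
    }
};

// a protocol receives the instance and routes every operation through it:
template<typename Traits>
typename Traits::F axpb(const Traits& t,
    typename Traits::F a, typename Traits::F x, typename Traits::F b)
{
    return t.add(t.mul(a, x), b);
}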

lzjluzijie commented 11 months ago

Thank you very much for your explanation. I understand that virtual is slow. Now we have two approaches.

The first is to use operators on field elements (no virtual), for example:

struct F128 {
    block b;

    inline F128 operator*(const F128& rhs) const {
        block r = b.gf128Mul(rhs.b);
        return F128{ r };
    }
};

struct TypeTrait
{
    using F = F128;
    using G = F128;
    ...
   // no explicit Add needed.
};

The second approach:

struct TypeTrait
{
    using F = block;
    using G = block;
    ...
    inline F Add(const F& a, const F& b)
    ...
};

// And protocols use member functions like Add, not operators

If my understanding is correct, both approaches should have similar performance, given that there is no virtual and inlining is used. I think one advantage of the first approach is that it needs less code: if we want to use a primitive type for these protocols, we don't need to write Add functions, as we would in the second approach. However, the second approach might be safer. In the first case, users might have some unsafe pointer operation that casts an F* to a block* and ends up using the wrong operator. But I think inside libOTe we can make sure this never happens, so it should be fine?

As for the modulus, I suggest that we make F templated on the modulus, and I think this is more convenient (in the first approach).

So I personally prefer the first approach. Again, I don't know much about C++; is there any issue with it? Thank you.

ladnir commented 11 months ago

Sure, you can do the first if you want. I might change it later though. Regarding the user, you can give a default version of TypeTrait that assumes F, G are normal types with +, etc.

lzjluzijie commented 11 months ago

Makes sense. So you think the second is better? I can do that directly.

ladnir commented 11 months ago

I'd prefer the second because you are separating the data (F, G) from the operations (add, sample, etc.). You can use the same datatype (e.g. block) for both a 128-bit prime field and GF(2^128). But it's fine either way.
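As a rough sketch of that separation (the GF(2^128) operations reuse the block routines already discussed in this PR; a 128-bit prime field traits would keep F = block but swap in modular add/mul):

#include <cryptoTools/Common/Defines.h>

using osuCrypto::block;

// Sketch: the data type (block) is shared; only the operations in the traits differ.
struct Gf128Traits
{
    using F = block;
    using G = block;

    static F add(const F& a, const F& b) { return a ^ b; }        // addition in GF(2^128) is XOR
    static F sub(const F& a, const F& b) { return a ^ b; }        // same as add in characteristic 2
    static F mul(const F& a, const F& b) { return a.gf128Mul(b); } // as used earlier in this thread
};

// A Zp128Traits for a 128-bit prime field would also use F = block for storage,
// but define add/sub/mul as modular arithmetic instead of XOR / carry-less multiply.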

lzjluzijie commented 11 months ago

Makes sense. I will do more tests with larger fields and then do it.

lzjluzijie commented 11 months ago

I got some issue with genSilentBaseOts. If I configure it with SilentBaseType::Base and run BaseOT, it works normally. However, if I use SilentBaseType::BaseExtend with NoisyVOLE requiring 64 rather than 128 OTs, the assertion in libOTe/TwoChooseOne/SoftSpokenOT/SoftSpokenMalOtExt.cpp:56 fails.

I'm not familiar with SoftSpokenOT and I wonder if there is any special reason to use it rather than something simpler like IKNP?

lzjluzijie commented 11 months ago

I think we can do silent VOLE over general commutative rings, and the current PPRF and ExConv code should support it. As for noisy VOLE, the current method does not seem to work.

I have an idea: for G=u32 and F=u32^k, we can do k u32-u32 VOLEs, the j-th one using the j-th coordinate of the sender's F-typed delta and the same y array from the receiver. The complexity should be similar, since we still iterate over every bit of delta once.
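To sketch the algebra of this idea, here is a toy standalone check with wrap-around u32 arithmetic (not the actual protocol code):

// Toy check of the stacking idea: k independent u32-u32 VOLEs over the same y
// vector yield one VOLE with F = u32^k and G = u32. All arithmetic is mod 2^32.
#include <array>
#include <cstdint>
#include <cstdio>
#include <random>

constexpr std::size_t k = 4, n = 8;

struct Fk { std::array<std::uint32_t, k> v; };                 // F = u32^k

Fk addF(Fk a, const Fk& b) { for (std::size_t j = 0; j < k; ++j) a.v[j] += b.v[j]; return a; }
Fk mulG(Fk a, std::uint32_t y) { for (std::size_t j = 0; j < k; ++j) a.v[j] *= y; return a; }

int main()
{
    std::mt19937 prng(0);
    auto rnd = [&] { return static_cast<std::uint32_t>(prng()); };

    Fk delta;                                                  // sender's F element
    for (auto& d : delta.v) d = rnd();
    std::array<std::uint32_t, n> y;                            // receiver's G^n vector
    for (auto& yi : y) yi = rnd();

    // run the k component VOLEs, each with delta.v[j] and the *same* y:
    std::uint32_t c[k][n], b[k][n];
    for (std::size_t j = 0; j < k; ++j)
        for (std::size_t i = 0; i < n; ++i)
        {
            c[j][i] = rnd();
            b[j][i] = c[j][i] + delta.v[j] * y[i];
        }

    // stacking coordinate j of every instance gives B_i = C_i + delta * y_i over F
    bool ok = true;
    for (std::size_t i = 0; i < n; ++i)
    {
        Fk Bi, Ci;
        for (std::size_t j = 0; j < k; ++j) { Bi.v[j] = b[j][i]; Ci.v[j] = c[j][i]; }
        Fk expect = addF(Ci, mulG(delta, y[i]));
        for (std::size_t j = 0; j < k; ++j) ok &= (Bi.v[j] == expect.v[j]);
    }
    std::printf("stacked VOLE relation holds: %s\n", ok ? "yes" : "no");
}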

ladnir commented 11 months ago

For semi-honest, IKNP would be fine. For simplicity I use SoftSpoken. It sends a bit less data as well...

Where is it failing exactly? I suspect that if SoftSpoken is failing then IKNP would too; they have the same API. Hard to say though.

For silent VOLE that sounds right. For noisy VOLE, you're saying G=gf(32) and F=gf(k*32) doesn't work? The issue isn't clear to me.

lzjluzijie commented 11 months ago

SoftSpokenMalOtExt.cpp:56 would fail. I am saying G is uint32 and F is an array of uint32, not fields.

ladnir commented 11 months ago

So the receiver has a in G^n, the sender has X in F.

The receiver computes v_i = a * F(2^i), where F(2^i) denotes 2^i embedded into F. The user can define * however they want, more or less. The receiver acts as the OT sender:

m_{i,0} = R_i
m_{i,1} = R_i + v_i

where R_i in F^n is random. The sender learns

m_{i,X_i} = R_i + X_i * F(2^i) * a

When you sum the received messages you get

B = sum_i m_{i,X_i}
  = sum_i (R_i + X_i * F(2^i) * a)
  = sum_i R_i + (sum_i X_i * F(2^i)) * a
  = C + X * a

where C = sum_i R_i.

You want to define G as the Z_{2^32} group and F as the vector space over it. This is fine. It should work.
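As a sanity check of that algebra, here is a toy simulation with F = G = u64 and arithmetic mod 2^64 (the OTs are simulated locally; this is not the library's noisy VOLE code):

#include <array>
#include <cstdint>
#include <cstdio>
#include <random>

int main()
{
    constexpr std::size_t n = 4, bits = 64;
    std::mt19937_64 prng(0);

    std::array<std::uint64_t, n> a;                 // receiver's vector a in G^n
    for (auto& ai : a) ai = prng();
    std::uint64_t X = prng();                       // sender's scalar X in F

    std::array<std::uint64_t, n> B{}, C{};          // sender's / receiver's outputs
    for (std::size_t i = 0; i < bits; ++i)
    {
        std::uint64_t Xi = (X >> i) & 1;            // i-th bit of X, the sender's OT choice
        for (std::size_t j = 0; j < n; ++j)
        {
            std::uint64_t Rij = prng();             // receiver's random OT message m_{i,0}
            std::uint64_t m0 = Rij;
            std::uint64_t m1 = Rij + (std::uint64_t(1) << i) * a[j];   // m_{i,1} = R_i + 2^i * a
            B[j] += Xi ? m1 : m0;                   // sender sums the messages it chose
            C[j] += Rij;                            // receiver sums its R_i
        }
    }

    bool ok = true;
    for (std::size_t j = 0; j < n; ++j)
        ok &= (B[j] == C[j] + X * a[j]);            // B = C + X * a
    std::printf("B == C + X*a: %s\n", ok ? "yes" : "no");
}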

ladnir commented 11 months ago

For SoftSpoken, I'm not sure. Seems not right. What command/test produces the error? I can run it.

lzjluzijie commented 10 months ago

Sorry, I was very busy; I'm back to this now.

I will do more testing with larger "fields" and clean up code later.

ladnir commented 10 months ago

Yeah, change the code however it makes sense. The instantiation thing was to improve compile times, but I don't think it makes sense anymore. I can look into SoftSpoken.

lzjluzijie commented 10 months ago

There are already lots of changes in this PR. I think I should put that noisy VOLE optimization in a separate PR.

Also I noticed that many functions in AES.h, like finalEnc and roundEnc, are not force-inlined, and are indeed not inlined when I test (at least in DEBUG mode). I can write another PR for this, and fix the header issues in cryptoTools.

ladnir commented 10 months ago

Do not use debug mode to make decisions about what is being inlined. You'd need to use release. In debug builds the compiler often prefers not to inline because inlining can make debugging harder.

Sure, multiple PRs is good if that's what you prefer. Either way is fine with me.

lzjluzijie commented 10 months ago

Makes sense, but is there any reason to use the normal inline, not OC_FORCEINLINE in AES.h?

I added an F128 type, which is essentially a block but with the proper +, -, and * in GF(2^128). I added a test for it in ExConv. But there are lots of places that need to change from block to F128. What should I do?

ladnir commented 10 months ago

So Lance Roy is the one who added force inline. Not totally sure if there was a real need for it. inline tells the compiler "please inline", but the compiler can still decide not to, while force inline tells the compiler to always do it. Typically, if it's a good idea to inline then the compiler will do the right thing with just inline. Sometimes not, though.

If you are going to use force inline you should have a benchmark that shows it actually helps. In some cases it might hurt (the code becomes bigger).
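A rough harness along those lines (GCC/Clang attributes; only meaningful in a release build) might look like:

#include <chrono>
#include <cstdint>
#include <cstdio>

// Forced to inline (GCC/Clang attribute) vs. forced NOT to inline, so the call
// overhead itself can be measured. Build in release mode for meaningful numbers.
__attribute__((always_inline)) inline std::uint64_t addInline(std::uint64_t a, std::uint64_t b)
{
    return a + b;
}
__attribute__((noinline)) std::uint64_t addNoInline(std::uint64_t a, std::uint64_t b)
{
    return a + b;
}

template<typename Fn>
double timeIt(Fn&& fn)
{
    auto begin = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < 100'000'000; ++i)
        acc = fn(acc, i);                         // chain the result so the loop can't be removed
    auto end = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = acc;            // keep acc observable
    (void)sink;
    return std::chrono::duration<double, std::milli>(end - begin).count();
}

int main()
{
    std::printf("forced inline: %8.1f ms\n", timeIt([](std::uint64_t a, std::uint64_t b) { return addInline(a, b); }));
    std::printf("never inlined: %8.1f ms\n", timeIt([](std::uint64_t a, std::uint64_t b) { return addNoInline(a, b); }));
}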

For ExConv, it's not clear to me what the issue is. In general you should replace block with F. But if I recall correctly it's already templated, so I'm confused about the question.

lzjluzijie commented 10 months ago

OK, that makes sense.

I am saying that there are other places, such as the VOLE tests, that need block replaced with F. Maybe do that in another PR as well?

ladnir commented 10 months ago

Leave the existing tests. Make new ones and adapt them lightly. So for the block ones, + and - should both be mapped to XOR.

lzjluzijie commented 10 months ago

I added a new namespace osuCrypto::Subfield and put all the new copies in that namespace so the changes won't affect existing tests. The code is working now and you can take a look. Malicious security is not implemented yet, and the optimization for noisy VOLE I mentioned would be another PR after this.

ladnir commented 10 months ago

~I think you didn't commit your tests~

C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_ExConvCode_encode_test(class osuCrypto::CLP const &)" (?Subfield_ExConvCode_encode_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_Tools_Pprf_test(class osuCrypto::CLP const &)" (?Subfield_Tools_Pprf_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_Noisy_Vole_test(class osuCrypto::CLP const &)" (?Subfield_Noisy_Vole_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\libOTe_Tests.lib(UnitTests.cpp.obj) : error LNK2001: unresolved external symbol "void __cdecl osuCrypto::Subfield::Subfield_Silent_Vole_test(class osuCrypto::CLP const &)" (?Subfield_Silent_Vole_test@Subfield@osuCrypto@@YAXAEBVCLP@2@@Z)
C:\Users\peter\repo\libOTe\out\build\x64-Debug\libOTe\frontend\frontend_libOTe.exe : fatal error LNK1120: 4 unresolved externals
  ninja: build stopped: subcommand failed.

Nevermind.

ladnir commented 10 months ago

Pushed some changes to my subfield branch. Below are my comments.

lzjluzijie commented 10 months ago

I see, I will change to use functions from TypeTrait instead of using F, G directly. What is your configuration for indentation?

ladnir commented 10 months ago

4 spaces. No single-line if statements (can't place breakpoints).

lzjluzijie commented 10 months ago

I made changes to the TypeTraits and it works fine except for the encoder. I'm not sure if you want the encoder to take the full TypeTrait of the VOLE (which contains both types) as a template parameter, since we may want to use the encoder independently.

Also, can you take a look at the SoftSpoken issue I mentioned above? It still errors in DEBUG mode (but is fine in RELEASE mode).

ladnir commented 10 months ago

The encoder can take it as a template. Later we can revisit the decision if needed.

I tested soft spoken on my machine and it worked... What compiler are you using?

lzjluzijie commented 10 months ago

I changed the ExConv to use TypeTrait.

I am using gcc (Debian 12.2.0-14) 12.2.0. Can you try -u 131 in DEBUG mode on my new commit?

ladnir commented 9 months ago

OK, the issue was just with the assert. The code is fine.

ladnir commented 9 months ago

Do all the unit tests pass for you? It seems there is an issue with silent VOLE. See the CI above.

lzjluzijie commented 9 months ago

Yes, it's an error with dualEncode2, fixing it now.

lzjluzijie commented 9 months ago

It's quite strange. It seems commit 5132392 changed the result of accumulate for blocks. I tried to fix it for an hour but was not successful, will try again later.

lzjluzijie commented 9 months ago

OK, I figured it out. There is some issue with the block-optimized accumulate. I commented out all the std::is_same<block, T>::value code and it's working now. I'm not sure what the optimization is doing, and b[0].get<i32> seems strange to me. I guess we can further improve this part of the code?