snwagh / falcon-public

Implementation of protocols in Falcon

ReLU function in Falcon is much slower than in SecureNN #18

Open lijin456 opened 2 years ago

lijin456 commented 2 years ago

This doesn't match the Falcon paper, which states that Falcon improves efficiency by about 2x. Do you have any advice? Thanks very much.

lijin456 commented 2 years ago

Should I give more details? I'm using the code in your repositories without changing any parameters. Could you help?

snwagh commented 2 years ago

It is indeed puzzling; it shouldn't be the case. Are you running them on the same machine or set of machines? Are the parallelism parameters, such as this one, the same across the machines? If you use too many cores on a machine with a small number of cores, it can cause a slowdown instead, and that might be one reason. Another thing to check is the control flow, to ensure that you are running exactly what you think you are.

Also, the 2x efficiency is theoretical. I don't remember what the concrete numbers were, but I would expect them to be 2x or more.

lijin456 commented 2 years ago

Thanks for your advice. You're right, the no_cores values in Falcon and SecureNN are different. I ran the code (SecureNN and Falcon) on three 8 vCPU, 16 GiB Alibaba Cloud Elastic Compute Service instances. The time decreases when I decrease the no_cores value.

But I'm still confused about the relationship between the no_cores value in the code and the number of CPU cores on the machine. In Falcon, when I change no_cores from 8 to 4, the wall-clock time drops from 0.12s to 0.068s. When I change no_cores in SecureNN from 8 to 4, the wall-clock time drops from 0.077s to 0.058s. The ReLU time in Falcon is still greater. How do I find the right no_cores value? Thanks again.

snwagh commented 2 years ago

no_cores manually parallelizes the bottleneck part of the code. I haven't found an automated way to find the appropriate number of cores, but a little less than half of the machine's thread count is usually a good reference.

For instance, if you have an 8-core machine with 16 threads, using fewer than 8 is a good idea, so something like 6 works well in my experience. Note that this is for runs over LAN/WAN. If you're running over localhost, where all 3 executables (parties) run on the same machine, those 6 cores are split among all the parties, so ideally I would set no_cores to 2.
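The rule of thumb above can be turned into a starting value. This is a minimal sketch, not code from either repository: `suggestNoCores` and `suggestNoCoresHere` are hypothetical helpers, and the 3/8 fraction is an assumption chosen only so that 16 threads give 6 (LAN/WAN) and 2 (three parties on localhost), matching the numbers in the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper (not part of Falcon or SecureNN): turns the rule of
// thumb into a starting value for no_cores. Always confirm by measuring.
unsigned suggestNoCores(unsigned hardwareThreads, unsigned partiesOnThisHost) {
    assert(partiesOnThisHost >= 1);
    // "A little less than half" of the hardware threads, taken as 3/8: 16 -> 6.
    unsigned budget = std::max(1u, hardwareThreads * 3 / 8);
    // On localhost all parties share one machine, so split the budget: 6 / 3 -> 2.
    return std::max(1u, budget / partiesOnThisHost);
}

// Convenience overload that queries the current machine.
// std::thread::hardware_concurrency() returns 0 when the count is unknown,
// in which case the helper above falls back to 1.
unsigned suggestNoCoresHere(unsigned partiesOnThisHost) {
    return suggestNoCores(std::thread::hardware_concurrency(), partiesOnThisHost);
}
```

For the 8 vCPU instances used in this thread, the sketch suggests 3, while 4 was measured to be faster than 8, so treat the output only as the centre of a small sweep (for example 2 to 5) run on the actual machines.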

lijin456 commented 2 years ago
Sorry to bother you again. I'm confused about the parameters in Falcon because Falcon seems to perform worse than SecureNN. I benchmarked some basic functions on 8 vCPU, 16 GiB Alibaba Cloud Elastic Compute Service instances. The only parameter I changed was no_cores=4. These are the results (each function is called 100 times):

| Function | Falcon wall clock | Falcon CPU time | SecureNN wall clock | SecureNN CPU time |
|---|---|---|---|---|
| relu | 0.0672796 sec | 0.089296 sec | 0.0579777 sec | 0.006221 sec |
| drelu | 0.0585398 sec | 0.075817 sec | 0.0551445 sec | 0.0061 sec |
| select share | 0.0110452 sec | 0.013957 sec | 0.00659282 sec | 0.006437 sec |
| debugDotProd | 0.00552658 sec | 0.006838 sec | 0.00623013 sec | 0.006237 sec |
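To make the gap concrete, the sketch below divides each Falcon wall-clock time in the table by the matching SecureNN time. If Falcon were about 2x faster, these ratios would sit near 0.5; here relu comes out around 1.16, drelu around 1.06, select share around 1.68, and only debugDotProd is below 1 (about 0.89). The struct and function names are illustrative only; the numbers are copied from the table.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One row of the table above (wall-clock seconds for 100 calls, no_cores = 4).
struct Row { std::string name; double falconWall, secureNNWall; };

// Ratio > 1 means Falcon was slower than SecureNN in this run.
double falconOverSecureNN(const Row& r) {
    assert(r.secureNNWall > 0.0);
    return r.falconWall / r.secureNNWall;
}

std::vector<Row> reportedRows() {
    return {
        {"relu",         0.0672796,  0.0579777},
        {"drelu",        0.0585398,  0.0551445},
        {"select share", 0.0110452,  0.00659282},
        {"debugDotProd", 0.00552658, 0.00623013},
    };
}

// Prints one "name ratio" line per function.
void printRatios() {
    for (const Row& r : reportedRows())
        std::printf("%-13s %.2fx\n", r.name.c_str(), falconOverSecureNN(r));
}
```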

Here is how I call the debug functions in main.cpp (SecureNN):

    if (!STANDALONE)
        initializeMPC();
    start_m();
    debugDotProd();
    end_m(whichNetwork);

Here is how I call the debug functions in main.cpp (Falcon):

    start_m();
    runTest("Debug", "DotProd", network);
    end_m(network);

For example, inside the debug functions I call each function 100 times:

    for (int i = 0; i < 100; i++) funcSelectShares(a, b, selection, size);
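One way to make the two codebases directly comparable is to wrap the same 100-iteration loop in the same timer on both sides. The sketch below is a hypothetical harness, not the repositories' start_m()/end_m(); it records wall-clock time with std::chrono::steady_clock and process CPU time with std::clock(). On Linux with glibc, std::clock() counts CPU time across all threads of the process, so with no_cores > 1 the CPU time can exceed the wall-clock time, while time spent blocked waiting on the network counts toward wall-clock time only.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

struct Timing { double wallSec; double cpuSec; };

// Hypothetical harness: runs `fn` `iterations` times and reports both
// wall-clock time and process CPU time, in seconds.
Timing timeRepeated(const std::function<void()>& fn, int iterations) {
    assert(iterations > 0);
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();
    for (int i = 0; i < iterations; ++i) fn();
    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();
    return {
        std::chrono::duration<double>(wallEnd - wallStart).count(),
        static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC,
    };
}
```

Inside a debug function, usage would look like `Timing t = timeRepeated([&] { funcSelectShares(a, b, selection, size); }, 100);`, with the same call pattern in both repositories; dividing by 100 gives a per-call figure.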

Are there any other parameters I should be careful about? Thanks very much!

snwagh commented 2 years ago

No worries, feel free to create issues if you are unable to resolve it. It is a little hard to know what exactly might be causing the issue without taking a look at the code but here are a few thoughts:

I would look into what code is being run (tracing the control flow). For instance, it seems that SecureNN does not really have a debug ReLU function whereas Falcon does have one (in the original repo). So you want to double check what you are comparing when you run ReLU on both codebases.

Secondly, if you're running the code as is from the repo, then the debug function is vectorized over size 8 for SecureNN but only size 5 for Falcon. Finally, any reconstruct function calls would also affect the performance.
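Why the vector size matters: a protocol call on a vector of n elements takes roughly the same number of communication rounds as a call on one element, so the fixed per-round latency is shared across the batch. The sketch below uses an illustrative cost model, time(n) = rounds × latency + n × perElement, which is an assumption rather than something measured from either repository. Under this model the total time grows with n while the per-element time shrinks, so a size-8 run and a size-5 run are not directly comparable.

```cpp
#include <cassert>

// Illustrative cost model (an assumption, not measured from either repo):
// each call pays a fixed latency per communication round, plus a
// per-element cost for local computation and data on the wire.
struct CostModel {
    int rounds;            // communication rounds per protocol call
    double roundLatency;   // seconds per round (e.g. network round-trip time)
    double perElement;     // seconds per vector element
};

// Total time for one call on a vector of `size` elements.
double callTime(const CostModel& m, int size) {
    assert(size > 0);
    return m.rounds * m.roundLatency + size * m.perElement;
}

// Per-element time: the fixed round cost is spread over `size` elements.
double timePerElement(const CostModel& m, int size) {
    return callTime(m, size) / size;
}
```

Fixing the same size in both codebases, or reporting per-element times, removes this bias from the comparison.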

lijin456 commented 2 years ago

Thanks for your advice. I have already implemented a ReLU debug function in SecureNN and changed the size to 10 in the debug functions of both repositories. I've pushed the code to GitHub and hope you can take a look if you have time. Thank you very much. Here is the code I changed in SecureNN and the code I changed in Falcon.

snwagh commented 2 years ago

I ran your code above on localhost and could reproduce a similar result: the SecureNN timings are indeed a bit lower. The reason is unclear to me. :( I tried tweaking a few things here and there, but the numbers don't agree with the theory. I can't think of a reason why this is happening.

imtiyazuddin commented 2 years ago

The ReLU function seems to work only for integer values. When I give it floating-point values, all I get are zeros. I also tried converting the floating-point values to myType and running it, but the output contains large values (I think they are my plaintext values multiplied by 2^scalefactor mod 2^32). Please correct me if I am wrong.

snwagh commented 2 years ago

The functions are meant to work on fixed-point values (thus integers only). It is hard to say what is causing the issue, but you might be printing the plain integer values, which would explain why they're scaled. Try using some of the print functions provided to get human-readable output.
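The large outputs described above are what raw fixed-point ring elements look like when printed directly: x is stored as round(x · 2^f) mod 2^32. The sketch below shows encoding, decoding, and a plaintext ReLU under two assumptions to check against the repository: a 32-bit unsigned ring element (like a uint32_t myType) and f = 13 fractional bits. The names encodeFixed, decodeFixed, and reluFixed are hypothetical. It also hints at the zeros: casting a value such as 0.7 straight to an integer type without scaling truncates it to 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumptions for illustration: 32-bit ring elements and 13 fractional bits.
using RingElem = std::uint32_t;
constexpr int kFracBits = 13;
constexpr double kScale = static_cast<double>(1u << kFracBits);

// Encode a real number as a two's-complement fixed-point value mod 2^32.
RingElem encodeFixed(double x) {
    const auto scaled = static_cast<std::int64_t>(std::llround(x * kScale));
    return static_cast<RingElem>(static_cast<std::uint64_t>(scaled));  // wraps mod 2^32
}

// Decode: reinterpret as signed (two's complement; guaranteed since C++20),
// then divide out the scale factor.
double decodeFixed(RingElem v) {
    return static_cast<std::int32_t>(v) / kScale;
}

// Plaintext ReLU on an encoded value: the sign bit decides.
RingElem reluFixed(RingElem v) {
    return static_cast<std::int32_t>(v) < 0 ? 0 : v;
}
```

For example, 1.0 encodes to 8192 (that is, 2^13), so printing the encoded value rather than the decoded one shows every result multiplied by 8192, and negative values appear as numbers close to 2^32.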

ZJG0 commented 2 years ago

> I ran your code above on localhost and could reproduce a similar result: the SecureNN timings are indeed a bit lower. The reason is unclear to me. :( I tried tweaking a few things here and there, but the numbers don't agree with the theory. I can't think of a reason why this is happening.

This result is inconsistent with the conclusion in the paper. How can it be explained, or how can it be fixed?

snwagh commented 2 years ago

@ZJG0 Are you also having the same issue?

I am not sure how to debug this issue. Maybe there's another library out there that implements both protocols, which could serve as another comparison point.