snwagh / falcon-public

Implementation of protocols in Falcon

Configuring the network of a Linux server #33

Closed HuangPZ closed 2 years ago

HuangPZ commented 2 years ago

Hi, I was trying to reproduce the experiments on Linux servers, but I could not get the results from the paper. For the smaller networks, the computation time is almost negligible and the main overhead comes from transmission. For example, for a network like Sarda, in a WAN setting with a 70 ms ping time between servers and 40 MB/s bandwidth, I spent 2.16 s purely on data transmission during inference (measured by adding timers in sendVector and receiveVector; computation cost < 0.05 s), while the paper reports only 0.76 s in total. One thing I notice is that my measured time does not match the simple estimate of round delay + transmission time, and this is very likely because a vector gets split into several packets and costs several rounds during transmission, depending on the transport-protocol settings.
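For reference, this is roughly how I timed the transfers. A minimal sketch of the instrumentation only: `timedComm` and `commSeconds` are names I made up for illustration, and a placeholder loop stands in for the real socket call.

```cpp
#include <chrono>
#include <iostream>
#include <vector>

// Running total of wall-clock time spent in communication, i.e. the
// equivalent of the timers I added inside sendVector/receiveVector.
static double commSeconds = 0.0;

// Times an arbitrary transfer call and accumulates the elapsed time.
template <typename F>
void timedComm(F &&transfer) {
    auto start = std::chrono::steady_clock::now();
    transfer();
    auto stop = std::chrono::steady_clock::now();
    commSeconds += std::chrono::duration<double>(stop - start).count();
}

int main() {
    std::vector<int> payload(1 << 20, 42);
    long checksum = 0;

    // Placeholder standing in for a real sendVector call on the socket.
    timedComm([&] {
        for (int v : payload) checksum += v;
    });

    // Naive WAN estimate the measured total should match:
    //   expected_seconds ~= rounds * RTT + bytes_sent / bandwidth
    // e.g. RTT = 0.070 s and bandwidth = 40e6 bytes/s in my setup.
    std::cout << "communication: " << commSeconds << " s"
              << " (checksum " << checksum << ")\n";
    return 0;
}
```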

Did your experiments set specific parameters or rules for the Linux data transmission so that it performs better with large packets? Would you mind sharing those settings so I can better reproduce your results? Of course, if you have another theory for the difference, please let me know! I'm not using an Amazon server as in the paper; could that potentially cause this kind of large difference?
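To make the question concrete, these are the kinds of socket options I would expect to matter for large transfers; a sketch with illustrative values, not settings taken from your code:

```cpp
#include <cstdio>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Options that usually matter for bulk traffic on a high-latency link:
// disable Nagle's algorithm so small messages are sent immediately, and
// enlarge the kernel buffers so a full bandwidth-delay product can stay
// in flight (40 MB/s * 70 ms ~= 2.8 MB).
void tuneSocket(int fd) {
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

    int bufBytes = 4 * 1024 * 1024;  // 4 MB send/receive buffers
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufBytes, sizeof(bufBytes));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufBytes, sizeof(bufBytes));
}

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    tuneSocket(fd);
    close(fd);
    return 0;
}
```

Is this, or something like it, what your experiments used?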

Thanks!

snwagh commented 2 years ago

It could be the result of amortization w.r.t. the batch size. Can you try the batch size given in the paper and then check the per-image inference time?

If that doesn't agree, could you paste the output of your runs? And can you confirm you used exactly the same set-up as in the paper? Network being the dominant cost is expected behaviour in the WAN setting.

HuangPZ commented 2 years ago

As I see it in the paper, the batch size is 128. So you mean run a batch of 128 and then divide the inference time by 128, right?
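(If I have the arithmetic right, matching the paper's 0.76 s per image would then mean the full 128-image batch finishes in roughly 0.76 × 128 ≈ 97 s end to end.)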

I do think the set-up is the same. I'll check back with you after confirming.

snwagh commented 2 years ago

Yes.