Some documents may need to be updated

DylanWangWQF commented 2 years ago

Hi @jlwatson, I tried to run Piranha locally on 3 GPUs, building it in the Docker container.

In the Makefile, CUDA_VERSION=11.5, while the Dockerfile installs CUDA 11.6. I think these should be kept in sync in Piranha (for now, I just changed the version in the Makefile to avoid a make error).
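A minimal sketch of a sanity check that could catch this kind of mismatch before `make` runs. It assumes the Makefile assigns CUDA_VERSION on a line of its own (as quoted above) and compares that with the release reported by `nvcc --version` inside the container. The function names are hypothetical, not part of Piranha.

```python
import re
from typing import Optional

# Assumption: the Makefile sets the version on its own line, e.g. "CUDA_VERSION=11.5";
# ":=" and "?=" assignments are accepted too.
MAKEFILE_RE = re.compile(r"^\s*CUDA_VERSION\s*[:?]?=\s*(\d+\.\d+)", re.MULTILINE)

# `nvcc --version` reports a line like "Cuda compilation tools, release 11.6, V11.6.124".
NVCC_RE = re.compile(r"release (\d+\.\d+)")


def makefile_cuda_version(makefile_text: str) -> Optional[str]:
    """Return the major.minor CUDA version assigned in the Makefile, or None."""
    match = MAKEFILE_RE.search(makefile_text)
    return match.group(1) if match else None


def nvcc_cuda_version(nvcc_output: str) -> Optional[str]:
    """Return the major.minor version reported by `nvcc --version`, or None."""
    match = NVCC_RE.search(nvcc_output)
    return match.group(1) if match else None


def versions_match(makefile_text: str, nvcc_output: str) -> bool:
    """True only if both versions were found and they agree."""
    wanted = makefile_cuda_version(makefile_text)
    return wanted is not None and wanted == nvcc_cuda_version(nvcc_output)
```

Feed it the Makefile contents and the stdout of `nvcc --version`. As a stopgap without editing the file, GNU Make also lets a command-line assignment such as `make CUDA_VERSION=11.6` override a plain assignment in the Makefile (unless the Makefile uses the `override` directive).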

After solving several configuration issues, I can now run Piranha locally. I think some commands in the Dockerfile should be updated (e.g., downloading the dataset for training). Here is the run log:

root@a5cb7eceb1f6:/piranha# ./localhost_runner.sh 
run unit tests? false
config network: "files/models/secureml-norelu.json"
network filename: files/models/secureml-norelu.json
----------------------------------------------
(1) FC Layer          784 x 128
          512        (Batch Size)
----------------------------------------------
(2) ReLU Layer        512 x 128
----------------------------------------------
(3) FC Layer          128 x 128
          512        (Batch Size)
----------------------------------------------
(4) ReLU Layer        512 x 128
----------------------------------------------
(5) FC Layer          128 x 10
          512        (Batch Size)
Error opening training data file at files/MNIST/train_data
Error opening training label file at files/MNIST/train_labels
Error opening test data file at files/MNIST/test_data
Error opening test label file at files/MNIST/test_label
TRAINING, EPOCHS = 10 ITERATIONS = 117
epoch,0
total time (s),63.125727
total tx comm (MB),2339.085938
total rx comm (MB),2339.085938
train accuracy,0.000000
epoch,1
total time (s),126.424196
total tx comm (MB),4678.171875
total rx comm (MB),4678.171875
train accuracy,0.000000

Here is the nvidia-smi output from my machine, in case anyone wants to check:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   38C    P0    53W / 300W |    371MiB / 32768MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0    52W / 300W |    347MiB / 32768MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   39C    P0    54W / 300W |    339MiB / 32768MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
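
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name            GPU Memory Usage |
|=============================================================================|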
|    1   N/A  N/A      9249      C   ./piranha                         335MiB |
|    2   N/A  N/A      9250      C   ./piranha                         335MiB |
|    3   N/A  N/A      9252      C   ./piranha                         335MiB |
+-----------------------------------------------------------------------------+

DylanWangWQF commented 2 years ago

BTW, when running Piranha locally, we may need extra functionality to simulate LAN and WAN environments. I want to build my experiments on top of Piranha, so I need to collect results under both LAN and WAN settings.
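Collecting results across LAN and WAN runs is easier if the per-epoch stats are machine-readable. A minimal sketch, assuming the output format in the run log above (an "epoch,N" line followed by "key,value" lines) and assuming the totals are cumulative, since epoch 1's figures are roughly double epoch 0's. The helper names are hypothetical.

```python
from typing import Dict, List


def parse_epoch_stats(log_text: str) -> List[Dict[str, float]]:
    """Group the "key,value" lines that follow each "epoch,N" line."""
    epochs: List[Dict[str, float]] = []
    for line in log_text.splitlines():
        # Split on the last comma so keys like "total time (s)" stay intact.
        key, sep, value = line.strip().rpartition(",")
        if not sep:
            continue  # no comma: banner, layer table, or error line
        try:
            number = float(value)
        except ValueError:
            continue  # e.g. "TRAINING, EPOCHS = 10 ITERATIONS = 117"
        if key == "epoch":
            epochs.append({"epoch": number})
        elif epochs:
            epochs[-1][key] = number
    return epochs


def per_epoch_deltas(epochs: List[Dict[str, float]], key: str) -> List[float]:
    """Turn running totals (assumed cumulative) into per-epoch increments."""
    deltas: List[float] = []
    previous = 0.0
    for stats in epochs:
        deltas.append(stats[key] - previous)
        previous = stats[key]
    return deltas
```

For the two epochs in the log above, `per_epoch_deltas(stats, "total time (s)")` gives about 63.1 s and 63.3 s per epoch, so the same parser can be run over LAN and WAN logs and the increments compared side by side.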

jlwatson commented 2 years ago

Hey @DylanWangWQF, I'm just back from out of town, so apologies for the delay. I used Docker for some of my evaluation a while back but haven't updated the Dockerfile for the current setup. Looking at it now: since the Dockerfile copies your local files/ and scripts/ directories into the resulting image, you need to download the right dataset (e.g., MNIST) locally before building.

That missing dataset might be why your local run above isn't actually training (note the "Error opening ..." lines and the 0.000000 train accuracy).
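Since the Dockerfile copies files/ into the image, a quick pre-flight check before `docker build` can catch a missing dataset early. This sketch assumes the MNIST file names printed in the "Error opening ..." lines of the run log above; the function name is hypothetical, and other datasets would need their own file lists.

```python
from pathlib import Path
from typing import List

# File names taken from the "Error opening ..." lines in the run log.
MNIST_FILES = ["train_data", "train_labels", "test_data", "test_label"]


def missing_dataset_files(repo_root: str, dataset: str = "MNIST") -> List[str]:
    """Return the expected dataset files that are absent under files/<dataset>/."""
    base = Path(repo_root) / "files" / dataset
    return [str(base / name) for name in MNIST_FILES if not (base / name).is_file()]
```

Running it from the repository root and refusing to build when the returned list is non-empty avoids producing an image that starts up but trains on nothing.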

jlwatson commented 2 years ago

As for LAN/WAN latency simulation, I'm not sure whether it's better to integrate it into Piranha or to add it as a separate software layer elsewhere. In any case, I think it'll be hard to match the performance of a multi-machine deployment when running all parties locally, because they all contend for the PCIe bus to talk to their GPUs.
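As a rough first step before building a real network emulator, a back-of-the-envelope model can show whether bandwidth or latency dominates in a given setting. This is a sketch, not Piranha's method: it assumes network time is bytes·8/bandwidth plus one round-trip time per communication round, ignores computation and the PCIe contention mentioned above, and the LAN/WAN numbers are hypothetical placeholders.

```python
def estimated_comm_seconds(
    bytes_sent: float,
    rounds: int,
    bandwidth_bits_per_s: float,
    rtt_s: float,
) -> float:
    """Crude network-time model: serialization time plus one RTT per round.

    Ignores computation, PCIe contention, and any overlap between rounds.
    """
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    transfer = bytes_sent * 8 / bandwidth_bits_per_s
    latency = rounds * rtt_s
    return transfer + latency


# Hypothetical settings for illustration only; substitute your own testbed values.
LAN = {"bandwidth_bits_per_s": 10e9, "rtt_s": 0.0002}
WAN = {"bandwidth_bits_per_s": 100e6, "rtt_s": 0.04}
```

Taking 2339.085938 MB per epoch from the log above, and assuming MB means 10^6 bytes and that this is one party's traffic, the bandwidth term alone is about 187 s per epoch under the hypothetical 100 Mbit/s WAN and under 2 s on the 10 Gbit/s LAN. The rounds·RTT term needs the number of communication rounds per epoch, which the log does not report. For actual emulation on a single machine, Linux `tc` with the `netem` queueing discipline can add delay and rate limits to an interface.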

DylanWangWQF commented 2 years ago

> Hey @DylanWangWQF, coming back from out of town so apologies for the delay. I used Docker during some of my evaluation a while back, but haven't updated the Dockerfile for the current setup. Taking a look, since the Dockerfile copies your local files/ and scripts/ directories into the resulting image, you need to download the right dataset (e.g. MNIST) locally first.
>
> That missing dataset might be why your local run above is not actually training.

Yep! I have already tested it successfully.

DylanWangWQF commented 2 years ago

> As for LAN/WAN latency simulation, I wonder if it's a good idea to integrate that into Piranha or add that as a separate software layer somewhere else. In any case, I think it'll be hard to get the same performance running all parties locally because they're all contending for the PCIe bus to talk to their GPUs.

Yes, if I run everything locally on a single machine, I may need to measure how much this affects the overall performance of my experiments. Thanks!

BTW, does Piranha implement an equality test for 3PC?

ucbrise / piranha, issue #2