VBFL is a Proof-of-Stake (PoS) blockchain-based Federated Learning framework with a validation scheme robust against distorted local model updates. This repo hosts a simulated implementation of VBFL written in Python.
Please refer to Robust Blockchained Federated Learning with Model Validation and Proof-of-Stake Inspired Consensus for detailed explanations of the mechanisms. This video is a recording of my talk on VBFL.
Python 3.7.6
PyTorch 1.4.0
(1) Clone the repo
$ git clone https://github.com/hanglearning/VBFL.git
(2) Create a new conda environment with python 3.7.6
$ conda create -n VBFL python=3.7.6
$ conda activate VBFL
(3) Head to https://pytorch.org/ for instructions to install PyTorch
$ # If using CUDA, first check its version
$ nvidia-smi
$ # Install the correct CUDA version for PyTorch.
$ # The code was tested on CUDA 10.1 and PyTorch 1.4.0
$ conda install pytorch=1.4.0 torchvision torchaudio cudatoolkit=10.1 -c pytorch
(4) Install pycryptodome 3.9.9 and matplotlib
$ conda install pycryptodome=3.9.9
$ conda install matplotlib
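Not part of the original setup steps, but as a quick sanity check you can verify that the key packages import correctly (pycryptodome imports as Crypto):
$ python -c "import torch, Crypto, matplotlib; print(torch.__version__, torch.cuda.is_available())"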
Sample running command
$ python main.py -nd 20 -max_ncomm 100 -ha 12,5,3 -aio 1 -pow 0 -ko 6 -nm 3 -vh 0.08 -cs 0 -B 10 -mn mnist_cnn -iid 0 -lr 0.01 -dtx 1
This command corresponds to VBFL_PoS_3/20_vh0.08 in the paper
VBFL arguments
(1) -nd 20: 20 devices.
(2) -max_ncomm 100: maximum 100 communication rounds.
(3) -ha 12,5,3: role assignment is hard-assigned to 12 workers, 5 validators and 3 miners in each communication round. A * in -ha means the number of devices taking the corresponding role is not limited, e.g., -ha *,5,* means at least 5 validators would be assigned in each communication round, and the rest of the devices are dynamically and randomly assigned to any role (a full example command using this syntax appears after the argument explanations below). -ha *,*,* means the role assignment in each communication round is completely dynamic and random.
(4) -aio 1: aio stands for "all in one network", meaning every device in the emulation has every other device in its peer list. This simulates VBFL running on a permissioned blockchain. With -aio 0, the emulation lets a device (the registrant) randomly register with another device (the registrar) and copy the registrar's peer list.
(5) -pow 0: the -pow argument specifies the proof-of-work difficulty. When it is 0, VBFL runs with the VBFL-PoS consensus to select the winning miner (see the stake-based sketch after this argument list).
(6) -ko 6: a device is blacklisted after it is identified as malicious for 6 consecutive communication rounds while acting as a worker.
(7) -nm 3: exactly 3 devices will be malicious nodes.
(8) -vh 0.08: validator-threshold is set to 0.08 for all communication rounds. This value may be adaptively learned by validators in a future version.
(9) -cs 0: since the emulation does not include mechanisms that tamper with the digital signatures of transactions, this argument turns off signature checking to speed up execution.
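For intuition about how the winning miner is chosen under -pow 0, here is a minimal, generic sketch of stake-based winner selection; the variable names, data structure and tie-breaking rule are assumptions made for this illustration, and the actual consensus logic lives in main.py and is described in the paper.

# Generic sketch of stake-based winning-miner selection (illustrative only).
import random

def select_winning_miner(miner_stakes):
    # miner_stakes: dict mapping device id -> accumulated stake
    highest = max(miner_stakes.values())
    # break ties randomly among miners holding the highest stake
    candidates = [m for m, s in miner_stakes.items() if s == highest]
    return random.choice(candidates)

print(select_winning_miner({"device_1": 120, "device_7": 95, "device_13": 120}))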
Federated Learning arguments (inherited from https://github.com/WHDY/FedAvg)
(10) -B 10: batch size set to 10.
(11) -mn mnist_cnn: use the mnist_cnn model. Another choice is mnist_2nn, or you may add your own network to Models.py and specify its name here.
(12) -iid 0: shard the training dataset in a non-IID way (a simplified sketch of this sharding appears after this list).
(13) -lr 0.01: learning rate set to 0.01.
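As an illustration of the non-IID sharding mentioned in (12), below is a simplified FedAvg-style sketch that sorts samples by label and hands each device a few label-skewed shards; the function and variable names are assumptions for illustration, not necessarily the exact code in this repo (see the FedAvg repo linked above for the real implementation).

# Simplified non-IID sharding sketch: sort by label, assign label-skewed shards.
import numpy as np

def non_iid_shards(labels, num_devices, shards_per_device=2):
    num_shards = num_devices * shards_per_device
    shard_size = len(labels) // num_shards
    order = np.argsort(labels)                      # group sample indices by label
    shard_ids = np.random.permutation(num_shards)   # shuffle shard ownership
    assignment = {}
    for d in range(num_devices):
        own = shard_ids[d * shards_per_device:(d + 1) * shards_per_device]
        assignment[d] = np.concatenate(
            [order[s * shard_size:(s + 1) * shard_size] for s in own])
    return assignment  # device id -> indices into the training set

shards = non_iid_shards(np.random.randint(0, 10, 60000), num_devices=20)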
Other arguments
(14) -dtx 1: see Known Issue.
Please see main.py for other argument options.
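For example, using the wildcard role assignment described in (3) to fix only the number of validators, a run could look like the following (the wildcard is quoted here only to keep the shell from glob-expanding it; all other flags match the sample command above):
$ python main.py -nd 20 -max_ncomm 100 -ha "*,5,*" -aio 1 -pow 0 -ko 6 -nm 3 -vh 0.08 -cs 0 -B 10 -mn mnist_cnn -iid 0 -lr 0.01 -dtx 1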
While running, the program saves the emulation logs inside the log/<execution_time> folder. The logs are organized by communication round. In each round's folder, you may find the accuracy evaluated by each device using the global model at the end of that communication round, each worker's local training accuracy, the validation-accuracy-difference value of each validator, and the final stake rewarded to each device in that round. Outside of the round folders, you may also find the malicious-device identification log.
The code for plotting the experimental results is provided in the plottings folder. The path of the desired log folder has to be specified for the plotting code to run. Please look at the code to determine the argument types, or look at the samples in .vscode/launch.json.
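For intuition about the logged validation-accuracy-difference, below is one plausible sketch of how a validator could compare a worker's update against its own data and apply the -vh threshold; the function signature and decision rule are assumptions for this illustration, and the actual validation scheme is described in the paper.

# Sketch of threshold-based validation (names and decision rule are assumptions).
def validate_update(evaluate_accuracy, worker_model, validator_model,
                    validator_data, validator_threshold=0.08):
    acc_worker = evaluate_accuracy(worker_model, validator_data)
    acc_validator = evaluate_accuracy(validator_model, validator_data)
    accuracy_difference = acc_validator - acc_worker
    # a large accuracy drop on the validator's data suggests a distorted update
    is_legitimate = accuracy_difference <= validator_threshold
    return accuracy_difference, is_legitimate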
The logs used for the figures inside of the paper can be found in plotting_logs.zip.
If you use a GPU with less than 16GB of RAM, you may encounter a CUDA out-of-memory error. The likely cause is that the local model updates (i.e., neural network models) stored inside the blocks occupy CUDA memory that cannot be automatically released, so the memory usage grows as the communication rounds progress. A few solutions have been tried without luck.
A temporary workaround is to specify -dtx 1. This argument lets the program delete the transactions stored inside the last block to release as much CUDA memory as possible. However, specifying -dtx 1 also turns off the chain-resyncing functionality, because resyncing requires devices to re-perform global model updates based on the transactions stored inside the resynced chain, whose blocks would now contain empty transaction lists. As a result, GPU runs should only emulate VBFL's most ideal situation, i.e., every available transaction is recorded inside the block of each round, as specified by the default arguments.
The experimental results shown in the paper were obtained on Google Colab Pro with an Nvidia Tesla V100, which in most situations can run 100 communication rounds with 20 devices. If you wish to test a more complicated running environment, such as specifying '--miner_acception_wait_time' to limit the time miners wait to accept validator transactions, each miner may end up with blocks containing different validator transactions, and the resulting forking events will require chain resyncing; in that case, please use a CPU with a large amount of RAM. A fix is underway.
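Conceptually, -dtx 1 drops the references to the model updates held in the last block so PyTorch can free the associated GPU memory. The snippet below is a generic illustration of that idea; the block and transaction attribute names are assumptions, not the exact code in this repo.

# Generic illustration of freeing CUDA memory held by stored model updates.
import torch

def release_block_memory(last_block):
    for tx in last_block.transactions:
        tx.local_model_update = None  # drop references to the GPU tensors
    last_block.transactions = []
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached memory to the GPU driver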
Please raise other issues and concerns you found. Thank you!
(1) The code of the blockchain architecture and PoW consensus is inspired by Satwik's python_blockchain_app.
(2) The code of FedAvg used in VBFL is inspired by WHDY's FedAvg implementation.