IPFS-orchestrated federated learning services
Description
Federated learning (FL) is a recently proposed ML paradigm that allows entities that store potentially privacy-sensitive data locally to train models collectively. The most prominent example is Google Keyboard, which uses metadata from users' typing to propose next words or to auto-correct typed words, while preserving user privacy.
In centralized FL, a server orchestrates the training process by storing, updating, and broadcasting the model to the participating agents. Centralized FL thus has a single point of failure: any unavailability of the server disrupts the training process. Moreover, the server may become a bottleneck and slow down training. More importantly, depending on network conditions and resource availability, centralized FL may scale poorly. In contrast, decentralized learning settings have no single point of failure and no resource bottleneck, and decentralized FL scales well with the number of agents participating in the training process.
The goal of this project is to develop the InterPlanetary Learning System (IPLS), a fully decentralized federated learning framework that is partially based on the InterPlanetary File System (IPFS). By using IPLS and connecting to the corresponding private IPFS network, any party can initiate the training of a model or join an ongoing training process that has already been started by another party.
Preliminary results demonstrate that IPLS (i) scales with the number of participants, (ii) is robust against intermittent connectivity and dynamic participant departures/arrivals, (iii) requires minimal resources, and (iv) guarantees that the accuracy of the trained model quickly converges to that of a centralized FL framework, with an accuracy drop of less than 1‰.
State of the art
Centralized federated learning has become the dominant research direction in distributed machine learning, and as a result most existing federated learning frameworks are designed for the centralized setting.
Existing decentralized FL systems are mostly based on gossiping schemes, while other approaches aggregate the model using a blockchain. We follow a different approach based on distributed shared memory. More specifically, the model is partitioned and the partitions are distributed among the peers of the network. Each peer is responsible for the partition(s) assigned to it: it receives updates (gradients) for those partitions, keeps them up to date, and sends the updated partitions back. For more details, see our paper: https://arxiv.org/pdf/2101.01901v1.pdf. This system model requires less communication bandwidth, and by using a synchronous stochastic gradient descent (SGD) algorithm, the convergence and accuracy of the model are identical to what they would be in centralized FL.
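To make the aggregation step concrete, below is a minimal sketch in plain Java of how a responsible peer might average the gradients it receives for its partition in a synchronous round. The class and method names are illustrative assumptions, not the actual IPLS code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of synchronous, partition-wise gradient averaging.
// Names and structure are assumptions; see the IPLS paper for the real protocol.
public class PartitionAggregator {
    private final double[] partition;          // the model slice this peer is responsible for
    private final List<double[]> gradients = new ArrayList<>();
    private final int expectedPeers;           // synchronous SGD: wait for all participants
    private final double learningRate;

    public PartitionAggregator(double[] initial, int expectedPeers, double learningRate) {
        this.partition = initial.clone();
        this.expectedPeers = expectedPeers;
        this.learningRate = learningRate;
    }

    // Called once per peer per round, e.g. from an IPFS pub/sub message handler.
    public synchronized void receiveGradient(double[] grad) {
        gradients.add(grad);
        if (gradients.size() == expectedPeers) {
            applyAveragedGradient();
        }
    }

    private void applyAveragedGradient() {
        for (int i = 0; i < partition.length; i++) {
            double sum = 0.0;
            for (double[] g : gradients) sum += g[i];
            // Averaging all peers' gradients makes the update equivalent to
            // centralized synchronous SGD over the union of the local datasets.
            partition[i] -= learningRate * (sum / gradients.size());
        }
        gradients.clear();
        // ...then broadcast the updated partition back to the peers (e.g. via pub/sub).
    }

    public synchronized double[] snapshot() { return partition.clone(); }
}
```

Waiting for all expected peers before applying the update is what makes the round synchronous; it trades some latency for convergence behaviour identical to the centralized setting.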
Solving this Open Problem
To date, we have developed a framework and provide an API for performing basic tasks such as joining and leaving the system and updating the model. The codebase is written in Java and uses the Java IPFS HTTP API. However, a lot of work remains to be done in order to provide a notable and robust tool for IPFS users, developers, data scientists, and applications.
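To give a flavour of that API surface, here is a hypothetical Java interface matching the basic tasks listed above. These are not the actual IPLS signatures, which live in the codebase; they are only a sketch of the shape such an API takes.

```java
// Hypothetical interface sketching the basic tasks the issue mentions
// (joining, leaving, updating the model); not the actual IPLS API.
public interface IPLSPeer {
    // Join a training project on the private IPFS network.
    void join(String projectId) throws Exception;

    // Fetch the current model by collecting its partitions from responsible peers.
    double[] getModel() throws Exception;

    // Send locally computed gradients to the peers responsible for each partition.
    void updateModel(double[] gradients) throws Exception;

    // Unregister from the partitions this peer serves and depart gracefully.
    void leave() throws Exception;
}
```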
Plans for the near future are:
Consider multiple models with a wide range of hyperparameters and examine their training overhead and performance; evaluate IPLS in terms of scalability and performance; and provide solutions to any problems we encounter.
Develop a directory service using the IPFS DHT that provides information about the peers responsible for each partition, in order to improve the initialization stage of the system as well as availability. Each peer can then register with the service the keys of the partitions it is responsible for, and unregister them when it is about to leave the network (see the sketch after this list).
Further exploit IPFS pub/sub in order to provide transfer learning capabilities. This is of high importance for improving model accuracy in projects with relatively small participation, and for reducing the computational cost on some devices.
In the case of synchronous SGD, develop a data aggregation protocol and integrate it with pub/sub for replica synchronization.
Run IPLS on real-world devices and data in order to measure the true performance of a decentralized federated learning scheme and compare it with the centralized approach.
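A minimal sketch of the directory service mentioned above, assuming a key-value mapping from partition keys to responsible peers. In a real deployment this mapping would live in the IPFS DHT; the in-memory map here is only a stand-in for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a DHT-backed directory: partition key -> set of responsible peers.
// In IPLS this state would be stored and looked up via the IPFS DHT, not in memory.
public class PartitionDirectory {
    private final Map<String, Set<String>> responsiblePeers = new ConcurrentHashMap<>();

    // A peer registers the partition keys it is responsible for.
    public void register(String peerId, Iterable<String> partitionKeys) {
        for (String key : partitionKeys) {
            responsiblePeers
                .computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet())
                .add(peerId);
        }
    }

    // A peer unregisters before leaving the network.
    public void unregister(String peerId, Iterable<String> partitionKeys) {
        for (String key : partitionKeys) {
            Set<String> peers = responsiblePeers.get(key);
            if (peers != null) peers.remove(peerId);
        }
    }

    // During initialization, a joining peer looks up who serves each partition.
    public Set<String> lookup(String partitionKey) {
        return responsiblePeers.getOrDefault(partitionKey, Set.of());
    }
}
```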
Some future directions with interesting research and practical purposes are provided below:
Mobile Computing:
Develop, first, an Android app that installs IPFS if it does not already exist on the user's smartphone, making it possible for every user to download the app and join a project.
Observe the behaviour of IPFS pub/sub on mobile devices: its energy consumption, and how it handles problems of a mobile environment such as intermittent connectivity and long disconnections.
Investigate and develop IPLS for device-to-device and ad-hoc systems. This would be highly beneficial in 5G, and especially 6G, for offloading bandwidth from base stations, and it would also extend the functionality of IPLS.
Edge/Fog Computing:
Create a protocol on top of IPFS (if one does not already exist) for securely sharing data between the personal devices of a user. In this way, a user can send the sensitive data held on their mobile phone to their own, or a trusted, PC, which will use the data to train the model (see the sketch after this list).
Extend IPLS to provide computation offloading and Edge/Fog computing capabilities, combining federated learning with split learning and thereby transforming the IPFS decentralized network into a decentralized supercomputer.
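As an illustration of the secure sharing idea above, the following sketch encrypts sensitive data on the phone before it is published to IPFS, so that only a trusted device holding the key can decrypt it. This is a plain AES-GCM example using the standard Java crypto API, not an existing IPFS protocol; key exchange between the user's devices is assumed to happen over an already-authenticated channel.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

// Illustrative sketch: encrypt sensitive training data on the phone before
// publishing it to IPFS, so only a trusted device holding the key can decrypt.
public class DeviceToDeviceSharing {
    public static byte[] encryptForTrustedDevice(byte[] plainData, SecretKey key) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);                 // fresh nonce per message
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plainData);

        // Prepend the IV so the receiving device can decrypt; the resulting
        // blob is what would be added to IPFS and referenced by its CID.
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    public static SecretKey freshKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }
}
```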
Privacy/Security:
Investigate the privacy guarantees that arise from the fact that the model is partitioned and distributed to different peers. Tailor and optimize differential privacy and secure multiparty computation solutions to those guarantees (see the sketch after this list).
Defend the system against Byzantine behaviour such as adversarial learning. This is very important, especially in a pure P2P decentralized system.
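For instance, a standard differential privacy building block that could be tailored to IPLS is to clip each gradient and add Gaussian noise before it leaves the device. The sketch below shows the mechanism; the clipping bound and noise scale are illustrative placeholders, not calibrated privacy parameters.

```java
import java.util.Random;

// Illustrative differentially-private gradient release (Gaussian mechanism):
// clip the gradient to a fixed L2 norm, then add Gaussian noise before sharing.
// The constants are placeholders, not calibrated privacy parameters.
public class DpGradient {
    private static final double CLIP_NORM = 1.0;   // L2 clipping bound C
    private static final double NOISE_STD = 0.1;   // sigma, scaled relative to C
    private static final Random RNG = new Random();

    public static double[] privatize(double[] gradient) {
        // Compute the L2 norm of the gradient.
        double norm = 0.0;
        for (double g : gradient) norm += g * g;
        norm = Math.sqrt(norm);

        // Clip so each device's contribution is bounded by CLIP_NORM.
        double scale = Math.min(1.0, CLIP_NORM / Math.max(norm, 1e-12));

        // Add independent Gaussian noise to each coordinate.
        double[] noisy = new double[gradient.length];
        for (int i = 0; i < gradient.length; i++) {
            noisy[i] = gradient[i] * scale + NOISE_STD * RNG.nextGaussian();
        }
        return noisy;
    }
}
```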
What is the impact?
We envision a system that is for distributed machine learning what IPFS (or BitTorrent) is for file sharing. We aim to create an environment, powered by the IPFS peer-to-peer network, on which individuals can upload their project: the attributes of the data on which the model is going to be trained, as well as the model itself. Everyone interested in a project can join the private IPFS network, download the most recent model, and cooperate with the other peers in the way described in our paper. Note that projects can be integrated automatically into an application (such as Google Keyboard), or even into IPFS itself to improve QoS and security.
Other
Do you want to be part of this work?
This is part of ongoing work. Feel free to contact Christodoulos Pappas (chrpappas@uth.gr) or Dimitris Chatzopoulos (dcab@cse.ust.hk) if you are interested.
Hi @ChristodoulosPappas, we are closing this issue because it seems to have led to a grant approval already. 🚀
Please feel free to approach us with other ideas; they are appreciated!