google / fedjax

FedJAX is a JAX-based open source library for Federated Learning simulations that emphasizes ease-of-use in research.
Apache License 2.0

Feature request: Convert standard dataset into a federated dataset #206

Open: Saipraneet opened this issue 3 years ago

Saipraneet commented 3 years ago

Synthetic federated datasets can be constructed from standard centralized ones by artificially splitting them among clients. This is usually done using a Dirichlet distribution (e.g. Hsu et al. 2019). Such synthetic datasets are very useful because we can explicitly control both the total number of users and the degree of heterogeneity.
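
A minimal NumPy sketch of such a split, for illustration only. It uses a common per-class variant of the Dirichlet scheme: each class's examples are divided among clients according to proportions drawn from a symmetric Dirichlet with concentration alpha. It does not reproduce Hsu et al. exactly, and the function name dirichlet_partition and its parameters are hypothetical, not part of FedJAX.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split example indices among clients with Dirichlet label skew (sketch).

    Hypothetical helper, not a FedJAX API. For each class, a vector of client
    proportions is drawn from Dir(alpha, ..., alpha) and that class's examples
    are divided according to it. Every example goes to exactly one client.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        proportions = rng.dirichlet(np.full(num_clients, alpha))
        # Turn the proportions into split points over this class's examples.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=np.int64) for ix in client_indices]


# Usage: 1000 examples over 10 classes, split among 5 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5, seed=1)
print([len(p) for p in parts])
```

Here num_clients sets the number of users directly, while alpha controls heterogeneity: small values (e.g. 0.1) concentrate each class on a few clients, and large values approach an IID split.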

It would be great to have primitives that can automatically convert a standard NumPy dataset into a FedJAX dataset.
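
One possible shape for such a primitive, sketched under assumptions: given centralized feature arrays and a list of per-client index arrays (such as those a Dirichlet split produces), group the arrays into a mapping from bytes client IDs to per-client feature dicts. As far as I know, this nested mapping is the form fedjax.InMemoryFederatedData accepts as its client data, but the helper name to_client_data and the client-ID format are hypothetical.

```python
import numpy as np


def to_client_data(features, client_indices):
    """Group centralized arrays into per-client feature dicts (sketch).

    Hypothetical helper, not a FedJAX API.
    features: dict mapping a feature name (e.g. 'x', 'y') to an array whose
      first axis indexes examples.
    client_indices: list of integer index arrays, one per client.
    Returns a dict mapping bytes client IDs to {feature_name: array}.
    """
    arrays = {name: np.asarray(arr) for name, arr in features.items()}
    return {
        f'client_{i}'.encode(): {name: arr[idx] for name, arr in arrays.items()}
        for i, idx in enumerate(client_indices)
    }


features = {'x': np.arange(12).reshape(6, 2), 'y': np.arange(6)}
client_data = to_client_data(features, [np.array([0, 2, 4]), np.array([1, 3, 5])])
print(sorted(client_data))
```

The result could then be wrapped with fedjax.InMemoryFederatedData(client_data) so it works with the rest of the framework; check the FedJAX API reference for the exact constructor arguments.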

jaehunro commented 3 years ago

Thanks for filing this! I also think that this will be very useful.

A couple of clarifying questions:

Saipraneet commented 3 years ago

I think supporting an iterator of NumPy arrays would be the most general option. A tf.data.Dataset can be converted into one with its as_numpy_iterator() method.
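
If the input is an iterator of NumPy examples, a small helper could first stack it into centralized arrays that can then be split among clients. This sketch assumes each element is a dict of per-example arrays with identical keys, which is the structure as_numpy_iterator() yields for an unbatched tf.data.Dataset of dicts; the name stack_examples is hypothetical.

```python
import numpy as np


def stack_examples(examples):
    """Stack an iterator of per-example dicts into one dict of arrays (sketch).

    Hypothetical helper. Assumes every element has the same keys and that
    values for a given key have the same shape across examples.
    """
    columns = {}
    for example in examples:
        for name, value in example.items():
            columns.setdefault(name, []).append(np.asarray(value))
    # np.stack adds a leading example axis to each feature.
    return {name: np.stack(values) for name, values in columns.items()}


examples = ({'x': np.array([i, i + 1]), 'y': i} for i in range(4))
stacked = stack_examples(examples)
print(stacked['x'].shape, stacked['y'])
```

With TensorFlow available, the same call would be stack_examples(ds.as_numpy_iterator()) for an unbatched dataset ds.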

> Does this refer to fedjax.FederatedData?

Yes. The goal is to be able to use this dataset with the rest of the FedJAX framework.

BaselOmari commented 1 year ago

Hi, has any work been done for this issue? Is there still a need for it?

More generally, what is the state of this repo? Is it still active? Is there work that needs some contribution? I am more than happy to help.

jaehunro commented 1 year ago

Hi there. Not much work has been done toward checking in a general implementation for this, but it would be nice to have. We still actively use and maintain this repo and would be more than happy to have you contribute!

kho commented 1 year ago

Have you checked out InMemoryFederatedData? It should be sufficient for creating synthetic datasets in most cases.