Saipraneet opened this issue 3 years ago
Thanks for filing this! I also think that this will be very useful.
A couple of clarifying questions:
What exactly constitutes a "standard numpy dataset"? An iterator of numpy arrays? A tf.data.Dataset? A single numpy array encapsulating the entire dataset (assuming it fits in memory)?
When you say "FedJax dataset", does this refer to fedjax.FederatedData?
I think supporting an iterator of numpy arrays would be the most general, since a tf.data.Dataset can be converted into one using as_numpy_iterator.
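To make the proposed interface concrete, here is a minimal sketch of what "an iterator of numpy arrays" could look like (the generator below is illustrative, not FedJax API; `tf.data.Dataset.as_numpy_iterator()` is a real TensorFlow method that yields the same kind of stream):

```python
import numpy as np

def batch_iterator(features, labels, batch_size):
    """Yield (features, labels) batches as numpy arrays.

    A tf.data.Dataset produces the same kind of stream via its
    as_numpy_iterator() method, so a converter accepting this
    interface would cover both cases.
    """
    for start in range(0, len(features), batch_size):
        yield (features[start:start + batch_size],
               labels[start:start + batch_size])

# Toy centralized dataset: 10 examples with 3 features each.
x = np.arange(30, dtype=np.float32).reshape(10, 3)
y = np.arange(10)

batches = list(batch_iterator(x, y, batch_size=4))
# 10 examples with batch_size=4 -> batches of 4, 4, and 2 examples.
```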
> does this refer to fedjax.FederatedData

Yes. The goal would be to be able to use this dataset with the rest of the FedJax framework.
Hi, has any work been done for this issue? Is there still a need for it?
More generally, what is the state of this repo? Is it still active? Is there work that needs some contribution? I am more than happy to help.
Hi there. There hasn't been much work done for checking in a general implementation for this but it would be nice to have. We still actively use and maintain this repo and would be more than happy to have you contribute!
Have you checked out InMemoryFederatedData? It should be sufficient for creating synthetic datasets in most cases.
Synthetic federated datasets can be constructed from standard centralized ones by artificially splitting them among clients. This is usually done using a Dirichlet distribution (e.g. Hsu et al. 2019). Such synthetic datasets are very useful since we can explicitly control both the total number of users and the degree of heterogeneity.
It would be great to have primitives which can automatically convert a standard numpy dataset into a FedJax dataset.
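As a rough sketch of the Dirichlet-based split described above (the function and parameter names are illustrative, not FedJax API): for each class, a Dirichlet(alpha) draw decides what fraction of that class's examples each client receives, so small alpha yields high heterogeneity and large alpha approaches an IID split.

```python
import numpy as np

def dirichlet_split(labels, num_clients, alpha, seed=0):
    """Partition example indices among clients with Dirichlet label skew.

    Each class's examples are divided among clients according to a
    Dirichlet(alpha) draw; every example is assigned to exactly one client.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        rng.shuffle(idx)
        # Fraction of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert fractions to split points into idx.
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Toy centralized dataset: 100 examples over 5 classes.
labels = np.repeat(np.arange(5), 20)
clients = dirichlet_split(labels, num_clients=4, alpha=0.5)
total = sum(len(c) for c in clients)  # every example assigned exactly once
```

Each client's index list could then be used to slice the centralized arrays into per-client examples for a FedJax dataset.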