sacs-epfl / decentralizepy

A decentralized learning research framework
MIT License

Seed not used by some datasets #11

Closed dimiarbre closed 6 months ago

dimiarbre commented 6 months ago

Some datasets do not use the `self.random_seed` variable. As a result, the dataset partitioning is inconsistent within a single run: the same data element can be assigned to multiple nodes. The affected shuffling code is here:

https://github.com/sacs-epfl/decentralizepy/blob/170fc9eb1bf31d0559cf57917a311c3b575b1e16/src/decentralizepy/datasets/Femnist.py#L125
https://github.com/sacs-epfl/decentralizepy/blob/170fc9eb1bf31d0559cf57917a311c3b575b1e16/src/decentralizepy/datasets/Celeba.py#L125
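To illustrate the expected behaviour, here is a minimal sketch of a seeded partition. It is not the decentralizepy code: `partition_indices` and its parameters are hypothetical names, and it assumes every node calls it with the same seed and takes a strided slice of the shared permutation.

```python
import random


def partition_indices(num_samples, num_nodes, rank, seed):
    """Return the sample indices owned by node `rank` (hypothetical helper).

    Every node builds the same permutation because the generator is
    seeded with the shared `seed`, so the per-node slices cannot overlap.
    """
    rng = random.Random(seed)  # local generator, does not touch global state
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Strided slice: node 0 gets positions 0, n, 2n, ...; node 1 gets 1, n+1, ...
    return indices[rank::num_nodes]


if __name__ == "__main__":
    shards = [partition_indices(10, 3, rank, seed=1234) for rank in range(3)]
    for rank, shard in enumerate(shards):
        print(rank, shard)
```

With a shared seed the shards are disjoint and together cover every sample. If each node shuffled with its own unseeded generator instead, the slices would come from different permutations and could overlap.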

Moreover, no seed is used when generating a validation set: https://github.com/sacs-epfl/decentralizepy/blob/170fc9eb1bf31d0559cf57917a311c3b575b1e16/src/decentralizepy/datasets/Femnist.py#L155-L157
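The same idea applies to the validation split. The sketch below is only an assumption about what a seeded version could look like: `split_holdout`, `fraction`, and the use of plain index lists are hypothetical and do not mirror the actual `Femnist.py` code.

```python
import random


def split_holdout(indices, fraction, seed):
    """Split `indices` into (remaining, validation) reproducibly.

    Hypothetical helper: the validation subset depends only on `seed`,
    so every node and every rerun selects the same elements.
    """
    rng = random.Random(seed)  # seeded local generator
    shuffled = list(indices)   # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    num_validation = int(len(shuffled) * fraction)
    return shuffled[num_validation:], shuffled[:num_validation]


if __name__ == "__main__":
    remaining, validation = split_holdout(range(20), fraction=0.25, seed=1234)
    print(len(remaining), len(validation))
```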