Yes, I don't think it affects the current configs, but I've made things cleaner and also added a seed requirement (instead of unsuccessfully trying to autogenerate one if not provided) here. Have a look.
But this makes each rank have a different seed (not only the worker seed inside the dataloader), which will cause the network on each rank to be initialized with different parameter values (we use random initialization by default in most cases). Even so, DDP can ensure that the initial network parameters of each rank are consistent. Personally, I would prefer to make only the worker seed inside the dataloader differ across ranks, not all the seeds. But it's all up to you, not a serious problem.
Yeah initialization is not an issue since DDP syncs the params and buffers after init. Thanks for flagging.
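For reference, a minimal sketch of why per-rank init differences are harmless under DDP (the toy model and the way `local_rank` is derived here are just placeholders for a typical `torchrun` launch):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()

# Each rank may build the model with different random weights
# (e.g. because its global seed differs) ...
model = torch.nn.Linear(16, 16).cuda(local_rank)

# ... but DDP broadcasts rank 0's parameters and buffers to all
# ranks at construction, so every rank starts from identical weights.
model = DDP(model, device_ids=[local_rank])
```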
Hi, it's me again! I think there may be a problem with the dataloader re-seeding workers in multi-GPU training: workers with the same `worker_id` on different GPUs will get the same randomness if we seed them the way the repo does:
https://github.com/nnaisense/bayesian-flow-networks/blob/896ea205debb4896b27a61e79e378b720a926309/utils_train.py#L60
https://github.com/nnaisense/bayesian-flow-networks/blob/896ea205debb4896b27a61e79e378b720a926309/utils_train.py#L67
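To illustrate the failure mode (this is the common PyTorch worker-seeding pattern, not a verbatim copy of the lines linked above): if every rank seeds its DataLoader the same way, `torch.initial_seed()` inside a worker resolves to the same `base_seed + worker_id` on every GPU.

```python
import random
import numpy as np
import torch

def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() is base_seed + worker_id.
    # If every rank seeded its DataLoader generator identically,
    # base_seed is the same on every GPU, so worker 0 on GPU 0 and
    # worker 0 on GPU 1 end up with the exact same worker_seed here.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```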
One way to avoid this problem is to seed the dataloader's `generator` with the specified `seed` plus the `rank`, which may look like the sketch below.
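A minimal sketch of that idea (the `make_dataloader` helper and its arguments are just for illustration):

```python
import torch
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def make_dataloader(dataset, seed: int, rank: int,
                    batch_size: int, num_workers: int) -> DataLoader:
    # DistributedSampler splits and shuffles the data across ranks;
    # it should be seeded identically on all ranks.
    sampler = DistributedSampler(dataset, seed=seed)

    # Offsetting the generator seed by the rank makes each GPU's
    # DataLoader draw a different _base_seed; workers then get
    # _base_seed + worker_id, so every worker on every GPU is unique.
    g = torch.Generator()
    g.manual_seed(seed + rank)

    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        sampler=sampler,
        generator=g,
    )
```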
Following this way, we don't even have to set `worker_init_fn` in the dataloader, and different GPUs will have a different `_base_seed` in their dataloaders, finally giving each worker on each GPU its own unique randomness.