liuxiaozhu01 opened this issue 1 year ago
I've observed the same thing: despite using multiple GPUs for training, the training time is the same as with a single GPU, just with n times as many rays per iteration.
Inspecting the code, I wasn't able to find any call to a DistributedSampler for DDP, as in the official PyTorch example. I believe this should be in each setup_train() function of the DataManagers, such as the one in VanillaDataManager.
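For context, the core of what DistributedSampler provides is per-rank index sharding. Here is a minimal pure-Python sketch of that idea (a hypothetical simplification; the real class also handles drop_last and per-epoch reseeding via set_epoch):

```python
# Sketch of DistributedSampler-style index sharding: every rank sees the same
# permutation, then takes every num_replicas-th index starting at its rank.
import random

def shard_indices(dataset_len, num_replicas, rank, epoch=0, shuffle=True):
    """Return the subset of dataset indices this rank should load."""
    indices = list(range(dataset_len))
    if shuffle:
        rng = random.Random(epoch)  # same seed on every rank -> same permutation
        rng.shuffle(indices)
    # pad so the list divides evenly among replicas, as DistributedSampler does
    pad = (-len(indices)) % num_replicas
    indices += indices[:pad]
    # disjoint (up to padding) shard for this rank
    return indices[rank::num_replicas]
```

With the real API, one would pass sampler=DistributedSampler(dataset) to the DataLoader (instead of shuffle=True) and call sampler.set_epoch(epoch) at the start of each epoch so the shuffle differs between epochs.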
This was also reported before in issues #1082 and #2475. DDP was also mentioned in other issues, such as #1847; maybe we can find more hints there.
@jb-ye do you know if this is the expected behaviour?
@alancneves I observed the issue and sent a fix for that earlier. https://github.com/nerfstudio-project/nerfstudio/pull/1856 Not sure if this is the same issue for you.
Hi @jb-ye! I've seen that PR, which addresses the averaging of the gradient across multiple GPUs.
However, I believe the problem is more related to data loading. Without the sampler parameter on the DataLoaders, my understanding is that all GPUs will load and train on the same data, rather than each getting its own subset as intended.
@alancneves I see your point. I thought that, at least for NeRF training, each GPU samples different rays, so essentially they are training on different data, no?
Yeah, it makes sense... Even using the same images, we can sample different parts of them. Thanks for the clarification @jb-ye!
Just another question (related): We cannot guarantee that the N bundle of rays will be completely distinct from each other by using N GPUs. Can we?
No, but since N << number of pixels, the chance is low.
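To put a rough number on that: if each GPU draws N rays uniformly and independently from P candidate pixels, the expected number of rays in one GPU's batch that also appear in another's is about N * (1 - (1 - 1/P)^N) ≈ N²/P. A quick back-of-the-envelope check (the image counts and batch size below are hypothetical, not nerfstudio defaults):

```python
# Back-of-the-envelope estimate of ray-batch overlap between two GPUs,
# assuming rays are drawn uniformly and independently from P candidate pixels.
def expected_overlap(num_rays, num_pixels):
    """Expected number of rays in one batch that also land in another batch."""
    p_collision = 1.0 - (1.0 - 1.0 / num_pixels) ** num_rays
    return num_rays * p_collision

# e.g. 100 images of 1000x1000 pixels, 4096 rays per GPU per step:
P = 100 * 1000 * 1000
N = 4096
print(expected_overlap(N, P))  # well under one duplicated ray per step
```

So even though collisions are possible, with N << P they are rare enough to be negligible for training.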
I want to use the VanillaDataManager (nerfstudio-0.2.2) as a component of my work. Single-GPU training works well, and I'm now trying to make it run on multiple GPUs for parallel training. However, I'm not sure whether I've done it the right way. Here is a part of my code below. The command to execute is
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node 4 --nnodes=1 --node_rank=0 train.py
By the way, I notice that class DataManager has attributes train_sampler and eval_sampler, but they are never used. Are they not useful for multi-GPU parallel training? Can anyone help? Waiting online.
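One thing worth checking with that launch command: torch.distributed.launch starts one process per GPU and communicates the process's role via RANK / WORLD_SIZE / LOCAL_RANK environment variables when run with --use_env (the legacy behaviour instead passes a --local_rank CLI argument; the newer torchrun always uses the environment). A minimal sketch of how train.py can discover its role, with get_dist_info being a hypothetical helper:

```python
# Sketch: discover this process's rank inside a script started by
# torch.distributed.launch --use_env (or torchrun). Defaults make the
# same script work unmodified in single-process runs.
import os

def get_dist_info():
    """Read the launcher-provided environment (falls back to single-process)."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank

# In real training code one would then do (needs GPUs, so commented out):
#   torch.cuda.set_device(local_rank)
#   torch.distributed.init_process_group("nccl", rank=rank, world_size=world_size)
```

Each process should then pass its rank/world_size to a DistributedSampler (or an equivalent ray-sharding scheme) so the GPUs do not all load identical batches.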