Hi, I suspect it's due to slow disk reads (concurrent reads), since in the dataset's `__getitem__` both GPUs load the same images:
https://github.com/kwea123/CasMVSNet_pl/blob/c94e7b00a6fd73df37117ddee1945fe99a43138d/datasets/dtu.py#L147-L193
Can you check the full profile with pytorch-lightning? You might need to update the pytorch-lightning version and fix some of the code in train.py accordingly.
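
A minimal sketch of enabling the built-in profiler, assuming a recent pytorch-lightning version (argument names such as `devices`/`strategy` differ in older releases, where they were `gpus`/`accelerator`); the other Trainer arguments in this repo's train.py are omitted:

```python
import pytorch_lightning as pl

# Assumed sketch: turn on the built-in profiler to see whether time goes to
# data loading (dataloader wait) or to the forward/backward passes.
trainer = pl.Trainer(
    devices=2,            # number of GPUs
    accelerator="gpu",
    strategy="ddp",
    profiler="simple",    # or "advanced" for per-function timings
)
```

The profiler report printed at the end of training should show the dataloader wait time per step, which would confirm (or rule out) the disk-read hypothesis.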
Thank you for open-sourcing this code, and sorry to bother you. I found this old issue because I'm running into the same problem: with num_gpus set to 2, training takes roughly 3x as long. Is there a solution yet? @tau-yihouxiang @kwea123
Hi, as my answer above says, it's because reading the images is too slow. To speed it up, you probably need to rewrite the dataset: either use a fast loader such as NVIDIA DALI, or convert each PNG image to a tensor and save it to a file beforehand, then read that file directly during training (see the sketch below).
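
A minimal sketch of the second option, not the repo's actual dataset code; the `data/**/*.png` layout and the `CachedImageDataset` name are hypothetical:

```python
import glob
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.ToTensor()

# One-off preprocessing: decode every PNG once and cache it as a tensor file.
for path in glob.glob("data/**/*.png", recursive=True):  # hypothetical data layout
    img = Image.open(path).convert("RGB")
    torch.save(to_tensor(img), path.replace(".png", ".pt"))


class CachedImageDataset(torch.utils.data.Dataset):
    """Illustrative dataset that reads pre-converted tensors instead of PNGs."""

    def __init__(self, tensor_paths):
        self.tensor_paths = tensor_paths

    def __len__(self):
        return len(self.tensor_paths)

    def __getitem__(self, idx):
        # torch.load is much cheaper than decoding a PNG on every iteration
        return torch.load(self.tensor_paths[idx])
```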
Thank you for your reimplementation first! However, I found that training in multi-GPU mode is much slower than on a single GPU:
- single GPU: 1.01 s/iter
- 2 GPUs: 3.84 s/iter