lxj1999 opened this issue 1 year ago
Batch size 2 is too big for my two 24 GB GPUs: training crashes around iteration 2400 and I have to restart the computer.
Batch size 4 is too big for my two 24 GB GPUs, and my input images are 512x512. With batch size 8 my server runs out of memory. What resolution are your images? @lxj1999
My image is 840 by 480
Your images are 840 by 480, so your batch size of 2 is roughly the same situation as my 512x512 case: out of memory either way. Why is the usable batch size so small? @everyone
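For context: in mmsegmentation-style configs, training memory is determined mainly by the crop size in the training pipeline and the per-GPU batch size, not by the raw resolution of the images on disk (840x480 images are cropped/resized before they reach the GPU). A minimal sketch of the memory-relevant fields, assuming an mmsegmentation 1.x style config; the exact keys depend on the base config you inherit from:

```python
# Sketch of the memory-relevant parts of an mmsegmentation-style config.
# Field names follow the 1.x convention and may differ in other versions.

crop_size = (512, 512)  # training memory scales with this, not the 840x480 source size

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackSegInputs'),
]

train_dataloader = dict(
    batch_size=2,   # per-GPU batch size
    num_workers=2,
    dataset=dict(pipeline=train_pipeline),
)
```

Lowering `crop_size` or the dataloader `batch_size` is usually the first thing to try before concluding the library itself is at fault.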
When I increase the batch size, training speed decreases. Why?
The library's memory consumption is excessive; it may be an optimization issue or a memory leak.
Same issue here. With batch size 2 it only takes 15 GB out of 24 GB of memory, but the GPU runs at full power with 100% utilization, which is unusual compared with other libraries.
8xb2 means a batch size of 2 per GPU with 8 GPUs expected for training, so we would need 8 GPUs for multi-GPU training. Is that related to the memory consumption?
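To clarify the naming: the `8xb2` fragment in a config file name records the reference training setup, 8 GPUs with a per-GPU batch size of 2 (effective batch size 16). On a single GPU you only ever pay for the per-GPU batch size, which is the dataloader's `batch_size`. A small sketch of where that shows up in a config, assuming mmsegmentation 1.x conventions:

```python
# '8xb2' in the config name = 8 GPUs x batch 2 per GPU = effective batch size 16.
train_dataloader = dict(batch_size=2)  # per-GPU batch size; this is what one GPU must fit

# Many configs also record the reference effective batch size for LR auto-scaling:
auto_scale_lr = dict(enable=False, base_batch_size=16)
```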
When I use 2 GPUs, training is slower than with 1 GPU. Has anyone else run into this?
@lxj1999 Training is driven by mmengine; maybe that is where the memory is being consumed. Are you running it through mmengine?
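On the "two GPUs slower than one" question: a common cause is that the CPU-side data pipeline becomes the bottleneck once several processes compete for it, so the GPUs sit idle waiting for batches. A quick way to check is to time the dataloader by itself. This is a generic PyTorch sketch (the `TensorDataset` stand-in and the worker counts are placeholders), not an mmsegmentation API:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # Stand-in dataset: swap in your real dataset/pipeline to measure what training sees.
    dataset = TensorDataset(torch.randn(128, 3, 512, 512),
                            torch.zeros(128, dtype=torch.long))

    for workers in (0, 2, 4, 8):
        loader = DataLoader(dataset, batch_size=2, num_workers=workers, pin_memory=True)
        start = time.perf_counter()
        for _ in loader:
            pass  # data loading only, no GPU work
        elapsed = time.perf_counter() - start
        print(f"num_workers={workers}: {elapsed:.2f}s to iterate the dataset once")
```

If pure loading time per batch is close to your per-iteration training time, raising `num_workers` (or using faster storage) will likely help more than adding GPUs; also make sure multi-GPU runs go through the repo's distributed launch script rather than plain `DataParallel`.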
I am having this issue as well, specifically with DeepLabV3+ models.
Hi, has anyone solved this kind of problem? When I use detectron2 to train DeepLabV3+ with a ResNet-101 backbone, I can set batch size 8 with a crop size of (512, 1024) on my 24 GB GPU. With mmsegmentation, the same setup runs out of GPU memory at batch size 8. It seems mmsegmentation consumes much more memory than other libraries.
Hi, did you solve this problem? I think I am stuck in the same situation.
So try multi-GPU training on Linux to work around it.
That might solve the out of memory error, but there's still the issue that their implementation of DeepLabV3+ uses too much memory in the first place.
Yes, I think the problem is specific to DeepLabV3+. I tested some other models and they work fine without any memory errors. But it seems nobody is working on this issue right now.
Actually it is bad for other models too. I have tried UNet and Swin Transformer, and both run out of memory at batch size 2 on 48 GB GPUs.
I think this problem comes from the basic structure of the library. Switching to another available open-source library may be a better choice than staying stuck on this memory issue.
When I use ResNet-101 with DeepLabV3+ from https://github.com/VainF/DeepLabV3Plus-Pytorch, I can train with a total batch size of 40 on 2 GPUs (20 per GPU) and it consumes about 36 GB. With your library, a batch size of 2 on 1 GPU already consumes about 15 GB.
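For anyone still stuck on memory: two switches that typically reduce the footprint are activation checkpointing on the backbone (`with_cp=True`) and mixed-precision training via mmengine's `AmpOptimWrapper`. A minimal sketch of the config overrides, assuming an mmsegmentation 1.x / mmengine setup with a ResNet backbone; exact field names can vary between versions:

```python
# Sketch: memory-reducing overrides for an mmsegmentation-style config.

model = dict(
    # Checkpoint backbone blocks: recompute activations in the backward
    # pass instead of storing them, trading compute for memory.
    backbone=dict(with_cp=True),
)

optim_wrapper = dict(
    type='AmpOptimWrapper',  # mmengine mixed-precision (AMP) wrapper
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005),
)
```

Neither change explains why the baseline footprint is higher than detectron2 or DeepLabV3Plus-Pytorch, but in practice they noticeably lower per-sample memory, which may be enough to unblock training while the underlying difference is investigated.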