zylo117 / Yet-Another-EfficientDet-Pytorch

A PyTorch re-implementation of the official EfficientDet, with SOTA real-time performance and pretrained weights.
GNU Lesser General Public License v3.0

gpu util issue, dataloader killed on multiple gpu training #429

Open SeongwoongCho opened 4 years ago

SeongwoongCho commented 4 years ago

I am running my code on an [RTX 2080 Ti x6] system.

  1. When I train on my custom dataset with a single GPU, GPU utilization is low (30~40%), even when I set pin_memory = True (a generic sketch of the DataLoader settings in question follows this list). My training command is below. How can I increase my GPU usage?

python train.py -c 0 -p mydata -n 16 --data_path ../datasets/ --batch_size 64 --lr 1e-5 --num_epochs 20 --load_weights ./weights/efficientdet-d0.pth --head_only True --optim sgd

  2. When I train in the same environment and change only the number of GPUs, the DataLoader is unexpectedly killed. I did not change any other parameter of the command above. The error is below; what is happening in my code?

Traceback (most recent call last):
  File "/home/jovyan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.6/queue.py", line 173, in get
    self.not_empty.wait(remaining)
  File "/usr/lib/python3.6/threading.py", line 299, in wait
    gotit = waiter.acquire(True, timeout)
  File "/home/jovyan/.local/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 14293) is killed by signal: Killed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 333, in <module>
    train(opt)
  File "train.py", line 218, in train
    for iter, data in enumerate(progress_bar):
  File "/home/jovyan/.local/lib/python3.6/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/home/jovyan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/jovyan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  File "/home/jovyan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 798, in _get_data
    success, data = self._try_get_data()
  File "/home/jovyan/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 14293) exited unexpectedly

  3. Do you have any plans to add mixed-precision training and a dataloader prefetcher?
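For context, pin_memory, the worker count (-n, judging by the rest of this thread), and --batch_size correspond to standard torch.utils.data.DataLoader arguments. A minimal generic sketch of those settings, using a placeholder dataset instead of this repository's actual dataset class, so purely illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the custom detection dataset used in this issue.
dataset = TensorDataset(torch.randn(64, 3, 128, 128), torch.zeros(64, dtype=torch.long))

loader = DataLoader(
    dataset,
    batch_size=64,      # maps to --batch_size in the command above
    shuffle=True,
    num_workers=8,      # maps to -n; more workers use more CPU and shared memory
    pin_memory=True,    # page-locked buffers make host-to-GPU copies faster
)

for images, labels in loader:
    if torch.cuda.is_available():
        images = images.cuda(non_blocking=True)  # non_blocking only helps when pin_memory=True
    break  # one batch is enough for this sketch
```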
SeongwoongCho commented 4 years ago

My CPU allocation is 5 vCPUs / 32 GB memory.

When I run lscpu in the terminal, I get:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Stepping:              7
CPU MHz:               1000.649
CPU max MHz:           3200.0000
CPU min MHz:           1000.0000
BogoMIPS:              4400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              14080K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

zylo117 commented 4 years ago
  1. You can't; current NVIDIA GPUs are not well optimized for group-convolution models like EfficientDet in PyTorch.
  2. Try reducing num_workers; I think it's an out-of-memory kill.
  3. No, I'm kind of busy for now. And mixed precision is not a good fit for EfficientDet: try running it in float16 and you will find the results become very bad, while YOLO and R-CNN models don't have this kind of issue. My guess is that EfficientDet is sensitive to small values, so a little precision loss causes a large drop in mAP.
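For illustration, a minimal sketch of the kind of float16 comparison point 3 refers to. It uses a toy stand-in model rather than this repository's EfficientDet, so the model and shapes here are assumptions, not project code:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice one would load the detector checkpoint instead.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1),
).eval()
images = torch.randn(1, 3, 256, 256)

# float16 conv is only reliably supported on GPU, so this sketch assumes CUDA is available.
if torch.cuda.is_available():
    model, images = model.cuda(), images.cuda()
    with torch.no_grad():
        out_fp32 = model(images)
        out_fp16 = model.half()(images.half())
    # A large gap here indicates sensitivity to reduced precision,
    # the kind of degradation described for EfficientDet above.
    print((out_fp32 - out_fp16.float()).abs().max().item())
```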
SeongwoongCho commented 4 years ago

@zylo117 I think there might be a shared-memory leak somewhere in the code. My shared memory size is 126 GB, and my custom dataset's annotations are 1.4 GB for the train set and 0.6 GB for the validation set.

- num_gpus = 1, num_workers = 8: OK
- num_gpus = 2, num_workers = 8: OK (but slower than num_gpus = 1, with low GPU utilization)
- num_gpus = 2, num_workers >= 16: fails
- num_gpus >= 3, num_workers = 8: fails

zylo117 commented 4 years ago

No, not in mine. It's a common issue with PyTorch's DataLoader. For now, I would suggest using a smaller num_workers. Also, I'm not sure how you can set shared memory to 126 GB when you only have 32 GB of physical memory; in the end you can only use at most 32 GB.
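Two generic PyTorch-level mitigations are commonly suggested for this kind of worker crash; neither is specific to this repository, so treat this as a sketch to verify rather than the project's recommended fix:

```python
import torch.multiprocessing as mp

# Mitigation 1: fewer workers means fewer tensors held in shared memory at once,
# e.g. pass a smaller value through the trainer's num_workers option (-n in this thread).

# Mitigation 2: share tensors between worker processes through temporary files
# instead of /dev/shm; slower, but avoids exhausting a small shared-memory segment.
mp.set_sharing_strategy('file_system')
```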

MHansy commented 4 years ago

I am facing a similar problem! Unfortunately, I don't know where or how to set num_workers = 0.

Could you kindly show me how to do that? I am using Colab.

zylo117 commented 4 years ago

train with -n 0
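That is, adapting the training command from the first post (keeping the other flags as they were; adjust the paths and project name for your own setup):

python train.py -c 0 -p mydata -n 0 --data_path ../datasets/ --batch_size 64 --lr 1e-5 --num_epochs 20 --load_weights ./weights/efficientdet-d0.pth --head_only True --optim sgd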

MHansy commented 4 years ago

> train with -n 0

Thanks for your reply.

I am training on grayscale images. Do I need to make any changes to the mean and std in the project file (project-name.yml), shown below?

# mean and std in RGB order, actually this part should remain unchanged as long as your dataset is similar to coco.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

zylo117 commented 4 years ago

try setting mean and std to [0.5, 0.5, 0.5]
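In the project .yml, that would presumably read as follows (my reading of the suggestion above, not a config shipped with the repository):

mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5]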