roytseng-tw / Detectron.pytorch

A PyTorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.
MIT License

Index out of range on multi-GPU (8 GPUs) after first epoch #201

Open akshitac8 opened 5 years ago

akshitac8 commented 5 years ago

Expected results

Successful Training

Actual results

Detailed steps to reproduce

After running the main script, on completion of the first epoch, I get an index out of range error (with drop_last = False) on this line:

mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])

I tried to trace the cause of the error and found that, after the first epoch, it involves the last 3 device ids, i.e. 5, 6, 7, which is very weird behaviour. E.g.:

CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net_step.py --dataset dota_patches --cfg configs/baselines/e2e_mask_rcnn_X-101-64x4d-FPN_2x.yaml --bs 8 --nw 8
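
For anyone trying to reproduce this, below is a minimal sketch of one way the quoted `mini_kwargs` line could raise an index error. It rests on an assumption that the report does not confirm: each value in `kwargs` is a list with one entry per device, and with drop_last = False the final batch of the epoch holds fewer entries than there are devices. The function name `split_kwargs_per_device` and the sample data are hypothetical. Only the `mini_kwargs` line is taken from the report.

```python
def split_kwargs_per_device(kwargs, device_ids):
    """Build one kwargs dict per device by indexing each value with the
    device's position (hypothetical helper, for illustration only)."""
    per_device = []
    for i, _device_id in enumerate(device_ids):
        # Line quoted in the report: assumes every v has an entry at index i.
        mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
        per_device.append(mini_kwargs)
    return per_device


if __name__ == "__main__":
    device_ids = list(range(8))  # assumed: 8 devices, positions 0..7

    # Full batch: one entry per device, so every index is valid.
    full_batch = {"data": ["img%d" % i for i in range(8)]}
    print(len(split_kwargs_per_device(full_batch, device_ids)))

    # Assumed short final batch: only 5 entries for 8 devices, so the
    # lookups for positions 5, 6 and 7 have nothing to index.
    short_batch = {"data": ["img%d" % i for i in range(5)]}
    try:
        split_kwargs_per_device(short_batch, device_ids)
    except IndexError as exc:
        print("IndexError:", exc)
```

If that assumption holds, dropping the incomplete final batch (drop_last = True) or looping only over as many devices as there are entries would likely avoid the error. Neither has been verified against this code base.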

System information