I have 8 GPUs, but when I run the following command, training fails with the error below:
python3 -m tools.train_net --num-gpus 8 \
    --config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_FPN_base1.yaml
ERROR [05/07 16:48:25 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 138, in train
self.run_step()
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
loss_dict = self.model(data)
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/github/few-shot-object-detection-master/fsdet/modeling/meta_arch/rcnn.py", line 115, in forward
images, features, gt_instances
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 430, in forward
gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances)
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 313, in label_and_sample_anchors
gt_labels_i = self._subsample_labels(gt_labels_i)
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 258, in _subsample_labels
label, self.batch_size_per_image, self.positive_fraction, 0
File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/sampling.py", line 50, in subsample_labels
perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
Do you have any suggestions? Thanks.