ucbdrive / few-shot-object-detection

Implementations of few-shot object detection benchmarks
Apache License 2.0

error gpu_num #113

Closed: cats0212 closed this issue 3 years ago

cats0212 commented 3 years ago

I have 8 GPUs, but when I run

    python3 -m tools.train_net --num-gpus 8 \
        --config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_FPN_base1.yaml

training fails with the following error:

    ERROR [05/07 16:48:25 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 138, in train
        self.run_step()
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
        loss_dict = self.model(data)
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/github/few-shot-object-detection-master/fsdet/modeling/meta_arch/rcnn.py", line 115, in forward
        images, features, gt_instances
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 430, in forward
        gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances)
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 313, in label_and_sample_anchors
        gt_labels_i = self._subsample_labels(gt_labels_i)
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 258, in _subsample_labels
        label, self.batch_size_per_image, self.positive_fraction, 0
      File "/home/anaconda3/envs/few/lib/python3.6/site-packages/detectron2/modeling/sampling.py", line 50, in subsample_labels
        perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
    RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
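`cudaErrorInvalidDevice: invalid device ordinal` means a process asked for a CUDA device index that does not exist among the devices visible to it. Ordinals are counted only over the GPUs exposed through `CUDA_VISIBLE_DEVICES` and are renumbered from 0, so a machine with 8 physical GPUs can still have fewer valid ordinals. The sketch below is a simplified, hypothetical model of one way this error can arise (`visible_ordinals` and `invalid_ranks` are illustrative names, not part of fsdet or detectron2). It handles only integer indices, not UUIDs or MIG identifiers, and assumes each worker uses the GPU whose ordinal equals its local rank.

```python
from typing import List, Optional


def visible_ordinals(cuda_visible_devices: Optional[str], physical_count: int) -> List[int]:
    """Return the device ordinals a CUDA process could use (simplified model).

    Only integer indices are handled. Parsing stops at the first invalid
    entry, and only the devices listed before it remain visible. Visible
    devices are renumbered 0..k-1 regardless of their physical index.
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible under its own index.
        return list(range(physical_count))
    count = 0
    for entry in cuda_visible_devices.split(","):
        entry = entry.strip()
        if not entry.isdigit() or int(entry) >= physical_count:
            break  # an invalid entry hides itself and everything after it
        count += 1
    return list(range(count))


def invalid_ranks(num_gpus: int, ordinals: List[int]) -> List[int]:
    """Local ranks that would request a non-existent ordinal, assuming
    (hypothetically) that rank r always selects device ordinal r."""
    return [rank for rank in range(num_gpus) if rank not in ordinals]


if __name__ == "__main__":
    # Hypothetical setup: 8 physical GPUs, but only 4 exposed to the job.
    ords = visible_ordinals("0,1,2,3", physical_count=8)
    print(ords)                    # [0, 1, 2, 3]
    print(invalid_ranks(8, ords))  # [4, 5, 6, 7]
```

Under this model, launching 8 workers while only 4 GPUs are visible would leave ranks 4 to 7 pointing at ordinals that do not exist, which is the kind of mismatch the error message describes.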

Do you have any suggestions? Thanks.
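A quick pre-flight check, run in the same shell and environment used for `tools.train_net`, can confirm whether `--num-gpus 8` matches the number of GPUs PyTorch actually sees. This is a minimal sketch, not part of the repository: `check_num_gpus` and `visible_gpu_count` are hypothetical helpers, while `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch calls.

```python
def check_num_gpus(requested: int, available: int) -> None:
    """Raise early if more worker processes are requested than GPUs are visible."""
    if requested < 1:
        raise ValueError("--num-gpus must be at least 1")
    if requested > available:
        raise ValueError(
            f"--num-gpus {requested} requested, but this process sees only "
            f"{available} GPU(s); check CUDA_VISIBLE_DEVICES and nvidia-smi"
        )


def visible_gpu_count() -> int:
    """Number of CUDA devices PyTorch can see (0 if torch or CUDA is missing)."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    requested = 8  # same value as --num-gpus in the command from the issue
    available = visible_gpu_count()
    print(f"requested={requested}, visible={available}")
    try:
        check_num_gpus(requested, available)
        print(f"OK: ranks 0..{requested - 1} can each map to a visible GPU")
    except ValueError as err:
        print(f"Pre-flight check failed: {err}")
```

If the check passes but training still fails, the mismatch is probably not a simple GPU-count problem, and comparing against `nvidia-smi` output and the value of `CUDA_VISIBLE_DEVICES` may help narrow it down.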