vision-workshop / Track2


train.py "RuntimeError: nms_impl: implementation for device cuda:0 not found" #5

Open cugwu opened 1 year ago

cugwu commented 1 year ago

Hi, I created the environment following the updated instructions, with the correct library versions, but now a new error is raised. When running python Block2.py --datasets Console_sliced, train.py gives the following error: RuntimeError: nms_impl: implementation for device cuda:0 not found.

I searched online and it could be related to the mmdet version. I also tried reinstalling the full environment, without success. Do you know how I can solve the problem? Maybe with a more recent version of mmdet that doesn't conflict with the code of the repo?

Thanks in advance for the answer. The full traceback is below.

Traceback (most recent call last):
  File "tools/train.py", line 243, in <module>
    main()
  File "tools/train.py", line 232, in main
    train_detector(
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/apis/train.py", line 246, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 64, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/detectors/two_stage.py", line 135, in forward_train
    rpn_losses, proposal_list = self.rpn_head.forward_train(
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 339, in forward_train
    proposal_list = self.get_bboxes(
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 208, in new_func
    return old_func(*args, **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 102, in get_bboxes
    results = self._get_bboxes_single(cls_score_list, bbox_pred_list,
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/rpn_head.py", line 185, in _get_bboxes_single
    return self._bbox_post_process(mlvl_scores, mlvl_bbox_preds,
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/rpn_head.py", line 231, in _bbox_post_process
    dets, _ = batched_nms(proposals, scores, ids, cfg.nms)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py", line 350, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/utils/misc.py", line 340, in new_func
    output = old_func(*args, **kwargs)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py", line 175, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py", line 28, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.

This error then caused the following:

Traceback (most recent call last):
  File "tools/test.py", line 276, in <module>
    main()
  File "tools/test.py", line 227, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 638, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 572, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 314, in load_checkpoint
    return checkpoint_loader(filename, map_location) # type: ignore
  File "/home/clusterusers/cugwu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 333, in load_from_local
    raise FileNotFoundError(f'{filename} can not be found.')
FileNotFoundError: work_dirs/Console_sliced/latest.pth can not be found.
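The second traceback looks like a knock-on effect: training stopped before writing any checkpoint, so tools/test.py had nothing to load. A minimal sketch of a guard that makes this visible before testing starts is shown here. The helper name checkpoint_ready is hypothetical and is not part of the repo or of mmcv; only the path work_dirs/Console_sliced/latest.pth comes from the traceback above.

```python
from pathlib import Path


def checkpoint_ready(work_dir: str, name: str = "latest.pth") -> bool:
    """Return True if the checkpoint file exists under work_dir.

    Hypothetical helper for illustration; not part of the repo or mmcv.
    """
    return (Path(work_dir) / name).is_file()


if __name__ == "__main__":
    # Path taken from the FileNotFoundError in the traceback above.
    work_dir = "work_dirs/Console_sliced"
    if checkpoint_ready(work_dir):
        print("Checkpoint found; testing can proceed.")
    else:
        print("No checkpoint found; the training error has to be fixed first.")
```

If this reports a missing checkpoint, the FileNotFoundError is not a separate bug and should go away once training completes.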

vision-workshop commented 1 year ago

Hi,

Which system are you using (Linux distribution, version, etc.)?

Thanks, Best, The Vision Team

cugwu commented 1 year ago

Hi, I'm using:

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

and CUDA version 11.8
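Besides the OS and system CUDA version, it may help to report which CUDA version PyTorch and mmcv were each built against, since this error often appears when the mmcv ops were compiled without CUDA support or for a different CUDA/PyTorch build. The following is a minimal sketch, assuming the mmcv 1.x layout seen in the traceback; the function name collect_env is hypothetical, and the script is written to degrade gracefully if torch or mmcv is missing.

```python
def collect_env() -> dict:
    """Gather version info relevant to the nms_impl error (illustrative only)."""
    info = {"torch": None, "mmcv": None}

    try:
        import torch
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built with; None for CPU-only builds.
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass

    try:
        import mmcv
        info["mmcv"] = mmcv.__version__
    except ImportError:
        pass

    try:
        # These helpers ship with mmcv-full; importing them fails when the
        # compiled extension is absent (for example, the lite mmcv package).
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
        info["mmcv_cuda"] = get_compiling_cuda_version()
        info["mmcv_compiler"] = get_compiler_version()
    except ImportError as exc:
        info["mmcv_ops_error"] = repr(exc)

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If mmcv_cuda differs from torch_cuda, or the ops cannot be imported at all, a mismatched or CPU-only mmcv build is a likely cause.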

vision-workshop commented 1 year ago

Hi,

On our Linux system, the new installation code works fine, so I cannot reproduce your error. Could you please let us know whether the following solution works (replacing 1.5.0 with 1.7.1)? https://github.com/open-mmlab/mmdetection/issues/7788#issuecomment-1110168274
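The linked comment is not reproduced here. As a rough sketch of the general approach, which is to reinstall mmcv-full from a prebuilt wheel that matches the local CUDA and PyTorch versions, the helper below builds the pip command from the find-links URL pattern in the OpenMMLab installation docs. The function name and the versions in the example are placeholders, and whether a prebuilt wheel exists for a given combination (CUDA 11.8 in particular) has not been checked and needs to be confirmed against the download index.

```python
def mmcv_full_install_command(mmcv_version: str,
                              cuda_version: str,
                              torch_version: str) -> str:
    """Build a pip command for a prebuilt mmcv-full wheel (illustrative only).

    Uses the URL pattern
    https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
    and does not verify that the index or the wheel actually exists.
    """
    cu_tag = "cu" + cuda_version.replace(".", "")  # e.g. "11.7" -> "cu117"
    index = (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_tag}/torch{torch_version}/index.html"
    )
    return f"pip install mmcv-full=={mmcv_version} -f {index}"


if __name__ == "__main__":
    # Placeholder CUDA and PyTorch versions; substitute the ones reported
    # by the local PyTorch build.
    print(mmcv_full_install_command("1.7.1", "11.7", "1.13.0"))
```

Uninstalling any existing mmcv or mmcv-full package first avoids leaving two builds side by side in the same environment.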

Thanks, Best, The Vision Team
