open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0
29.21k stars 9.4k forks

fps benchmark tool can't work #11717

Open KRL666 opened 4 months ago

KRL666 commented 4 months ago

Hello everyone,

Firstly, I apologize for my English.

I have a problem: I trained a custom object detector, and frames per second (FPS) is crucial for my use case. To measure it, I ran the following command:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=29500 tools/analysis_tools/benchmark.py work_dirs/ca-o/ca-o.py work_dirs/ca-o/best_bbox_mAP_epoch_183.pth --launcher pytorch

The command fails with the error below, and I have not been able to fix it. Does anyone know where the error comes from?
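For context, the FPS figure from a benchmark like this is a count of timed iterations divided by the time they took, with the first few warm-up iterations left out. Below is a minimal sketch of that idea, not the code in benchmark.py. The names `measure_fps`, `run_once`, and `dummy_inference` are hypothetical, and the workload is a placeholder rather than a detector.

```python
import time


def measure_fps(run_once, max_iter=50, num_warmup=5):
    """Return iterations per second, ignoring the first `num_warmup` calls."""
    if max_iter <= num_warmup:
        raise ValueError("max_iter must be larger than num_warmup")
    timed_seconds = 0.0
    timed_iters = 0
    for i in range(max_iter):
        start = time.perf_counter()
        run_once()
        elapsed = time.perf_counter() - start
        # Warm-up iterations run but are not counted toward the result.
        if i >= num_warmup:
            timed_seconds += elapsed
            timed_iters += 1
    return timed_iters / timed_seconds


def dummy_inference():
    # Placeholder workload standing in for one forward pass (assumption).
    sum(x * x for x in range(10_000))


fps = measure_fps(dummy_inference)
print(f"{fps:.1f} iterations per second")
```

When timing a real model on a GPU, the device would also need to be synchronized (for example with `torch.cuda.synchronize()`) before each clock read, otherwise asynchronous kernels make the numbers look faster than they are.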

/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

warnings.warn(
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
load checkpoint from local path: work_dirs/ca-o/best_bbox_mAP_epoch_183.pth
Traceback (most recent call last):
  File "tools/analysis_tools/benchmark.py", line 188, in <module>
    main()
  File "tools/analysis_tools/benchmark.py", line 182, in main
    repeat_measure_inference_speed(cfg, args.checkpoint, args.max_iter,
  File "tools/analysis_tools/benchmark.py", line 151, in repeat_measure_inference_speed
    measure_inference_speed(cp_cfg, checkpoint, max_iter, log_interval,
  File "tools/analysis_tools/benchmark.py", line 111, in measure_inference_speed
    model(return_loss=False, rescale=True, **data)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 139, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/mmdet-2.22.0-py3.8.egg/mmdet/models/detectors/base.py", line 174, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/mmdet-2.22.0-py3.8.egg/mmdet/models/detectors/base.py", line 137, in forward_test
    img_meta[img_id]['batch_input_shape'] = tuple(img.size()[-2:])
TypeError: 'DataContainer' object is not subscriptable
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 920690) of binary: /home/cuicui/anaconda3/envs/mmdet3/bin/python
Traceback (most recent call last):
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/analysis_tools/benchmark.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-15_18:41:58
  host      : cuicui
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 920690)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

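The final TypeError says that `forward_test` received `img_metas` entries still wrapped in a `DataContainer` and tried to index the wrapper directly. The sketch below reproduces that message with a hypothetical stand-in class (it is not mmcv's `DataContainer`, only a wrapper with a `.data` attribute) and shows what unwrapping looks like. In mmcv 1.x the parallel wrapper's scatter step normally does this unwrapping before the detector's forward runs. The traceback suggests that step was skipped here, but nothing in this thread confirms the cause.

```python
class DataContainer:
    """Hypothetical stand-in: wraps a payload in `.data`, defines no __getitem__."""

    def __init__(self, data):
        self.data = data


def unwrap(obj):
    """Recursively replace DataContainer wrappers with their payload."""
    if isinstance(obj, DataContainer):
        return unwrap(obj.data)
    if isinstance(obj, dict):
        return {key: unwrap(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(unwrap(value) for value in obj)
    return obj


# Shaped loosely like test-time metadata: one wrapped list of per-image dicts.
img_metas = [DataContainer([{"filename": "demo.jpg"}])]

try:
    img_metas[0][0]  # indexing the wrapper itself, as the failing line does
except TypeError as exc:
    error_message = str(exc)

# After unwrapping, the same indexing pattern works on plain lists and dicts.
plain_metas = unwrap(img_metas)
plain_metas[0][0]["batch_input_shape"] = (800, 1333)
```

If the wrapper is reaching the model unchanged, the versions of torch, mmcv, and mmdet installed in the environment are worth comparing against each other, since the code that unwraps the batch lives in mmcv while the forward call path is torch's.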
Linengyao commented 4 months ago

I ran into the same problem. Did you solve it?