open-mmlab / mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.
https://mmrazor.readthedocs.io/en/latest/
Apache License 2.0

[Bug] get error when using get_channel_units.py for faster_rcnn #431

Closed cxiang26 closed 1 year ago

cxiang26 commented 1 year ago

Describe the bug

I cannot generate the config template for target_pruning_ratio on the dev-1.x branch.

Post related information

  1. mmrazor version: 1.0.0.rc2
  2. Code is unmodified. Command run:
python ./tools/pruning/get_channel_units.py configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py --choice
  3. Results:
    
    (base) python ./tools/pruning/get_channel_units.py configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py --choice
    /opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /opt/conda/conda-bld/pytorch_1634272068694/work/c10/cuda/CUDAFunctions.cpp:112.)
    return torch._C._cuda_getDeviceCount() > 0
    No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
    Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
    File "/root/workspace/projects/mmrazor/mmrazor/models/algorithms/pruning/dcff.py", line 64, in __init__
    super().__init__(architecture, mutator_cfg, fix_subnet,
    File "/root/workspace/projects/mmrazor/mmrazor/models/algorithms/pruning/ite_prune_algorithm.py", line 141, in __init__
    self.mutator.prepare_from_supernet(self.architecture)
    File "/root/workspace/projects/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py", line 114, in prepare_from_supernet
    units = self._prepare_from_tracer(supernet, self.parse_cfg)
    File "/root/workspace/projects/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py", line 327, in _prepare_from_tracer
    unit_configs = tracer.analyze(model)
    File "/root/workspace/projects/mmrazor/mmrazor/models/task_modules/tracer/channel_analyzer.py", line 102, in analyze
    path_list = self.tracer.trace(model)
    File "/root/workspace/projects/mmrazor/mmrazor/models/task_modules/tracer/backward_tracer.py", line 187, in trace
    pseudo_loss = self.loss_calculator(model)
    File "/root/workspace/projects/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/sum_loss_calculator.py", line 23, in __call__
    pseudo_output = model(pseudo_img)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/root/workspace/projects/mmdetection/mmdet/models/detectors/base.py", line 96, in forward
    return self._forward(inputs, data_samples)
    File "/root/workspace/projects/mmdetection/mmdet/models/detectors/two_stage.py", line 131, in _forward
    rpn_results_list = self.rpn_head.predict(
    File "/root/workspace/projects/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 191, in predict
    batch_img_metas = [
    TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/pruning/get_channel_units.py", line 84, in <module>
    main()
  File "./tools/pruning/get_channel_units.py", line 48, in main
    model = MODELS.build(config['model'])
  File "/opt/conda/lib/python3.8/site-packages/mmengine/registry/registry.py", line 454, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 240, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
TypeError: class DCFF in mmrazor/models/algorithms/pruning/dcff.py: 'NoneType' object is not iterable

LKJacky commented 1 year ago

Sorry about that, it is indeed a bug. As a temporary workaround, please change BackwardTracer to FxTracer at the following location:

https://github.com/open-mmlab/mmrazor/blob/67da3ad240e2afe6dfcabd820d8413743471d67d/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py#L79
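
For readers applying the workaround, the change is likely a single value in the mutator's parse config. The sketch below uses a plain dict as a stand-in for the config file. The nesting (`mutator_cfg` → `parse_cfg`), the `tracer_type` key, the `ChannelAnalyzer` type name, and the helper `use_fx_tracer` are assumptions for illustration, so check them against the linked line before copying anything.

```python
import copy

# Stand-in for the relevant part of the pruning config.
# The nesting and key names are assumptions, not copied from the repo.
model = dict(
    type='DCFF',
    mutator_cfg=dict(
        parse_cfg=dict(
            type='ChannelAnalyzer',
            tracer_type='BackwardTracer',
        ),
    ),
)


def use_fx_tracer(model_cfg):
    """Return a copy of model_cfg with the tracer switched to FxTracer."""
    patched = copy.deepcopy(model_cfg)
    patched['mutator_cfg']['parse_cfg']['tracer_type'] = 'FxTracer'
    return patched


patched_model = use_fx_tracer(model)
print(patched_model['mutator_cfg']['parse_cfg'])
```

Working on a deep copy leaves the original dict untouched, which makes it easy to compare the two tracer settings side by side.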

cxiang26 commented 1 year ago

Thanks, it works. But I got another error when running:

python ./tools/pruning/get_channel_units.py configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py -i -c --output-path configs/pruning/mmdet/dcff/det.json
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/opt/conda/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/opt/conda/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/opt/conda/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/opt/conda/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/opt/conda/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  [Previous line repeated 2 more times]
  File "/opt/conda/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/opt/conda/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/opt/conda/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/opt/conda/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type builtin_function_or_method is not JSON serializable

LKJacky commented 1 year ago

We fixed both bugs in this PR. You can cherry-pick it temporarily until we merge it.
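
Until the fix is available, the JSON error can be narrowed down with the standard library alone. This is a minimal sketch that walks a nested structure and reports each leaf value `json.dumps` rejects. The `config` dict and the helper name `find_unserializable` are hypothetical stand-ins, not the actual data produced by get_channel_units.py.

```python
import json


def find_unserializable(obj, path='root'):
    """Yield (path, value) for every leaf that json.dumps cannot encode."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_unserializable(value, f'{path}[{key!r}]')
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_unserializable(value, f'{path}[{index}]')
    else:
        try:
            json.dumps(obj)
        except TypeError:
            yield path, obj


# Hypothetical stand-in for the dumped unit config; `min` plays the role of
# a builtin function that ended up in the config by accident.
config = {'units': [{'init_args': {'num_channels': 64, 'divisor': min}}]}

bad = list(find_unserializable(config))
print(bad)
```

Each reported path points at the key holding the offending object, which is usually enough to see which config field received a function instead of a plain value.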

Jaykob commented 1 year ago

UPDATE: I opened a new bug as this would probably get lost here. New bug report: https://github.com/open-mmlab/mmrazor/issues/464

Hi! I'm currently trying to prune a resnet50 using the configs/pruning/mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py config, and I get the following error even after applying the fixes mentioned here. Could this be related?

Traceback (most recent call last):
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/some_path/mmrazor/mmrazor/models/algorithms/pruning/dcff.py", line 62, in __init__
    super().__init__(architecture, mutator_cfg, data_preprocessor,
  File "/some_path/mmrazor/mmrazor/models/algorithms/pruning/ite_prune_algorithm.py", line 137, in __init__
    self.mutator.prepare_from_supernet(self.architecture)
  File "/some_path/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py", line 105, in prepare_from_supernet
    units = self._prepare_from_tracer(supernet, self.parse_cfg)
  File "/some_path/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py", line 301, in _prepare_from_tracer
    unit_configs = tracer.analyze(model)
  File "/some_path/mmrazor/mmrazor/models/task_modules/tracer/channel_analyzer.py", line 106, in analyze
    fx_graph = self._fx_trace(model)
  File "/some_path/mmrazor/mmrazor/models/task_modules/tracer/channel_analyzer.py", line 131, in _fx_trace
    args = self.demo_input.get_data(model)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py", line 29, in get_data
    return self._get_data(model, input_shape, training)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/default_demo_inputs.py", line 105, in _get_data
    return defaul_demo_inputs(model, input_shape, training, self.scope)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/default_demo_inputs.py", line 79, in defaul_demo_inputs
    return demo_input().get_data(model, input_shape, training)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py", line 29, in get_data
    return self._get_data(model, input_shape, training)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py", line 49, in _get_data
    data = self._get_mm_data(model, input_shape, training)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py", line 139, in _get_mm_data
    data = demo_mmpose_inputs(model, input_shape)
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/mmpose_demo_input.py", line 35, in demo_mmpose_inputs
    batch_data_samples = [
  File "/some_path/mmrazor/mmrazor/models/task_modules/demo_inputs/mmpose_demo_input.py", line 36, in <listcomp>
    inputs['data_sample'] for inputs in get_packed_inputs(
TypeError: string indices must be integers

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 121, in <module>
    main()
  File "tools/train.py", line 114, in main
    runner = Runner.from_cfg(cfg)
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 431, in from_cfg
    runner = cls(
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 398, in __init__
    self.model = self.build_model(model)
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 800, in build_model
    model = MODELS.build(model)
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 240, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/anaconda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
TypeError: class `DCFF` in mmrazor/models/algorithms/pruning/dcff.py: string indices must be integers
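
One plausible reading of this traceback, offered as an assumption and not confirmed anywhere in this thread: the comprehension in mmpose_demo_input.py expects `get_packed_inputs` to return a list of per-sample dicts, but it received a single dict. Iterating a dict yields its string keys, and indexing a string with `'data_sample'` raises this TypeError. The sketch below reproduces that pattern with made-up data; `collect_data_samples` and both sample values are hypothetical.

```python
def collect_data_samples(packed):
    """Mimic the failing comprehension: index each item by 'data_sample'."""
    return [inputs['data_sample'] for inputs in packed]


# Case 1: a list of per-sample dicts, the shape the comprehension expects.
as_list = [{'inputs': 'img0', 'data_sample': 'sample0'}]
print(collect_data_samples(as_list))

# Case 2: a single dict. Iterating it yields its string keys, so
# inputs['data_sample'] indexes a str with a str and raises TypeError.
as_dict = {'inputs': ['img0'], 'data_samples': ['sample0']}
try:
    collect_data_samples(as_dict)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```

If that is the cause, comparing the installed mmpose version's `get_packed_inputs` return type with what mmrazor expects would be a reasonable next step.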