swoook / ddrnet

Cloned from chenjun2hao/DDRNet (https://github.com/chenjun2hao/DDRNet.pytorch).

RuntimeError: Error(s) in loading state_dict for DualResNet: ... #6

Closed: swoook closed this issue 3 years ago

swoook commented 3 years ago

Background

  1. 10 FPS on CPU
  2. 30 FPS on 1080Ti GPU

Issue description

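Loading the checkpoint into DualResNet fails with RuntimeError: Error(s) in loading state_dict for DualResNet. The classifier layers seghead_extra.conv2 and final_layer.conv2 have 19 output channels in the checkpoint but only 1 in the current model (see the traceback below).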

Code example

{
    "name": "Python: Current File",
    "type": "python",
    "request": "launch",
    "program": "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/site-packages/torch/distributed/launch.py", //"${file}",
    "console": "integratedTerminal",
    "args": [
        "--nproc_per_node", "1",
        "${workspaceRoot}/main.py",
        "--mode", "train",
        "--cfg_path", "./experiments/duts/ddrnet23_slim_poolnet_train_scheme.yaml",
    ]
},

Running this VS Code launch configuration produces the following output:

Traceback (most recent call last):
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 3293, in <module>
    main()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 3286, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 2360, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 2367, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydev_imps/_pydev_execfile.py", line 25, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "/data/swook/repos/chenjun2hao/ddrnet/main.py", line 115, in <module>
    main(args)
  File "/data/swook/repos/chenjun2hao/ddrnet/main.py", line 98, in main
    solver = Solver(train_loader, None, config, args)
  File "/data/swook/repos/chenjun2hao/ddrnet/sod/solver.py", line 26, in __init__
    self.build_model()
  File "/data/swook/repos/chenjun2hao/ddrnet/sod/solver.py", line 74, in build_model
    self.net.load_state_dict(model_dict, strict=False)
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DualResNet:
        size mismatch for seghead_extra.conv2.weight: copying a param with shape torch.Size([19, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 64, 1, 1]).
        size mismatch for seghead_extra.conv2.bias: copying a param with shape torch.Size([19]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for final_layer.conv2.weight: copying a param with shape torch.Size([19, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 64, 1, 1]).
        size mismatch for final_layer.conv2.bias: copying a param with shape torch.Size([19]) from checkpoint, the shape in current model is torch.Size([1]).
Killing subprocess 53003
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.6.944021595/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/data/swook/miniconda3/envs/torch18csnet/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/swook/miniconda3/envs/torch18csnet/bin/python', '-u', '/data/swook/repos/chenjun2hao/ddrnet/main.py', '--local_rank=0', '--mode', 'train', '--cfg_path', './experiments/duts/ddrnet23_slim_poolnet_train_scheme.yaml']' returned non-zero exit status 1.
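The traceback shows that passing strict=False (sod/solver.py, line 74) does not prevent the error: in PyTorch, strict=False only tolerates missing and unexpected keys, while size mismatches are still collected and raised. A minimal sketch of one workaround, assuming model_dict is a plain state dict mapping parameter names to tensors, is to drop every entry whose shape differs from the current model before loading. The helper name load_matching is hypothetical, and the skipped classifier layers keep their fresh initialization, so they must be trained from scratch.

```python
import torch
import torch.nn as nn


def load_matching(model: nn.Module, checkpoint: dict) -> list:
    """Load only the checkpoint tensors whose name and shape match `model`.

    Hypothetical helper. Returns the checkpoint keys that were skipped.
    """
    own_state = model.state_dict()
    filtered, skipped = {}, []
    for name, tensor in checkpoint.items():
        if name in own_state and own_state[name].shape == tensor.shape:
            filtered[name] = tensor
        else:
            skipped.append(name)
    # strict=False tolerates the keys dropped above (they become "missing"),
    # and no size mismatch is left to raise a RuntimeError.
    model.load_state_dict(filtered, strict=False)
    return skipped


if __name__ == "__main__":
    # Toy stand-ins: a 19-class "checkpoint" and a 1-class model.
    pretrained = nn.Sequential(nn.Conv2d(3, 64, 3), nn.Conv2d(64, 19, 1))
    model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.Conv2d(64, 1, 1))
    print(load_matching(model, pretrained.state_dict()))  # ['1.weight', '1.bias']
    print(torch.equal(model[0].weight, pretrained[0].weight))  # True
```

In solver.py, a helper like this would take the place of the direct self.net.load_state_dict(model_dict, strict=False) call; printing the skipped keys is a cheap way to confirm that only the classifier layers were left unloaded.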

System Info

PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 16.04.3 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.5.1

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti

Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.8.1
[pip3] torchvision==0.9.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.2.89              hfd86e86_1  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.2.0           h06a4308_296  
[conda] mkl-service               2.3.0            py38h27cfd23_1  
[conda] mkl_fft                   1.3.0            py38h42c9631_2  
[conda] mkl_random                1.2.1            py38ha9443f7_2  
[conda] numpy                     1.20.2           py38h2d18471_0  
[conda] numpy-base                1.20.2           py38hfae3a4d_0  
[conda] pytorch                   1.8.1           py3.8_cuda10.2_cudnn7.6.5_0    pytorch
[conda] torchvision               0.9.1                py38_cu102    pytorch
swoook commented 3 years ago
Notation:
  1. Input image of size (3, H, W)
  2. H is the height
  3. W is the width
  4. cls is the number of classes
  5. N is the batch size
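Under this notation, the last 1x1 convolution of each head maps 64 feature channels to cls output channels, so its weight has shape (cls, 64, 1, 1) and its bias has shape (cls,). The sketch below uses a plain nn.Conv2d as a stand-in for that layer; the 64 input channels come from the shapes reported in the traceback, while the feature-map size h x w and the helper name classifier_shapes are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn


def classifier_shapes(cls: int, n: int = 2, h: int = 16, w: int = 16):
    """Return (weight shape, bias shape, output shape) of a 1x1 classifier conv.

    Hypothetical helper: the conv maps 64 feature channels to `cls` class channels.
    """
    conv = nn.Conv2d(64, cls, kernel_size=1)
    features = torch.randn(n, 64, h, w)  # N feature maps of size h x w
    out = conv(features)
    return tuple(conv.weight.shape), tuple(conv.bias.shape), tuple(out.shape)


print(classifier_shapes(19))  # ((19, 64, 1, 1), (19,), (2, 19, 16, 16))
print(classifier_shapes(1))   # ((1, 64, 1, 1), (1,), (2, 1, 16, 16))
```

Only the leading dimension depends on cls, which is why every other layer in the checkpoint can still be loaded unchanged.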
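The DualResNet constructor defaults to num_classes=19:

class DualResNet(nn.Module):

    def __init__(self, block, layers, num_classes=19, planes=64, spp_planes=128, head_planes=128, augment=True):
        # num_classes sets the output channels of seghead_extra.conv2 and final_layer.conv2
        ...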
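Because num_classes defaults to 19, the checkpoint's classifier layers have 19 output channels, while this training setup builds them with 1. An alternative to filtering the state dict, sketched below under stated assumptions, is to build the heads with 19 classes, load the checkpoint strictly, and then replace each head's final 1x1 conv with a freshly initialized 1-channel layer. SegHead and adapt_head are hypothetical names; SegHead only mirrors the conv2 attribute name seen in the traceback and is not the repository's head implementation.

```python
import torch
import torch.nn as nn


class SegHead(nn.Module):
    """Hypothetical stand-in for a head that ends in a 1x1 classifier named conv2."""

    def __init__(self, in_planes: int, num_classes: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv2(torch.relu(self.conv1(x)))


def adapt_head(pretrained_state: dict, in_planes: int,
               new_classes: int, old_classes: int = 19) -> SegHead:
    # 1) Build with the checkpoint's class count so every shape matches.
    head = SegHead(in_planes, num_classes=old_classes)
    head.load_state_dict(pretrained_state)  # strict=True by default; succeeds
    # 2) Replace the classifier with a freshly initialized new_classes-channel conv.
    head.conv2 = nn.Conv2d(head.conv2.in_channels, new_classes, kernel_size=1)
    return head


if __name__ == "__main__":
    checkpoint = SegHead(32, num_classes=19).state_dict()  # toy 19-class "checkpoint"
    head = adapt_head(checkpoint, in_planes=32, new_classes=1)
    out = head(torch.randn(2, 32, 8, 8))
    print(tuple(head.conv2.weight.shape), tuple(out.shape))  # (1, 64, 1, 1) (2, 1, 8, 8)
```

Applied to the real model, the same idea would mean constructing DualResNet with num_classes=19, loading the checkpoint, and then reassigning seghead_extra.conv2 and final_layer.conv2. This assumes those attributes are plain nn.Conv2d layers, which matches the parameter names in the traceback but has not been checked against the repository code.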
Shapes in the checkpoint:

seghead_extra.conv2.weight with shape torch.Size([19, 64, 1, 1])
seghead_extra.conv2.bias with shape torch.Size([19])
final_layer.conv2.weight with shape torch.Size([19, 64, 1, 1])
final_layer.conv2.bias with shape torch.Size([19])

Shapes in the current model:

seghead_extra.conv2.weight with shape torch.Size([1, 64, 1, 1])
seghead_extra.conv2.bias with shape torch.Size([1])
final_layer.conv2.weight with shape torch.Size([1, 64, 1, 1])
final_layer.conv2.bias with shape torch.Size([1])
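Listings like the two above can be produced by comparing the checkpoint against the model's own state_dict(). A minimal sketch, assuming both are plain name-to-tensor mappings; shape_mismatches is a hypothetical helper name, and the toy nn.Sequential models stand in for the checkpoint and DualResNet.

```python
import torch.nn as nn


def shape_mismatches(checkpoint: dict, model: nn.Module) -> list:
    """Return (name, checkpoint shape, model shape) for every key whose shapes differ."""
    own_state = model.state_dict()
    return [
        (name, tuple(tensor.shape), tuple(own_state[name].shape))
        for name, tensor in checkpoint.items()
        if name in own_state and own_state[name].shape != tensor.shape
    ]


if __name__ == "__main__":
    ckpt = nn.Sequential(nn.Conv2d(3, 64, 3), nn.Conv2d(64, 19, 1)).state_dict()
    model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.Conv2d(64, 1, 1))
    for name, old, new in shape_mismatches(ckpt, model):
        # 1.weight: checkpoint (19, 64, 1, 1) vs model (1, 64, 1, 1)
        # 1.bias: checkpoint (19,) vs model (1,)
        print(f"{name}: checkpoint {old} vs model {new}")
```

Running such a check before calling load_state_dict lists every conflicting layer at once, rather than relying on the RuntimeError message.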