ljzycmd / SimDeblur

Simple framework for image and video deblurring, implemented by PyTorch

Using the other supported neural network models on ColabNotebook and inference_image.py #13

Open dariuskrail opened 1 year ago

dariuskrail commented 1 year ago

Hello there,

I was able to follow the example you posted in the ColabNotebook and successfully performed deblurring with the DBN model on the test images locally on my PC, via JupyterNotebook with CUDA-enabled PyTorch. So the ColabNotebook example using DBN works well.

Next, I tried to load a different model (i.e. DBLRNet) to compare the results, using the code snippet below.

...
model = build_backbone(model_cfg)
ckpt = torch.load("./demo/dblrnet_dvd.pth")
model_ckpt = ckpt["model"]
# strip the "module." prefix left on each key by DataParallel
model_ckpt = {k[7:]: v for k, v in model_ckpt.items()}
model.load_state_dict(model_ckpt)
model = model.to(device)
...

I then get the following error from Python.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_3448/3417428220.py in <module>
     19 model_ckpt = ckpt["model"]
     20 model_ckpt = {k[7:]: v for k, v in model_ckpt.items()}
---> 21 model.load_state_dict(model_ckpt)
     22 model = model.to(device)

~/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1666         if len(error_msgs) > 0:
   1667             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1668                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1669         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1670 

RuntimeError: Error(s) in loading state_dict for DBN:
    Missing key(s) in state_dict: "F0.0.weight", "F0.0.bias", "F0.1.weight", "F0.1.bias", "F0.1.running_mean", "F0.1.running_var", "D1.0.weight", "D1.0.bias", "D1.1.weight", "D1.1.bias", "D1.1.running_mean", "D1.1.running_var", "F1_1.0.weight", "F1_1.0.bias", "F1_1.1.weight", "F1_1.1.bias", "F1_1.1.running_mean", "F1_1.1.running_var",
...
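
If it helps to narrow this down, the expected and loaded key sets can be compared directly (a sketch, reusing the model and model_ckpt variables from the snippet above):

model_keys = set(model.state_dict().keys())
ckpt_keys = set(model_ckpt.keys())
# keys the model expects but the checkpoint lacks, and vice versa
print("missing:", sorted(model_keys - ckpt_keys)[:5])
print("unexpected:", sorted(ckpt_keys - model_keys)[:5])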

I have also attempted to run the inference_image.py script to do the same thing, with the following command on Linux:

python inference_image.py ./configs/dblrnet/dblrnet_dvd.yaml ./demo/dblrnet_dvd.pth --img=./datasets/input/00000.jpg

This resulted in the error below.

Using checkpoint loaded from ./demo/dblrnet_dvd.pth for testing.
Traceback (most recent call last):
  File "inference_image.py", line 81, in <module>
    inference()
  File "inference_image.py", line 70, in inference
    outputs = arch.postprocess(arch.model(arch.preprocess(input_image)))
  File "/home/emui/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/emui/sandbox-git/SimDeblur/simdeblur/model/backbone/dblrnet/dblrnet.py", line 52, in forward
    l2 = self.L_in(x)
  File "/home/emui/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/emui/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/emui/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/emui/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 613, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/emui/anaconda3/envs/simdeblur/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 609, in _conv_forward
    input, weight, bias, self.stride, self.padding, self.dilation, self.groups
RuntimeError: Calculated padded input size per channel: (1 x 722 x 1282). Kernel size: (3 x 3 x 3). Kernel size can't be greater than actual input size

Do you know what could be causing the issue here? It would be really nice if you could provide step-by-step examples of how to use the scripts to deblur images/videos, along with test inputs and expected outputs, so that we know we have set up SimDeblur properly on our local machines.

P.S. It would also be useful to add instructions to the README.md on how to install all the required dependencies for SimDeblur, e.g. the Python packages and the CUDA AI libraries and toolkit on Linux. Thanks again for the great work. :+1:

dariuskrail commented 1 year ago

Managed to get DBLRNet to work using the ColabNotebook example. The name in the config has to match the model being used; the corresponding parameters can be found in ./configs/dblrnet/dblrnet_dvd.yaml, used together with the weights file dblrnet_dvd.pth.

import torch
from easydict import EasyDict as edict
from simdeblur.model import build_backbone

model_cfg = edict({
    "name": "DBLRNet",
    "num_frames": 5,
    "in_channels": 3,
    "inner_channels": 64
})

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Processing neural network with {device}!")

model = build_backbone(model_cfg)
ckpt = torch.load("./demo/dblrnet_dvd.pth")
model_ckpt = ckpt["model"]
# strip the "module." prefix left on each key by DataParallel
model_ckpt = {k[7:]: v for k, v in model_ckpt.items()}
model.load_state_dict(model_ckpt)
model = model.to(device)
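
As a quick smoke test, the loaded model can be run on random data (a sketch only; the (batch, frames, channels, height, width) input layout is an assumption here, so check the forward method in simdeblur/model/backbone/dblrnet/dblrnet.py for the exact format):

model.eval()
with torch.no_grad():
    # random 5-frame clip; the shape below is an assumed layout, not documented
    dummy = torch.rand(1, 5, 3, 256, 256, device=device)
    output = model(dummy)
print(output.shape)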

However, I still cannot get it to work for the other models, e.g. CDVD-TSP. I get the error message below. :(

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_3448/2778791916.py in <module>
     15 print(f"Processing neural network with {device}!")
     16 
---> 17 model = build_backbone(model_cfg)
     18 ckpt = torch.load("./demo/cdvd_tsp_dvd_paper.pth")
     19 model_ckpt = ckpt["model"]

~/sandbox-git/SimDeblur/simdeblur/model/build.py in build_backbone(cfg)
     24 
     25 def build_backbone(cfg):
---> 26     return build(cfg, BACKBONE_REGISTRY)
     27 
     28 

~/sandbox-git/SimDeblur/simdeblur/model/build.py in build(cfg, registry, args)
     19     args = copy.deepcopy(cfg)
     20     name = args.pop("name")
---> 21     ret = registry.get(name)(**args)
     22     return ret
     23 

TypeError: __init__() got an unexpected keyword argument 'num_frames'

ljzycmd commented 1 year ago

Hello, thank you for your interest in this project.

  1. When you build a new model, you must specify the corresponding hyper-parameters in model_cfg (refer to the complete model config under ./configs/MODEL_NAME/):

    ...
    model:
      name: "DBLRNet"
      num_frames: 5
      in_channels: 3
      inner_channels: 64
    ...

    then your model_cfg should be:

    ...
    model_cfg = edict({
        "name": "DBLRNet",
        "num_frames": 5,
        "in_channels": 3,
        "inner_channels": 64
    })
    ...

    Note that different models may adopt different input data formats; you may need to change the dataset config data_config accordingly.

  2. The script inference_image.py only supports single-image deblurring, not video.

  3. To install the CUDA extensions of some models, your runtime (i.e. CUDA version, PyTorch version, CuPy version) must be complete and mutually compatible. The compilation process is in the script Install.sh; refer to it for more details.

  4. As for the CDVD-TSP model, the error may come from an inaccurate model_cfg; you may change it to the following, referring to the corresponding config file (see the sketch after this list for reading the values straight from the YAML):

    model_cfg = edict({
        "name": "CDVD_TSP",
        "in_channels": 3,
        "n_sequence": 5,
        "out_channels": 3,
        "n_resblock": 3,
        "n_feat": 32,
        "load_flow_net": True,
        "load_recons_net": False,
        "flow_pretrain_fn": "",
        "recons_pretrain_fn": "",
        "is_mask_filter": True
    })
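
To avoid transcribing these parameters by hand, the model section can also be read straight from the YAML config. This is only a sketch: it assumes PyYAML is available and that the model section of the config contains exactly the backbone name plus its constructor arguments.

import yaml
from easydict import EasyDict as edict

from simdeblur.model import build_backbone

# load the full config and reuse its model section as the model_cfg
with open("./configs/dblrnet/dblrnet_dvd.yaml") as f:
    cfg = edict(yaml.safe_load(f))
model = build_backbone(cfg.model)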

If you have more problems, please feel free to contact me.

dariuskrail commented 1 year ago

Thanks for the quick response.

Indeed, the YAML config files for the supported models have different parameters. I'll adapt the parameters in the ColabNotebook example and try the other models, like CDVD-TSP, again.

When I ran Install.sh, I got errors about missing headers as the script tried to compile some code with NVCC. So I needed to install additional NVIDIA libraries like cuDNN, cuTENSOR, NCCL, and CuPy along with the CUDA toolkit (v11.7 for me) to get Install.sh to work. So I thought it might be useful to also mention this in the README.md for first-timers setting up SimDeblur. :+1:
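
For anyone hitting the same thing, the runtime versions can be sanity-checked from Python before compiling (a sketch; the CuPy check only succeeds if CuPy is installed):

import torch

print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("cudnn:", torch.backends.cudnn.version())
try:
    import cupy
    print("cupy:", cupy.__version__)
except ImportError:
    print("cupy: not installed")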

I appreciate your contribution of putting together a generic framework to test different flavours of trained neural networks. It certainly helps SW developers like myself who do not have a background in AI to quickly try different models by changing just a line of code. :smile:

ljzycmd commented 1 year ago

Many thanks for your useful advice! I will document the installation process and add more descriptions to the README.md file. 😄