vb000 / NeuriCam

Deep learning based video sensing method for low-power IoT cameras (Smart glasses, GoPro, Blink etc.).
https://arxiv.org/abs/2207.12496
MIT License

Using a higher input resolution than 160x120 #6

Closed: LinixLinux closed this issue 1 year ago

LinixLinux commented 1 year ago

Hi,

The code produces great results, but at times the quality of the upscaling and colorization of finer details seems limited by the small 160x120 input resolution. Evaluation with evaluate.py only seems to accept an input resolution of 160x120 for the low-resolution grayscale video. For example, using frames sized 320x240 returns 'RuntimeError: stack expects a non-empty TensorList', and at one point I also hit 'RuntimeError: Sizes of tensors must match except in dimension 1' when using a frame size above 160x120. Is there any way to use higher-resolution frames for the lr-set video folders?

vb000 commented 1 year ago

evaluate.py is designed to work with datasets of arbitrary resolution. Case in point: the evaluation datasets used in the paper comprise video sequences at multiple resolutions. Could you provide more details about the error and your setup?

LinixLinux commented 1 year ago

Hi @vb000 ,

Thanks for your help. I think I understand why the error is occurring: evaluate.py expects an LR video set that is 1/4 the size of the HR video set, e.g. an LR set at 160x120 paired with an HR set at 640x480.
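
As a quick sanity check of that 4x relationship (the folder paths and PNG frame format here are just placeholders for illustration):

```
# Hypothetical check: confirm every LR frame is exactly 1/4 the size
# of its HR counterpart (paths and file format are placeholders).
from pathlib import Path
from PIL import Image

lr_dir, hr_dir = Path('lr-set'), Path('hr-set')
for lr_path in sorted(lr_dir.glob('*.png')):
    lr_w, lr_h = Image.open(lr_path).size              # (width, height)
    hr_w, hr_h = Image.open(hr_dir / lr_path.name).size
    assert (lr_w * 4, lr_h * 4) == (hr_w, hr_h), lr_path.name
```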

Therefore, my question is: is it possible to turn off the upscaling feature of NeuriCam and use only color propagation?

For example, I would like to use a 640x480 input video set and generate a 640x480 output.

I have tried editing params.json in /experiments/bix4_keyvsrc_attn, changing "upscale_factor" from 4 to 1. However, the following error is returned when running evaluate.py:


  File "evaluate.py", line 182, in <module>
    args.output_dir, args.file_fmt, args.profile)
  File "evaluate.py", line 81, in evaluate
    output_batch = model(train_batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/NeuriCam/model/net.py", line 27, in forward
    return self.model(model_in)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/NeuriCam/model/keyvsrc/net.py", line 151, in forward
    feats = self.propagate(lr_s, feats, key_frame_int)
  File "/content/NeuriCam/model/keyvsrc/net.py", line 96, in propagate
    feats = self.basicvsr_pp.propagate(feats, flows, module)
  File "/content/NeuriCam/model/basicvsr_pp/basicvsr_pp.py", line 240, in propagate
    flow_n1, flow_n2)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/NeuriCam/model/basicvsr_pp/basicvsr_pp.py", line 433, in forward
    out = self.conv_offset(extra_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 196, 3, 3], expected input[1, 136, 484, 644] to have 196 channels, but got 136 channels instead
```

vb000 commented 1 year ago

The model is designed for 4x upscaling; other scaling factors might need some tweaks. A workaround for your use case: you could either upsample the key-frame sequence to 2560x1920, or downsample the lr-set to 160x120, using bilinear or bicubic interpolation. That way the lr-set is 4x downsampled relative to the key-set, letting you use the existing model and weights. In the former case, you would then downsample NeuriCam-net's output back to 640x480; in the latter case, the output resolution would already be 640x480.
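
Concretely, the downsampling route could look something like this (a rough sketch; the folder paths, PNG frame format, and function name are placeholders, not the repo's actual layout):

```
# Sketch of the suggested pre-processing: bicubic-downsample every
# frame so the lr-set ends up 4x smaller than the key-frame set,
# matching the model's fixed 4x factor. Paths are placeholders.
import glob
import os
import cv2

def downsample_frames(src_dir, dst_dir, size=(160, 120)):
    os.makedirs(dst_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(src_dir, '*.png'))):
        frame = cv2.imread(path, cv2.IMREAD_UNCHANGED)
        # cv2.resize takes (width, height)
        small = cv2.resize(frame, size, interpolation=cv2.INTER_CUBIC)
        cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), small)

downsample_frames('lr-set/640x480', 'lr-set/160x120')
```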

A thing to note: since all you want is color propagation, you can take only the ab channels from NeuriCam-net's output and concatenate them with the original L channel of the input to obtain the color video.
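
The L/ab recombination could look roughly like this per frame (a minimal sketch using scikit-image; the variable and function names are illustrative, and the grayscale input is assumed to have been replicated to 3 RGB channels before conversion):

```
# Keep the luminance (L) of the original grayscale frame and take
# only the predicted ab (chroma) channels from the model's output.
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def recombine(gray_input_rgb, model_output_rgb):
    lab_in = rgb2lab(gray_input_rgb)     # L from the real input
    lab_out = rgb2lab(model_output_rgb)  # ab from NeuriCam-net's output
    lab = np.concatenate([lab_in[..., :1], lab_out[..., 1:]], axis=-1)
    return lab2rgb(lab)  # back to RGB, float in [0, 1]
```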