The REDS-trained EDVR needs to be run in two stages, so you could try passing the output of the stage 1 model through the stage 2 model. Other than that, I believe there is a flip_test mode, which helps improve quality.
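For reference, flip_test is a flip-based self-ensemble: the input is flipped, run through the network, flipped back, and the predictions are averaged. A minimal sketch of that idea, assuming a PyTorch model whose input and output share the last two (spatial) dimensions; `model` and `lq` are placeholder names, not the repo's exact API:

```python
import torch

def flip_test(model, lq):
    """Average predictions over horizontal/vertical flips of the input."""
    outputs = []
    for dims in (None, (-1,), (-2,), (-2, -1)):  # identity, h-flip, v-flip, both
        x = torch.flip(lq, dims) if dims else lq
        with torch.no_grad():
            y = model(x)
        # Flip the prediction back so all four outputs are aligned.
        outputs.append(torch.flip(y, dims) if dims else y)
    return torch.stack(outputs).mean(dim=0)
```

Averaging four aligned predictions tends to smooth out kernel-specific artifacts slightly, at 4x the inference cost.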
Yeah, I figured out afterwards that it has two stages; I'll rerun the experiment soon. The flip_test mode looks interesting, I guess an ensemble should improve generalisation a little.
I can report a similar issue. I have tried using both stages and flip_test, and tested on a variety of videos. The model does not perform to the level it does on REDS4: the output has multiple artifacts and is blurry overall.
Okay, I solved my issue. The problem was the downsampling method. The datasets the model was trained on were created by downsampling with MATLAB's imresize function. So if you generate input data with anything else (OpenCV, FFmpeg), it doesn't work. You have to use MATLAB's imresize, or its Python equivalent, which is implemented in this repo.
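For anyone who can't run MATLAB: below is a simplified NumPy sketch of what a MATLAB-compatible bicubic downscale does (the cubic kernel plus the antialiasing widening MATLAB applies when shrinking). It follows the commonly ported algorithm, but treat it as illustrative and prefer the implementation shipped in this repo:

```python
import numpy as np

def cubic(x):
    """MATLAB's bicubic interpolation kernel (Keys cubic, a = -0.5)."""
    absx = np.abs(x)
    absx2, absx3 = absx ** 2, absx ** 3
    return ((1.5 * absx3 - 2.5 * absx2 + 1) * (absx <= 1) +
            (-0.5 * absx3 + 2.5 * absx2 - 4 * absx + 2) *
            ((absx > 1) & (absx <= 2)))

def _resize_axis(img, scale, axis):
    in_len = img.shape[axis]
    out_len = int(np.ceil(in_len * scale))
    # When shrinking, MATLAB widens the kernel by 1/scale (antialiasing).
    kernel_width = 4.0 / scale if scale < 1 else 4.0
    # Map output pixel centres back into input coordinates (1-based).
    u = np.arange(1, out_len + 1) / scale + 0.5 * (1 - 1 / scale)
    left = np.floor(u - kernel_width / 2)
    p = int(np.ceil(kernel_width)) + 2
    idx = (left[:, None] + np.arange(p)[None, :] - 1).astype(np.int64)
    dist = u[:, None] - (idx + 1)
    weights = scale * cubic(scale * dist) if scale < 1 else cubic(dist)
    weights /= weights.sum(axis=1, keepdims=True)
    # Symmetric padding at the borders, as MATLAB does.
    aux = np.concatenate((np.arange(in_len), np.arange(in_len - 1, -1, -1)))
    idx = aux[np.mod(idx, aux.size)]
    img = np.moveaxis(img, axis, 0)
    out = np.einsum('op,op...->o...', weights, img[idx])
    return np.moveaxis(out, 0, axis)

def imresize(img, scale):
    """MATLAB-style bicubic resize of an HxW or HxWxC float image."""
    out = _resize_axis(np.asarray(img, dtype=np.float64), scale, axis=0)
    return _resize_axis(out, scale, axis=1)
```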
Hmm, that's what I feared. That kind of defeats the purpose of super resolution. I don't want to downsample my data, I want to upsample it :).
@adamsvystun Could you mention the exact flow you used with the function you mentioned? Did you basically send your input video through that method (H x W -> target resolution) and subsequently pass it through EDVR to get the output? There were some weird blue-green artifacts during fast motion in my output, so I'm curious.
@jorenvs It should work with upsampling. In my case I had a video in 720p and wanted to test 180p->720p upsampling, which is why I had to downsample first. And it turns out the model is very sensitive to how you do that downsampling. If you only have a video in low res, it should just work.
@SreeHarshaNelaturu Yeah, for testing I first downsample, then upsample with the model, and compare the results. Not sure about blue-green artifacts; I did not have any.
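In other words, the test flow is: MATLAB-style downscale of the ground truth, EDVR upscale, compare. A hedged sketch of that loop, reusing the `imresize` sketch above; `load_frame` and `run_edvr` are hypothetical placeholders for your own I/O and inference code:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio between two float images in [0, 1]."""
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

gt = load_frame('frame_0001.png')  # hypothetical loader -> HxWx3 float in [0, 1]
lq = imresize(gt, 0.25)            # e.g. 720p -> 180p, MATLAB-style bicubic
sr = run_edvr(lq)                  # hypothetical EDVR inference entry point
print('PSNR vs ground truth:', psnr(sr, gt))
```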
Well, my videos are 1344x1344, so not really low quality. That's all relative to the angle of the lens, of course; these are generated from 360° 5.6K GoPro videos. The goal is to be able to read far-away text on traffic signs and such.
Thank you for the prompt response @adamsvystun. I was wondering about the part you mentioned about not using FFmpeg or cv2 to generate input data. What did you use in your case to extract frames from the video you wanted to super-resolve, as opposed to those methods?
@SreeHarshaNelaturu I said don't use FFmpeg or cv2 for downscaling (resizing down). For frame extraction you can use anything you want, for example the snippet below.
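A minimal OpenCV frame dump (file names are placeholders); the key point is that no resizing happens in this step:

```python
import os
import cv2

os.makedirs('frames', exist_ok=True)
cap = cv2.VideoCapture('input.mp4')  # placeholder path
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Write lossless PNGs at native resolution -- crucially, no resize.
    cv2.imwrite(f'frames/{i:06d}.png', frame)
    i += 1
cap.release()
```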
Gotcha, I think the blue-green error is a consequence of something else. And yep, I was resizing via FFmpeg; it might help to resize after extraction instead.
Thank you!
I'm not sure, but I think MATLAB's downscaling method could differ from FFmpeg's and cv2's. In my case, EDVR works well with the bicubic downscaling method, but it produces artifacts like this with others (e.g. low-res videos from YouTube). I guess that EDVR trained on the REDS dataset is overfitted to the reconstruction of bicubic downscaling, as the REDS dataset consists of bicubically downscaled frames.
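The kernels do indeed differ. As a quick sanity check (assuming the `imresize` sketch from earlier in the thread): OpenCV's bicubic shrink applies no antialiasing filter, so its output diverges from a MATLAB-style downscale:

```python
import cv2
import numpy as np

img = np.random.rand(64, 64)
a = cv2.resize(img, (16, 16), interpolation=cv2.INTER_CUBIC)  # no antialiasing
b = imresize(img, 0.25)                                       # MATLAB-style
print('max abs difference:', np.abs(a - b).max())  # clearly non-zero
```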
Yes, the current CNN-based methods do not generalize well to datasets with different downsampling kernels.
There is another research field called blind SR that addresses this issue.
I tried applying video super-resolution (EDVR) to other data, but I'm getting very weak results. The output barely differs from the input in quality. Examples below (left is the output, right is the zoomed-in input).
I tried both the EDVR_REDS_SR_L and the EDVR_Vimeo90K_SR_L models with varying input sizes, getting similar results. Is this to be expected? Given that the REDS4 dataset was also mostly street scenes, I would guess it should at least perform similarly.
The code I'm using is adapted from test_Vid4_REDS4_with_GT.py and moved to the root folder of the repo. I tested it on the REDS4 dataset with no issues.
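For context (not the poster's actual script): test scripts like test_Vid4_REDS4_with_GT.py feed EDVR a short temporal window around each centre frame. A minimal sketch of that neighbour-gathering logic, using frame replication at the clip borders as one common padding choice:

```python
import numpy as np

def get_window(frames, centre, n=5):
    """Gather an n-frame clip around `centre`, replicating border frames.

    frames: list of HxWx3 arrays; returns an (n, H, W, 3) array.
    """
    half = n // 2
    idx = [min(max(centre + o, 0), len(frames) - 1)
           for o in range(-half, half + 1)]
    return np.stack([frames[i] for i in idx])
```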