tarun005 / FLAVR

Code for FLAVR: A fast and efficient frame interpolation technique.
Apache License 2.0

Vimeo90K triplet test dataset performance issue #1

JunHeum closed this issue 3 years ago

JunHeum commented 3 years ago

Hi,

I am impressed with your new video frame interpolation paper.

When I tested it, I got 32.59 dB on the Vimeo90K triplet test set.

Following your Middleburry.py in the dataset directory, I changed the VimeoSepTuplet class into a VimeoTriplet class, as shown below.

What is wrong with my modified code?

I am also wondering whether I could get custom triplet interpolation code that takes two input frames and yields an intermediate frame.

    import os

    from PIL import Image
    from torch.utils.data import Dataset
    # Assumption: to_tensor is torchvision's PIL-to-tensor helper.
    from torchvision.transforms.functional import to_tensor

    class VimeoTriplet(Dataset):
        def __init__(self, data_root):
            self.data_root = data_root
            self.image_root = os.path.join(self.data_root, 'sequences')

            test_fn = os.path.join(self.data_root, 'tri_testlist.txt')

            # One sequence path per line; skip blank lines in the list file.
            with open(test_fn, 'r') as txt:
                self.seq_list = [line.strip() for line in txt if line.strip()]

        def __getitem__(self, index):
            seq_dir = '%s/%s' % (self.image_root, self.seq_list[index])
            im1 = Image.open('%s/im1.png' % seq_dir).convert('RGB')
            gt = Image.open('%s/im2.png' % seq_dir).convert('RGB')
            im3 = Image.open('%s/im3.png' % seq_dir).convert('RGB')

            im1, gt, im3 = map(to_tensor, (im1, gt, im3))

            # FLAVR expects 4 input frames, so each end frame is duplicated.
            return [im1, im1, im3, im3], [gt]

        def __len__(self):
            return len(self.seq_list)

tarun005 commented 3 years ago

Your code looks alright. FLAVR is designed (and trained) to take multiple frames from a video as input and render an output video with a higher frame rate. Duplicating input frames is therefore suboptimal: it does not reflect the true motion characteristics of the video, so performance on duplicated triplets will be lower.
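
To make the difference concrete, here is a minimal sketch (not taken from the FLAVR repository) contrasting a four-frame context window drawn from a longer video with the duplicated-triplet input used above. The helper names `context_window` and `duplicated_triplet` are hypothetical, and clamping at the video boundaries is an assumption made for illustration; frames are represented by plain strings so the example runs without any image data.

```python
def context_window(frames, i):
    """Pick four input frames around the gap between frames[i] and frames[i + 1].

    Hypothetical helper: indices are clamped at the ends of the video,
    which repeats the edge frame when no true neighbour exists.
    """
    if not 0 <= i < len(frames) - 1:
        raise IndexError("i must index a frame that has a successor")
    last = len(frames) - 1
    indices = [max(i - 1, 0), i, i + 1, min(i + 2, last)]
    return [frames[j] for j in indices]


def duplicated_triplet(im1, im3):
    """Mimic the triplet workaround: each end frame is fed to the model twice."""
    return [im1, im1, im3, im3]


video = ["f0", "f1", "f2", "f3", "f4"]

# Inside a longer video, the window holds four distinct frames,
# so the model sees real motion before and after the gap.
interior = context_window(video, 1)

# With only two frames available, clamping degenerates to duplication,
# which carries no extra motion information.
two_frame = context_window(["a", "b"], 0)
```

In the second case the window is identical to `duplicated_triplet("a", "b")`, which is the situation a triplet dataset forces on a four-frame model.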

JunHeum commented 3 years ago

I understand why the suboptimal results appear when duplicating input frames.

Since FLAVR takes 4 input frames, it seems to gain a large advantage over other methods.

Isn't that unfair to other video interpolation methods that use only 2 frames as input?

tarun005 commented 3 years ago

In the paper, we also show comparisons with QVI, which also takes 4 input frames.

hzwer commented 3 years ago

Multi-frame interpolation algorithms (SuperSlomo, QVI, EQVI, FLAVR) and single-frame interpolation algorithms (DAIN, CAIN, SoftSplat) are usually compared on different benchmarks. FLAVR has made some efforts to unify these two types of algorithms.

tarun005 commented 3 years ago

@hzwer Yes, that's right. We do attempt to unify benchmarks across the prior works, which operate in very different settings.