Evaluation Metrics - Githubissues

ewwnage commented 2 years ago

Hey,

I ran the inference on the 29 Huang annotated sequences from DAVIS 2017.

srun python video_completion.py \
       --mode object_removal \
       --seamless \
       --path ../data/Davis/Huang_annotations/rgb_png \
       --path_mask ../data/Davis/Huang_annotations/mask_png

The results visibly match the videos on your project page. However, I cannot come up with an evaluation method that matches your reported numbers. For the object removal task, I paired color sequences with mask sequences from other videos in the set (e.g. hiking_frames <-> flamingo_masks, cropped to matching length). Running inference on all sequences yields neither the SSIM nor the PSNR stated in Table 1 of the paper. From the visible results on the Huang annotations I'd expect an SSIM of 0.99, but since we cannot compute any ground-truth-based metrics on this set, I need your advice.

What are the evaluation pairs for Table 1?
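
For reference, this is roughly how the PSNR side of such an evaluation can be set up. It is only a minimal sketch, not the authors' evaluation script: the function names (psnr, pair_frames_with_masks, mean_psnr) are made up for illustration, frames are assumed to be 8-bit arrays of equal shape, and the whole frame is scored rather than only the masked region.

```python
import numpy as np


def psnr(reference, completed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    completed = np.asarray(completed, dtype=np.float64)
    if reference.shape != completed.shape:
        raise ValueError("frames must have the same shape")
    mse = np.mean((reference - completed) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))


def pair_frames_with_masks(frames, masks):
    """Crop both sequences to the shorter length and pair them frame by frame."""
    n = min(len(frames), len(masks))
    return list(zip(frames[:n], masks[:n]))


def mean_psnr(references, completions):
    """Average PSNR over a sequence, skipping frames that match exactly."""
    scores = [psnr(r, c) for r, c in zip(references, completions)]
    finite = [s for s in scores if np.isfinite(s)]
    return float(np.mean(finite)) if finite else float("inf")
```

Since the mask comes from a different sequence, the original color frame serves as ground truth for the filled-in region, which is what makes a PSNR or SSIM computable at all in this setup.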

cyrala commented 1 year ago

Hi! I ran into the same problem. Have you managed to reproduce results similar to Table 1?

ewwnage commented 1 year ago

I have not been able to reproduce Table 1, but I think I got to the bottom of it. I assume the authors had a very lucky "random" pairing of mask and video sequences. E.g., the slowly moving bear mask on the surf video yields very good results, since the algorithm performs well on the water surface's texture. Other publications also struggle to reproduce the claimed results.
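
To see how much the scores depend on the pairing, one option is to draw several seeded pairings and compare the resulting metrics. The snippet below is a sketch under my own assumptions: random_cross_pairing is a hypothetical helper, the sequence names are just the ones mentioned in this thread, and nothing here reflects how the paper's pairs were actually chosen.

```python
import random


def random_cross_pairing(sequence_names, seed):
    """Map each video to the mask of a *different* sequence, reproducibly."""
    names = list(sequence_names)
    if len(names) < 2:
        raise ValueError("need at least two sequences to cross-pair")
    rng = random.Random(seed)
    while True:
        shuffled = names[:]
        rng.shuffle(shuffled)
        # Reject shuffles where any video would get its own mask back.
        if all(video != mask for video, mask in zip(names, shuffled)):
            return dict(zip(names, shuffled))


if __name__ == "__main__":
    names = ["bear", "surf", "hiking", "flamingo"]  # illustrative subset
    for seed in range(3):
        print(seed, random_cross_pairing(names, seed))
```

Running the full inference for a handful of seeds and reporting the spread of PSNR/SSIM would show whether a single pairing can plausibly explain the gap.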

vt-vl-lab / FGVC

Evaluation Metrics #67