Inference is not as accurate as the provided video samples

ewwnage commented 2 years ago

Hi, I'd like to evaluate the network on the 29 Huang annotations (resized to 432x768). To do so, I run inference on every image sequence and compute the SSIM and PSNR for every frame. Averaged over all frames of all sequences, I get an SSIM of 0.924 and a PSNR of 23.5. I run the evaluation with the following settings:

       --mode object_removal \
       --seamless \
       --Nonlocal \
       --edge_guide \
       --path ../data/Davis/Huang_annotations/rgb_png \
       --path_mask ../data/Davis/Huang_annotations/mask_png

Metrics aside, the inference results are visibly worse than the videos on the project page. What should the inference command look like to reproduce the quality shown there?
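For reference, the averaging described above (one score per frame, then a single mean over all frames of all sequences) can be sketched roughly as follows. This is not the script actually used for the numbers in this issue. The function names and the flat-list frame representation are assumptions made for illustration, and only PSNR is shown. SSIM could be computed per frame in the same loop, for example with `skimage.metrics.structural_similarity`.

```python
import math


def psnr(reference, result, max_value=255.0):
    """PSNR in dB between two frames given as flat sequences of pixel values.

    Assumption for this sketch: both frames are already resized to the same
    resolution and flattened, with values in [0, max_value].
    """
    if len(reference) != len(result):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, result)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr_over_frames(sequences):
    """Average PSNR over every frame of every sequence.

    `sequences` maps a sequence name to a list of (reference, result) frame
    pairs. Every frame has equal weight, so longer sequences contribute more
    than shorter ones.
    """
    scores = [psnr(ref, out) for pairs in sequences.values() for ref, out in pairs]
    return sum(scores) / len(scores)
```

Note that averaging over frames is not the same as averaging per-sequence means, so the two conventions can give different numbers on sequences of unequal length. Identical frames yield an infinite PSNR, which would need to be handled before averaging.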

Thanks in advance

gaochen315 commented 2 years ago

Hi, can you first try the object removal on the provided tennis sequence? Does the result look good?

python video_completion.py \
       --mode object_removal \
       --path ../data/tennis \
       --path_mask ../data/tennis_mask \
       --outroot ../result/tennis_removal \
       --seamless

ewwnage commented 2 years ago

Okay, it turns out I ran the inference with --Nonlocal and --edge_guide, which yielded different results. In which scenarios would you recommend using these flags? It seems like the script works perfectly fine without them (especially the fine-tuned EdgeConnect weights)?

ewwnage commented 2 years ago

Found answers to my questions in the paper.

cyrala commented 1 year ago

29 Huang annotations

Hi! I cannot find the 29 Huang annotations in Huang's paper. Can you tell me where you found them, or share the link with me? Thank you very much!

vt-vl-lab / FGVC, issue #66