mcahny / Deep-Video-Inpainting

Official pytorch implementation for "Deep Video Inpainting" (CVPR 2019)

questions on double_size and some blurred results #17

Closed zengyh1900 closed 3 years ago

zengyh1900 commented 5 years ago

Dear authors,

I'm studying your paper and code; thanks for sharing! As I understand from the code and the paper, 0 indicates non-hole pixels and 1 indicates hole pixels. Why, then, do you need to multiply the masks by 0.5 when the input size is 512 (see demo_vi.py)?

Also, I have reproduced results similar to some of the cases shown in your paper. However, some cases are very blurred, especially slow-moving ones. One example is from DAVIS (DAVIS/bear). Is this expected, or did I miss something needed to get better results?

Looking forward to your reply.

ytongW commented 4 years ago

Could you tell me how to change the size of the output image? When I changed the size of the input image directly, I got this error:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 64 in dimension 3 at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCTensorMath.cu:111

If I resize the output image instead, the image gets very blurred. Thanks for your time!

AjithPanja commented 3 years ago

Yeah, I also noticed the blurry region while running the code on the bear video. I would be really grateful if you could clarify one doubt 😅. From my understanding, pixels that are known from the previous and future frames are filled in from those frames, but how are blind-spot pixels filled? (E.g., if a trashcan stays in the same place throughout the video and has to be removed, how will its pixels be filled?)

mcahny commented 3 years ago

Hi all, thanks for your interest. To answer your questions,

The double_size case was trained with masks in which the hole region is filled with the value 0.5 and non-hole regions with 1.0. There is no special reason behind this choice.
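For readers following along, here is a minimal sketch of the mask scaling asked about in the original question. It is not the code from demo_vi.py. It assumes the binary convention stated in the question (1 = hole, 0 = non-hole), so non-hole pixels stay at 0 after scaling; only the hole value of 0.5 is taken from the answer above. The names `scale_hole_mask` and `hole_value` are hypothetical, and plain Python lists stand in for image tensors.

```python
def scale_hole_mask(mask, double_size, hole_value=0.5):
    """Return a float copy of a binary mask (assumed: 1 = hole, 0 = non-hole).

    When double_size is True, every pixel is multiplied by hole_value,
    so hole pixels become hole_value and zero pixels stay zero.
    Otherwise the mask is only converted to floats.
    """
    scale = hole_value if double_size else 1.0
    return [[float(pixel) * scale for pixel in row] for row in mask]


# Toy 3x4 mask with a 2x2 hole in the middle columns.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

scaled = scale_hole_mask(mask, double_size=True)     # hole pixels -> 0.5
unscaled = scale_hole_mask(mask, double_size=False)  # hole pixels -> 1.0
```

In a real pipeline the same multiplication would be applied to the mask tensor before it is passed to the network; whichever values are used at inference should match the ones the checkpoint was trained with.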

About the fixed-size hole, your results look reasonable, and I can reproduce them on the bear video. My understanding of this result is based on these points: