lengstrom / fast-style-transfer

TensorFlow CNN for fast style transfer ⚡🖥🎨🖼
10.91k stars 2.6k forks

frame malposition using transformed video #264

Open lujingqiao opened 3 years ago

lujingqiao commented 3 years ago

When I transform a video, the frames in the resulting video are misaligned; see the two pictures below.

[Two screenshots of the misaligned output frames; images not preserved]

Has anyone else run into this?

hude-as commented 2 years ago

Heya!

I had the same issue and addressed it with a quick fix. I don't have enough background in neural networks to fix it properly.

TL;DR: make sure the output of the transform network has the same dimensions as your output video.

Context and debugging

The issue comes from the downsampling and upsampling layers in src/transform.py, which can produce frames of a different size than the input.

For example, with a 640x338 video you get output frames of 640x340. Debugging the network layer by layer gives these shapes (batch, height, width, channels):

- conv1: (4, 338, 640, 32)
- conv2: (4, 169, 320, 64)
- conv3: (4, 85, 160, 128)
- resid1: (4, 85, 160, 128)
- resid2: (4, 85, 160, 128)
- resid3: (4, 85, 160, 128)
- resid4: (4, 85, 160, 128)
- resid5: (4, 85, 160, 128)
- conv_t1: (4, 170, 320, 64)
- conv_t2: (4, 340, 640, 32)

Compare the first and the last layer (640x338 vs 640x340): the height goes 338 → 169 → 85 (rounded up from 84.5) → 170 → 340.
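The rounding can be reproduced with a few lines of arithmetic. This sketch assumes, from the shapes above, that conv2 and conv3 use stride 2 with 'SAME' padding (output = ceil(input / 2)) and that conv_t1 and conv_t2 exactly double the size; `trace_size` is a hypothetical helper, not part of the repo.

```python
import math

def trace_size(n):
    """Follow one spatial dimension (height or width) through the network.

    Assumes two stride-2 'SAME' convolutions (conv2, conv3) followed by two
    stride-2 transposed convolutions (conv_t1, conv_t2); stride-1 layers and
    the residual blocks leave the size unchanged.
    """
    sizes = [n]
    for _ in range(2):      # conv2, conv3: 'SAME' padding rounds up
        n = math.ceil(n / 2)
        sizes.append(n)
    for _ in range(2):      # conv_t1, conv_t2: size is exactly doubled
        n *= 2
        sizes.append(n)
    return sizes

print(trace_size(338))  # [338, 169, 85, 170, 340]
print(trace_size(640))  # [640, 320, 160, 320, 640]

# Heights between 336 and 344 that come out unchanged
print([h for h in range(336, 345) if trace_size(h)[-1] == h])  # [336, 340, 344]
```

Under these assumptions, any height or width that is a multiple of 4 survives the round trip, which is why the 640 width is unaffected while the 338 height is not.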

In evaluate.py, the video writer is created with the frame size of the original clip, but the network outputs 640x340 images. ffmpeg still reads the stream as 640x338 frames, so the 2 extra rows of each frame are read as the start of the next one, and the picture shifts a little more with every frame (in my example, by 2 rows per frame).
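To see why a 2-row mismatch turns into a growing shift, here is a rough simulation. It assumes the writer pipes raw frames to ffmpeg and that ffmpeg slices that stream into frames of the declared size; since every row has the same byte length, it works on rows instead of bytes. `simulate_stream` and `row_shift` are hypothetical helpers for illustration only.

```python
import numpy as np

def simulate_stream(expected_h, actual_h, n_frames):
    """Re-slice a stream of actual_h-row frames into expected_h-row frames.

    Each row is labelled (network frame index, row index). Rows stand in for
    bytes because every row has the same byte length (width * channels).
    """
    frames = [np.stack([np.full(actual_h, f), np.arange(actual_h)], axis=1)
              for f in range(n_frames)]
    stream = np.concatenate(frames)            # rows in the order they are piped
    n_read = len(stream) // expected_h         # whole frames the encoder can read
    return stream[: n_read * expected_h].reshape(n_read, expected_h, 2)

def row_shift(expected_h, actual_h, n_frames):
    """Row where network frame j starts inside encoded frame j (None if it spilled over)."""
    shifts = []
    for j, frame in enumerate(simulate_stream(expected_h, actual_h, n_frames)):
        hit = np.flatnonzero((frame[:, 0] == j) & (frame[:, 1] == 0))
        shifts.append(int(hit[0]) if hit.size else None)
    return shifts

print(row_shift(338, 340, 5))  # [0, 2, 4, 6, 8]
print(row_shift(338, 338, 5))  # [0, 0, 0, 0, 0]
```

With matching heights every frame starts at row 0; with 340-row frames declared as 338 rows, each frame starts 2 rows lower than the previous one.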

Quick fix: compute the output video size from the shape of preds instead of the original clip:

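```python
def ffwd_video(path_in, path_out, checkpoint_dir, device_t='/gpu:0', batch_size=1):
    [...]
    # Create the writer once `preds` (the transform network output) exists, and
    # size it from the network output instead of video_clip.size.
    # preds is (batch, height, width, channels); the writer expects (width, height).
    # int() converts TensorFlow 1.x Dimension objects to plain ints.
    preds_size = [int(preds.shape[2]), int(preds.shape[1])]
    video_writer = ffmpeg_writer.FFMPEG_VideoWriter(path_out, preds_size, video_clip.fps, codec="libx264",
                                                    preset="slow", bitrate="20000k",
                                                    audiofile=path_in, threads=None,
                                                    ffmpeg_params=None)
    [...]
```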

A better approach would probably be to make the transform.net function return output with the same size as its input.
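One way to act on that idea, done around the network rather than inside transform.net, is to pad each batch up to a multiple of 4 before inference and crop the result back to the original size. This is a hedged sketch: `pad_to_multiple`, `stylize_same_size` and `fake_net` are hypothetical names, `run_net` stands for whatever runs the session (for example a wrapper around `sess.run(preds, feed_dict=...)`), and if `img_placeholder` has a fixed shape it would have to be built with the padded size. The factor of 4 assumes the two stride-2 downsampling layers seen in the shapes above.

```python
import math
import numpy as np

def pad_to_multiple(frames, multiple=4):
    """Reflect-pad an (N, H, W, C) batch so H and W are multiples of `multiple`."""
    _, h, w, _ = frames.shape
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    padded = np.pad(frames, ((0, 0), (0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    return padded, (h, w)

def stylize_same_size(frames, run_net, multiple=4):
    """Pad, run the network, then crop back so the output size equals the input size."""
    padded, (h, w) = pad_to_multiple(frames, multiple)
    out = run_net(padded)
    return out[:, :h, :w, :]

def fake_net(x):
    """Stand-in that only mimics the size change: halve with rounding up twice, then x4."""
    n, h, w, c = x.shape
    out_h = math.ceil(math.ceil(h / 2) / 2) * 4
    out_w = math.ceil(math.ceil(w / 2) / 2) * 4
    return np.zeros((n, out_h, out_w, c), dtype=x.dtype)

frames = np.zeros((4, 338, 640, 3), dtype=np.float32)
print(fake_net(frames).shape)                     # (4, 340, 640, 3)
print(stylize_same_size(frames, fake_net).shape)  # (4, 338, 640, 3)
```

Reflect padding gives the network image-like content at the border instead of zeros; simply cropping the 340-row output to 338 rows without padding would also stop the drift.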

Hope this helps in the future!