williamyang1991 / StyleGANEX

[ICCV 2023] StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces

Video editing output size #9

Open · Inferencer opened 1 year ago

Inferencer commented 1 year ago

Hi, fantastic job! I don't understand the output resolution in video editing. It looks like it tracks a single face and zooms in. What would be the best way to return to the original size in a video editing app? Could I just do a 2x zoom out and an x or y move, or something like that?

For my example, the original size is 1920x1080 and the output is 1920x1632.

Inferencer commented 1 year ago

(Screenshot 2023-06-08 095630 attached.) I seem to have solved it. Soon I will try cropping & aligning the faces myself, then putting them through StyleGANEX. As you can see, there is a colour difference added. Is that a result of the GAN process, or a side effect of an unoptimised video resolution that may be fixed by doing the cropping myself as mentioned earlier?

williamyang1991 commented 1 year ago

The color difference is because the editing vector we use from previous papers is not well disentangled, so it also affects color attributes.

You can use a face detector to find the face region, then use alpha blending or Poisson blending to alleviate the color difference.
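
For reference, a minimal sketch of that blending step (not part of this repo; the face box, filenames, and kernel size below are made-up placeholders), using a feathered mask for alpha blending and cv2.seamlessClone for Poisson blending:

import cv2
import numpy as np

# Hypothetical inputs: the original frame and the StyleGANEX-edited frame,
# resized to the same resolution, plus a face box from any face detector.
original = cv2.imread('frame_original.png')
edited = cv2.imread('frame_edited.png')
x, y, w, h = 600, 200, 400, 400  # placeholder face box (x, y, width, height)

# Build a soft mask over the face region and feather its border.
mask = np.zeros(original.shape[:2], dtype=np.float32)
mask[y:y + h, x:x + w] = 1.0
mask = cv2.GaussianBlur(mask, (61, 61), 0)[..., None]  # feathered, HxWx1

# Alpha blending: keep the edited pixels inside the face, original outside.
alpha_blend = (mask * edited + (1.0 - mask) * original).astype(np.uint8)

# Poisson blending alternative: seamlessClone matches colors at the boundary.
clone_mask = (mask[..., 0] > 0.5).astype(np.uint8) * 255
center = (x + w // 2, y + h // 2)
poisson_blend = cv2.seamlessClone(edited, original, clone_mask, center, cv2.NORMAL_CLONE)

cv2.imwrite('frame_blended_alpha.png', alpha_blend)
cv2.imwrite('frame_blended_poisson.png', poisson_blend)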

Inferencer commented 1 year ago

OK, I'd love to learn more about that, please. I have hired people to do an easy Windows fork with audio paste-back; the next step is pasting the output video back into the input video. After that I was going to see if we could get this color shift fixed, but it sounds like that's not possible (I will mask as you suggest). It's not something simple like passing the BGR image to the GAN instead of the RGB one, is it? Sorry if that line does not make sense; I saw it in another repo talking about avoiding color shifts with GFPGAN, where they converted RGB to BGR, passed it through, then converted it back to RGB.

Inferencer commented 1 year ago

I did a lot of research and played with the code, and now I understand what you mean. The masking in a third-party editor is giving me fantastic results, so I could not be happier with the color solution now. The only thing bugging me is the lack of a paste-back: currently I am eyeballing the resizing and the x + y translation to re-align with the original footage. I'll see if I can find a solution.
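
A rough paste-back sketch (not from this repo; it assumes you recorded the crop box used when cutting the face region out of the source frame, and all names and values are placeholders):

import cv2

# Hypothetical values: crop box (x, y, w, h) recorded when the face region was
# cut out of the original 1920x1080 frame before editing.
x, y, w, h = 560, 120, 800, 800

frame = cv2.imread('original_frame.png')        # full-resolution source frame
edited = cv2.imread('edited_output_frame.png')  # StyleGANEX output for the crop

# Resize the edited result back to the size of the crop it came from,
# then paste it at the recorded position instead of eyeballing the alignment.
edited_resized = cv2.resize(edited, (w, h), interpolation=cv2.INTER_AREA)
result = frame.copy()
result[y:y + h, x:x + w] = edited_resized

cv2.imwrite('pasted_back_frame.png', result)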

Inferencer commented 1 year ago

Why can't I change videoWriter = cv2.VideoWriter(save_name, fourcc, video_cap.get(5), (4*W, 4*H)) to videoWriter = cv2.VideoWriter(save_name, fourcc, video_cap.get(5), (W, H))? It seems any change to a different multiple makes the output unplayable, including if I use a different codec such as fourcc = cv2.VideoWriter_fourcc(*'H264').

I can resize before the frame is written by adding more code; I just wonder why it's acting like this, since I figured those figures already came back from the model.

williamyang1991 commented 1 year ago

The size you pass to the writer must match the size of the frames you write, or the video cannot be saved. You cannot use the smaller size (W, H) unless you downsample the (4W, 4H) output frame to (W, H) first.

You can check the docs of cv2.VideoWriter for the details, or search Stack Overflow for a solution. I'm not an expert on OpenCV.
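
As an illustration only (a minimal sketch, not the repo's script; the sizes, filenames, fps, and fourcc below are placeholders), downsampling each 4x output frame before writing lets you keep a (W, H) writer:

import cv2

# Hypothetical setup mirroring the script's variables: (W, H) is the source
# frame size; frame_4x stands for one (4W, 4H) output frame as a BGR uint8 image.
W, H = 480, 270
frame_4x = cv2.imread('output_frame_4x.png')  # placeholder for the model output

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
videoWriter = cv2.VideoWriter('edited_small.mp4', fourcc, 25, (W, H))

# Downsample to the writer's size; a mismatched size makes the file unplayable.
frame_small = cv2.resize(frame_4x, (W, H), interpolation=cv2.INTER_AREA)
videoWriter.write(frame_small)
videoWriter.release()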

Inferencer commented 1 year ago

> The size you pass to the writer must match the size of the frames you write, or the video cannot be saved. You cannot use the smaller size (W, H) unless you downsample the (4W, 4H) output frame to (W, H) first.
>
> You can check the docs of cv2.VideoWriter for the details, or search Stack Overflow for a solution. I'm not an expert on OpenCV.

That's OK, I'm already coding it; I'm just trying to figure out the math. The scaling is working fine, I just can't seem to get the right calculations.

from moviepy.editor import VideoFileClip

# num, y_hat, videoWriter, save_name, original_audio and save_image are
# defined earlier in the editing script.
if num == 1:
    save_image(y_hat[0].cpu(), save_name)
    print('Image editing successful!')
else:
    videoWriter.release()
    edited_video = VideoFileClip(save_name)
    edited_video = edited_video.set_audio(original_audio)  # Set the audio of the edited video

    # Get the original video resolution
    original_width = edited_video.size[0]
    original_height = edited_video.size[1]

    # Calculate the output resolution
    output_width = (original_width + 512) // 2
    output_height = (original_height + 512) // 2

    # Resize the video to the output resolution
    resized_video = edited_video.resize((output_width, output_height))

    # Save the resized video with the specified codec and audio codec
    resized_video.write_videofile(save_name + "_with_audio.mp4", codec="libx264",
                                  audio_codec="aac", bitrate="17054k",
                                  fps=edited_video.fps, threads=4)
    print('Video editing successful!')
williamyang1991 commented 1 year ago

Your code seems more professional than my original one for video configuration.

Inferencer commented 1 year ago

> The size you pass to the writer must match the size of the frames you write, or the video cannot be saved. You cannot use the smaller size (W, H) unless you downsample the (4W, 4H) output frame to (W, H) first.

So currently I am testing on two videos.

The first, Alpha.mp4, is 500x500 with a small face region. I downsample to (W, H) and divide by 2, which creates the correct size so I can overlay it onto the original video.

The next video, Beta.mp4, is 1920x1080 with a large face region. I downsample to (W, H) and multiply by 2.5, which creates the correct size so I can overlay it onto the original video.

I then cropped Beta.mp4 to 800x800 and also downsampled to (W, H) and multiplied by 2.5, which resulted in an 800x800 output. That was perfect, so I knew I was on the right track. It seems the size of the face must be to blame, or the input being divisible by 64. What's your guess?
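
One way to avoid guessing a per-video multiplier (a sketch, not this repo's behaviour; it assumes you record the size of the crop that was actually fed to the model, and the filenames and sizes are placeholders) is to resize the edited clip straight to that recorded crop size instead of multiplying by 2 or 2.5:

from moviepy.editor import VideoFileClip

# Hypothetical values: the crop size cut out of the source video before editing
# (record it when you do the cropping yourself).
crop_width, crop_height = 800, 800

edited_video = VideoFileClip('edited_output.mp4')

# Resize to the recorded crop size directly, instead of scaling the downsampled
# output by a guessed per-video factor.
overlay_clip = edited_video.resize((crop_width, crop_height))
overlay_clip.write_videofile('overlay_ready.mp4', codec='libx264')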