neuralchen / SimSwap

An arbitrary face-swapping framework on images and videos with one single trained model!

Are there parameters for making the source facial details stronger? #14

Open · ExponentialML opened this issue 3 years ago

ExponentialML commented 3 years ago

First of all, great work. Are there parameters that can keep more of the details from the source image, or is this something that needs to be trained on? For example, sometimes key details are missing from the eyes, or there are other features (piercings, tattoos, moles, and so on) that I may want to keep. Thanks!

NNNNAI commented 3 years ago

Thank you for studying our project so carefully. We do not have parameters to adjust the degree of facial-feature retention, because this project is the open-source code for our ACMMM 2020 paper. From a research perspective, our goal at the time was that features such as tattoos and piercings should be removed, because they are not part of the identity of the person. Of course we will consider your suggestion, and maybe we will add an interface to control the degree of feature retention in version 2, which we will release in the future. That should be cool, many thanks~

NNNNAI commented 3 years ago

If you have any other suggestions, please feel free to ask questions, and we will consider your comments and make improvements. :)

ExponentialML commented 3 years ago

Thanks for your reply! I'm honestly impressed that you can throw a full video into the script without any pre-production editing, so great work. I have some suggestions that should be easy to implement.

  1. The bounding box is very visible after inference. Filling the face region based on facial landmarks and then using that as a blurred matte should solve the issue (a rough sketch follows below).

  2. An option to perform super-resolution on the face during or after inference. An example would be something like Face Sparnet.

  3. Support for side views of faces, or more extreme angles. I'm sure this is due to how the model was trained and not the project itself.

I can do all of these things myself, but I have to use other projects to implement them. It would be nice to have them native to what the script already uses, like ONNX and insightface.
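To illustrate point 1, here is a minimal sketch of the landmark-matte idea; the landmarks argument is a hypothetical list of (x, y) points from whatever detector is used:

import cv2
import numpy as np

def landmark_matte(frame_shape, landmarks, blur_ksize=31):
    # Fill the convex hull of the face landmarks, then soften its border.
    mask = np.zeros(frame_shape[:2], dtype=np.float32)
    hull = cv2.convexHull(np.array(landmarks, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 1.0)
    return cv2.GaussianBlur(mask, (blur_ksize, blur_ksize), 0)  # matte in [0, 1]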

NNNNAI commented 3 years ago

Cool, I will try to add these modules to see if there is any improvement.

ExponentialML commented 3 years ago

> Cool, I will try to add these modules to see if there is any improvement.

Great. Another idea, for a problem I've just noticed: if the face is tilted too far, there is a lot of jitter on the swapped face. For example, if the head goes from a 0-degree angle to a 90-degree one, the facial alignment struggles a bit. To fix this, you could compute the rotation matrix of the head's top-to-bottom axis and use it to pre-align the video. In steps, it would be something like this:

  1. Detect whether the head is tilted too far.
  2. If it is, rotate the entire target video so the head is upright, with the chin at the bottom and the top of the head at the top.
  3. Re-detect the face after the rotation, and swap.
  4. Rotate the entire target video back to its original orientation.
  5. Continue inference and repeat as needed.

That should solve the issue; a rough sketch follows below. If this were Dlib I could probably implement it, but I'm not familiar with what you're using :).
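For concreteness, a minimal sketch of those steps, where detect_landmarks() and swap_face() are hypothetical stand-ins for whatever detector and swapper SimSwap actually uses:

import cv2
import numpy as np

def swap_with_derotation(frame, detect_landmarks, swap_face, max_roll_deg=45):
    h, w = frame.shape[:2]
    left_eye, right_eye = detect_landmarks(frame)[:2]  # (x, y) points
    # Step 1: estimate the head roll from the eye line.
    roll = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                                 right_eye[0] - left_eye[0]))
    if abs(roll) <= max_roll_deg:
        return swap_face(frame)
    # Step 2: rotate the whole frame so the head is upright.
    center = (w / 2, h / 2)
    M = cv2.getRotationMatrix2D(center, roll, 1.0)
    upright = cv2.warpAffine(frame, M, (w, h))
    # Step 3: re-detect and swap on the upright frame.
    swapped = swap_face(upright)
    # Step 4: rotate back to the original orientation.
    M_inv = cv2.getRotationMatrix2D(center, -roll, 1.0)
    return cv2.warpAffine(swapped, M_inv, (w, h))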

NNNNAI commented 3 years ago

I see, it sounds very reasonable. Let me spend some time trying this idea over the next couple of days.

woctezuma commented 3 years ago
> An option to perform super-resolution on the face during or after inference. An example would be something like Face Sparnet.

For info, I use optional post-processing with GPEN [1] in one of my SimSwap Colab notebooks: gpen/SimSwap_images.ipynb

If you want to try GPEN on Colab, independently from SimSwap, I have a small Github Gist for that purpose: https://gist.github.com/woctezuma/ecfc9849fac9f8d99edd7c1d6d03758a

Based on my experiments with super-resolution, GPEN looks the best among open-source super-resolution algorithms for faces. When it works, it is great. However, one has to be careful, because it can introduce artifacts, especially if the input photos had already been processed with a super-resolution algorithm. So it really has to be optional, to avoid inadvertently stacking super-resolution processing steps, which would degrade the final result.

[1] Yang, Tao, et al. "GAN Prior Embedded Network for Blind Face Restoration in the Wild." arXiv preprint arXiv:2105.06070 (2021). (code and paper)

ExponentialML commented 3 years ago

@woctezuma Wow, that works really well. Thanks for the tip!

@NNNNAI I actually tried implementing the video-rotation idea, but it didn't solve the issue. The performance comes down to the function here: the alignment algorithm seems to make the face crop shake a lot, resulting in poor alignment at more extreme angles.

kkhaial commented 3 years ago

It would also be good to have the mask saved as a separate output, so the footage can be brought into After Effects for editing.

AlonDan commented 3 years ago

After playing with GPEN and DFNet, GPEN gives much better quality results in my opinion, based on some tests.

Like the suggestion above, but I would like to focus on a local machine (Windows 10 + Anaconda). It would be really nice and useful to have an optional extra command that runs GPEN as a post-process:

  1. GPEN runs on every frame that SimSwap generated.
  2. SimSwap then merges everything back together, hopefully with lossless quality from the source video.

I couldn't install GPEN locally, so I could only test it via Google Colab. But if GPEN could be merged into SimSwap as an optional command, the user wouldn't have to run GPEN separately (as in the current version); a command that runs GPEN as a post-process with its basic properties (scaling, and generating ONLY the final GPEN result without the other files, to speed things up and keep the focus on SimSwap's goal) would be more than nice!

instant-high commented 3 years ago

@ftaker887 You say that you can solve the problem with the bounding box after inference. Would you share your solution / source code for that?

NNNNAI commented 3 years ago

You can try using this face parsing repo to get the mask, then blend the original image and the swapped image according to the mask. The results using the mask versus the bounding box are shown below. Many thanks~.
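In case it helps, a minimal sketch of the blending idea (not the exact repo code); it assumes the mask is an HxW float array in [0, 1], aligned with both images:

import numpy as np

def blend_by_mask(original, swapped, mask):
    # mask is 1.0 where the swapped face should show, 0.0 for background.
    mask = mask.astype(np.float32)[..., np.newaxis]
    out = mask * swapped.astype(np.float32) + (1.0 - mask) * original.astype(np.float32)
    return out.astype(original.dtype)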

AlonDan commented 3 years ago

Impressive! The smooth masked bounding box looks really nice! @NNNNAI Any chance this feature will be added to SimSwap as an extra command?

Something like --MaskSmooth, plus extra parameters or values if needed?

NNNNAI commented 3 years ago

I will release this feature, maybe within a week, as there is other work I have been busy with recently. Many thanks~.

AlonDan commented 3 years ago

> I will release this feature, maybe within a week, as there is other work I have been busy with recently. Many thanks~.

That will be great! Thank you 👍

instant-high commented 3 years ago

Thanks. I think I will wait until this feature is released.

Btw, I've added two extra parameters to set the start and end frame (cut_in/cut_out) for inference, settable from the command line if one wants to process only part of the input video. Video works fine so far, but I have problems setting the audio cut_in...

NNNNAI commented 3 years ago

@instant-high You can check ffmpeg and moviepy; both third-party libraries support audio injection.
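For example, a minimal moviepy sketch, assuming cut_in/cut_out are frame indices and result.mp4 is the swapped video without sound:

from moviepy.editor import VideoFileClip

cut_in, cut_out = 100, 350  # example frame indices

source = VideoFileClip("input.mp4")
t_in, t_out = cut_in / source.fps, cut_out / source.fps  # frames -> seconds
audio = source.audio.subclip(t_in, t_out)                # cut the matching audio segment

swapped = VideoFileClip("result.mp4").set_audio(audio)
swapped.write_videofile("final.mp4")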

instant-high commented 3 years ago

@NNNNAI Yes, I know... but I've never done that in Python. Most of the time I use Visual Basic for my projects (some video tools, a GUI for First Order Motion and motion co-segmentation, Wav2Lip...).

NNNNAI commented 3 years ago

@instant-high Haha, I see, hope you can get it done~.

ExponentialML commented 3 years ago

> You can try using this face parsing repo to get the mask, then blend the original image and the swapped image according to the mask. The results using the mask versus the bounding box are shown below. Many thanks~.

I actually started training a model a few days ago using Self Correction for Human Parsing. It's very fast, and I believe it works without adding too many modules to SimSwap's repository.

The dataset I'm using is one I found here, called LaPa; it's probably the best fit, as it deals with occlusion, faces, and difficult situations.


The idea is that you would either apply the mask first for better alignment, or use it as a post-processing step. Another idea is to use face inpainting after masking, but that starts to get into third-party plugin territory, which might add complexity.

Would you like me to create a pull request when or if I get this implemented?

AlonDan commented 3 years ago

I don't know how to train my own model or do anything complicated like that. I'm currently messing around with SimSwap's built-in models, and I see some issues I would love to see solved in a future SimSwap: sometimes it can't handle certain parts, unlike the example above; it often flickers, failing to recognize frames that are similar to the ones before or after, which is weird; and sometimes it resizes the face, I guess because of inaccurate face detection.

@ftaker887 I must mention that I'm VERY impressed by how accurate the mask is in the example you just posted! :o I currently see issues with a face behind shoulders, and sometimes with hands over the face, for example when someone fixes their hair. And with THIS example you just showed... WOW!!! It looks very accurate, with so much detail in how it masks.

@NNNNAI Is there a chance this will be used for masking in a future SimSwap version? If it's possible, I imagine it would fix the issues that I and others have already run into. I'm no programmer and have no idea how to do such a thing, but I hope someone here can merge that way of masking. 🙏

NNNNAI commented 3 years ago

@ftaker887 Wonderful~!!! But it will indeed add complexity to the original SimSwap repo by introducing a self-trained mask model. How about this: you create an individual repo named "SimSwap with more accurate mask" or something like that, and I will add a link to your repo on the SimSwap homepage once you get the function implemented. It all depends on you; looking forward to your feedback.

instant-high commented 3 years ago

Sorry, but just another stupid question: messing around with ./util/reverse2original.py, I found out that line 11

swaped_img = swaped_img.cpu().detach().numpy().transpose((1, 2, 0))

holds the "swapped" result image.

As an absolute Python beginner, I've managed to blur/resize this swapped image by adding this code:

for swaped_img, mat in zip(swaped_imgs, mats):
    swaped_img = swaped_img.cpu().detach().numpy().transpose((1, 2, 0))
    img_white = np.full((crop_size, crop_size), 255, dtype=float)

    swaped_img = cv2.resize(swaped_img, (256, 256))
    swaped_img = cv2.GaussianBlur(swaped_img, (19, 19), sigmaX=0, sigmaY=0, borderType=cv2.BORDER_DEFAULT)

Wouldn't it be possible to build a smooth bounding box there, before blending back into the original image? I need help with this, because it's very hard to search the net for every function and every error message when something goes wrong.

Here's an example of the blurred and resized swaped_img: (image)

ExponentialML commented 3 years ago

> @ftaker887 Wonderful~!!! But it will indeed add complexity to the original SimSwap repo by introducing a self-trained mask model. How about this: you create an individual repo named "SimSwap with more accurate mask" or something like that, and I will add a link to your repo on the SimSwap homepage once you get the function implemented. It all depends on you; looking forward to your feedback.

Sure, that seems like a good idea! It may take a bit of time since I have to make sure everything is neat and ready to use.

The model is almost ready (this is an early iteration). It's not perfect, but it works well when it wants to. Hopefully it will be a bit better once training is done.

(example images: full frame, masked output, and parsing mask)

instant-high commented 3 years ago

Ok. But I was thinking about a simple, fixed mask around the square image that contains the swapped face, applied before blending it into the whole frame. A soft border for that square?

NNNNAI commented 3 years ago

@instant-high I see. But actually, I already use soft-border blending in the original code; you can check lines 34-35 of ./util/reverse2original.py. There are some small problems with the code you provided. First, you should not resize the swapped image, since that misaligns it when blending back into the whole image. Second, you are currently smoothing the whole swapped image instead of just the border, which is why the result inside the bounding box looks oversmoothed. If you are still confused, please feel free to ask~. Have a nice day~.

instant-high commented 3 years ago

The resize and blur were only to clarify what I meant; of course that doesn't make sense. I'll check lines 34/35. But why do we see sharp edges when you already soft-blend the whole square?

Thank you

woctezuma commented 3 years ago

> But why do we see sharp edges when you already soft-blend the whole square?

Check these images: https://github.com/neuralchen/SimSwap/issues/14#issuecomment-877135637
Or this link for an easier comparison: https://imgsli.com/NjA2MzE


You can see that there is a blur near the edges of the bounding-box, whereas the center with the face remains sharp.

Bounding-box: (image)

The issue is less pronounced with the mask method:

Mask: (image)

NNNNAI commented 3 years ago

Because our current method makes the background more or less different from the original, a soft border can only alleviate the sharp edge; it cannot remove it completely. So using a mask to blend is probably the better choice.

instant-high commented 3 years ago

Sorry again. Maybe I don't understand the code completely. What I'm trying to achieve is a mask like this for the whole square image, covering the face and the background, so that it works even if parts of the face, e.g. the chin, are outside the square:

(mask image) Thank you for your patience.

woctezuma commented 3 years ago

https://github.com/neuralchen/SimSwap/blob/fc4b7013547f023223c83097923aa255a8dd05e7/util/reverse2original.py#L12

https://github.com/neuralchen/SimSwap/blob/fc4b7013547f023223c83097923aa255a8dd05e7/util/reverse2original.py#L27

https://github.com/neuralchen/SimSwap/blob/fc4b7013547f023223c83097923aa255a8dd05e7/util/reverse2original.py#L32-L35

NNNNAI commented 3 years ago

@instant-high I see. But if you want to achieve the goal you mentioned, you should apply the blur to the image mask instead of to the swapped image. What you describe is already implemented in lines 34-35 of util/reverse2original.py; you can try tuning the kernel size for cv2.erode, or the number of iterations, to see whether that gives better results.
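For instance, a hedged illustration of those two knobs (the numbers are placeholders, not the repo's defaults):

import cv2
import numpy as np

img_mask = np.full((224, 224), 255, dtype=float)      # white mask, as built in reverse2original.py
kernel = np.ones((40, 40), np.uint8)                  # larger kernel -> stronger erosion, smaller mask
img_mask = cv2.erode(img_mask, kernel, iterations=1)  # more iterations -> the mask shrinks further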

instant-high commented 3 years ago

I've tried. But increasing the kernel size and/or iterations only decreases the size of the "bounding box", with the same sharp edges.

woctezuma commented 3 years ago

It looks to me like you managed to blur the edges in the picture shown at https://github.com/neuralchen/SimSwap/issues/14#issuecomment-877754868

Otherwise, try to figure out the mask that you want. Here is an example with Gaussian Blur, which works on Colab:

%pip install mediapy
import cv2
import mediapy as media
import numpy as np

image_size = (224, 224)

target_image = media.read_image('https://i.imgur.com/QoZAh6t.png').mean(axis=2)
target_image = media.resize_image(target_image, image_size)

kernel_size = (32, 32)

img_white = np.ones(image_size)
kernel = np.ones(kernel_size)

padding_size = (1, 1)

img_mask = np.pad(img_white, padding_size) # not needed if there is a warping
img_mask = cv2.erode(img_mask, kernel)
img_mask = media.resize_image(img_mask, image_size)

blended_image = img_mask * target_image

media.show_images([target_image, img_mask, blended_image])

blur_size = tuple(2*i+1 for i in kernel_size)

blurred_mask = cv2.GaussianBlur(img_mask, blur_size, 0) 
blended_image = blurred_mask * target_image

media.show_images([target_image, blurred_mask, blended_image])


Anyway, this (figuring out exactly how you want to blur the edges) should be a separate issue, in my opinion. Plus, the mask approach is more potent than the bounding-box approach, as it is based on the face parts.

instant-high commented 3 years ago

I finally added the following lines of code to 'reverse2original.py', based on the example above for the blurred mask, and got the desired result. The erode kernel should be 2x the blur kernel_size.

img_mask = img_white
kernel = np.ones((40, 40), np.uint8)
img_mask = cv2.erode(img_mask, kernel, iterations=1)

kernel_size = (20, 20)
blur_size = tuple(2*i+1 for i in kernel_size)
img_mask = cv2.GaussianBlur(img_mask, blur_size, 0)

Original script, with visible bounding box: (image)

Modified script, with blurred mask: (image)

woctezuma commented 3 years ago

Looking good.

NNNNAI commented 3 years ago

@instant-high Great! It does make the results much better.

AlonDan commented 3 years ago

@instant-high That looks REALLY good!! 😮 @NNNNAI Any chance we'll see this soon in the next update of SimSwap?

I would like to test it as well, but I'm not a programmer, so I can only try it once it's officially included in the official .zip.

instant-high commented 3 years ago

@AlonDan If you run SimSwap locally on your computer, you can insert the above lines yourself using something like Windows Notepad. Copy and paste the lines into reverse2original.py starting at line 36. Be aware of the indentation: it should match line 35. Then save the file, overwriting the original. In case something goes wrong, make a backup first. Don't forget to change the kernel to (40, 40) at line 34.

AlonDan commented 3 years ago

> @AlonDan If you run SimSwap locally on your computer, you can insert the above lines yourself using something like Windows Notepad. Copy and paste the lines into reverse2original.py starting at line 36. Be aware of the indentation: it should match line 35. Then save the file, overwriting the original. In case something goes wrong, make a backup first.

Thanks @instant-high! I tried it by inserting the code with the 6 lines above:

img_white[img_white>20] =255

        img_mask = img_white
    kernel = np.ones((40,40),np.uint8)
    img_mask = cv2.erode(img_mask,kernel, iterations=1)

    kernel_size = (20, 20)
    blur_size = tuple(2*i+1 for i in kernel_size)
    img_mask = cv2.GaussianBlur(img_mask, blur_size, 0)

img_mask /= 255

and I got this error:

Traceback (most recent call last):
  File "test_video_swapsingle.py", line 12, in <module>
    from util.videoswap import video_swap
  File "Z:\SimSwap\util\videoswap.py", line 8, in <module>
    from util.reverse2original import reverse2wholeimage
  File "Z:\SimSwap\util\reverse2original.py", line 36
    kernel_size = (20, 20)
                         ^
TabError: inconsistent use of tabs and spaces in indentation

Without the 3 other lines (anything with the word KERNEL) it works, and I can see the 20 blur going to 40, but it seems to also blur the inside, so I'm not sure it's correct. I would love to get the nice results you showed in the example above :)

Do I need to install something to make it work? Or, if possible, could you share the correct file and I'll replace/overwrite mine? (I made a backup.)

instant-high commented 3 years ago

> TabError: inconsistent use of tabs and spaces in indentation

I don't know how to say it in English; that's the indentation error I mentioned.

Move the three lines to the left border, then add 8 spaces in front of each one so they line up with the lines above. It's just a text-formatting problem: Python does not allow tabs and spaces to be mixed in indentation. The corrected lines are shown below.
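For reference, here is the whole added block with consistent 8-space indentation (assuming the surrounding block at line 35 is indented with 8 spaces, as described above):

        img_mask = img_white
        kernel = np.ones((40, 40), np.uint8)
        img_mask = cv2.erode(img_mask, kernel, iterations=1)

        kernel_size = (20, 20)
        blur_size = tuple(2*i+1 for i in kernel_size)
        img_mask = cv2.GaussianBlur(img_mask, blur_size, 0)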

instant-high commented 3 years ago

As an alternative I could send you the modified script....

AlonDan commented 3 years ago

> As an alternative I could send you the modified script....

Oh! I had used TABS; now I manually entered 8 spaces as you mentioned, and it works! Thank you, it looks REALLY GOOD!

I hope the next step will be more like whatever magic is done in the examples above, so we can get much more accurate masks in general. I guess that's a totally different thing to handle and probably much more complicated, but it will be AWESOME for sure!

NNNNAI commented 3 years ago

@instant-high Your work is great. Do you mind if I upload the modified reverse2original.py in the next SimSwap update? I will add an acknowledgement for you in the "News" section.

instant-high commented 3 years ago

Of course you can do that.

bmc84 commented 3 years ago

This is an amazing repo, and this is also a fantastic enhancement. I think it's worth pointing out, since I can't see it mentioned anywhere above, that adding the extra code to reverse2original.py increases processing time. I assume it's the Gaussian blur adding the extra time, but I have not benchmarked it. All I know is that a clip that used to take 4 minutes now takes 6, so it's a noticeable slowdown (worth it, in my opinion). I'm using Anaconda + Windows 10 + RTX 3070. Feel free to do your own tests, or to see if there's a cheaper way of getting the same effect. Thanks again for everything :)

NNNNAI commented 3 years ago

Hey guys, I have updated SimSwap. All the example command lines under "Inference for image or video face swapping" now use the mask (better visual effects). Please check it for details, and don't forget to go to "Preparation" to check the latest setup. Have a nice day~.

NNNNAI commented 3 years ago

The Colab notebook is not ready yet; I will let you know when it's done.

bmc84 commented 3 years ago

Thank you for the updates! I have tested --use_mask true & false, and "false" still gives better output for my use cases: use_mask = true gives a very obvious line across the chin, which is not present with use_mask = false (thanks to the new blur mentioned in previous comments). Can anyone suggest what code to add to blur only the bottom of the changed area (i.e., the chin)? I would like to use the mask but also blur the chin to remove the line. If not, I can do this in post-processing (e.g., After Effects).

Also, will it ever be possible to replace the full face, including the full chin?

Thank you again.

NNNNAI commented 3 years ago

@bmc84 Could you share a picture showing the very obvious line across the chin? It will help me locate the problem, many thanks~.