neuralchen / SimSwap

An arbitrary face-swapping framework on images and videos with one single trained model!

Advice on quality of output (not an issue) #259

Open Fibonacci134 opened 2 years ago

Fibonacci134 commented 2 years ago

In case someone overlooked it, you can preserve quality a lot better by changing your temp directory files to PNG, which is lossless, instead of JPG, which is designed around lossy compression and definitely loses quality. The folder will quickly grow in size because PNG files are much larger, but you can always delete the folder once you get your output video.

pidgeon777 commented 2 years ago

Interesting. How would you actually implement this? I mean, what code modifications are necessary to use PNG instead of JPG?

Fibonacci134 commented 2 years ago

It's quite simple: go to videoswap.py and, near the bottom where you see jpg, just replace it with png (a sketch of the change is below). The difference is quite noticeable. You can also set up moviepy to avoid recompressing the video by exporting a lossless .mov file; no matter how many times you edit it, it will never lose quality. I will upload the altered files to my GitHub today and give some instructions on a few things, like what to do when the image is "not iterable" and how to get past that, and also how to use GFPGAN to quickly enhance the faces to HD and recompile the video at a higher resolution.
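For anyone who wants the concrete edit, this is roughly what it looks like in util/videoswap.py (an untested sketch; the exact lines may differ between SimSwap versions, but the key point is that the imwrite extension and the glob pattern at the bottom of the file must stay in sync, otherwise moviepy finds no frames):

# wherever the temp frames are written out inside video_swap():
cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)  # was .jpg

# at the bottom of the file, where frames are collected for moviepy (must match the extension above):
path = os.path.join(temp_results_dir, '*.png')  # was '*.jpg'
image_filenames = sorted(glob.glob(path))

For a lossless output container, moviepy can also write PNG-encoded .mov files, e.g. clips.write_videofile('output.mov', codec='png'), at the cost of very large files.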

pidgeon777 commented 2 years ago

@Fibonacci134 yours would be a tremendous contribution and I will eagerly await it.

Many thanks in advance 👍.

Fibonacci134 commented 2 years ago

No problem at all! I uploaded the videoswap.py and I will be posting a tutorial on how to use GFPGAN to enhance. I will also post a reverse2original.py that is a bit faster at inference; I'm still working on it, so it is still a work in progress.

razr112 commented 2 years ago

So yesterday after reading this post I downloaded GFPGAN to try it out. All I can say is... Wow. The results are incredible. Thanks for the tip.

pidgeon777 commented 2 years ago

@razr112 how does GFPGAN handle multiple faces in a picture?

Also, what if the face is not looking at the camera, but for example to the left or the right?

@Fibonacci134 once you have tested your improved output-quality code, and if you wouldn't mind sharing it, please do let us know 👍.

Fibonacci134 commented 2 years ago

Yes, GFPGAN is amazing!! Quick tip: if you want to enhance just the faces (it's a lot quicker), you can add --bg_upsampler none and it'll only do the faces. You can also have it save only to "restored_imgs" by altering the inference.py file. To convert an entire video into sequential images:

ffmpeg -i (name of your video).mp4 -vf fps=25 out%d.png

Convert back to video once the images are enhanced:

ffmpeg -f image2 -framerate 25 -i out%d.png -vcodec libx264 -crf 22 video.mp4

Model v1.3 is really good, but I prefer v1.
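For reference, the enhance step itself is just GFPGAN's own inference script run over the extracted frames; something like the line below (folder names here are placeholders, and flag names can vary a little between GFPGAN versions, so check inference_gfpgan.py --help first):

python inference_gfpgan.py -i frames_folder -o enhanced -v 1.3 -s 1 --bg_upsampler none

Then point the second ffmpeg command at the restored_imgs subfolder it produces.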

razr112 commented 2 years ago

Where exactly do I place that code in the inference.py file?

@Fibonacci134

Fibonacci134 commented 2 years ago

Where exactly do I place that code in the inference.py file?

@Fibonacci134

Hey, that code isn't meant to be put in inference.py; it's just a normal ffmpeg command. To make things simple, all you have to do is install FFmpeg and add the FFmpeg program to the Windows PATH using environment variables. If you're using Linux, there's no need to do that; it will work without any extra effort. The steps are as follows:
- Put the video in an empty folder
- Right-click inside the folder and open a terminal
- Run the ffmpeg commands above

Now that you've mentioned it, I think I will add the script into the GFPGAN script to automate the process. Give me a couple of days lol

razr112 commented 2 years ago

Ah okay. I already use FFmpeg commands to extract the frames. I just couldn't figure out how to implement it into the inference file.

Now that you've mentioned it, I think I will add the script into the GFPGAN script to automate the process. Give me a couple of days lol

Awesome! Looking forward to it. Implementing the script is way above my level of beginner coding knowledge lol.

@Fibonacci134

Fibonacci134 commented 2 years ago

Ohh okay, awesome. Lol, it's just that there are generally more Windows users, and most are usually not too familiar with ffmpeg. It's great that you use it, such a handy little tool. And no worries bro, we're all beginners in the grand scheme of things lol. I'm guessing you're also self-taught, so you know there's never any structure, we just try and learn things as they come 😂. Will hopefully get to work on some stuff this weekend.

cheetahfightfx commented 2 years ago

I use ffmpeg and GFPGAN to get better output, and the difference is just legendary. But GFPGAN is resource-intensive and slow, which is the only con I have found.

epicstar7 commented 2 years ago

Implementing GFPGAN or GPEN in the main swap pipeline would be an amazing improvement to this repo. If it could be implemented in a similar way to dot, it would be easy to use as an option before swapping, --gpen_type 512 for instance.

Are there any plans to implement something like this in SimSwap?
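Purely as an illustration of the kind of option I mean (nothing below exists in SimSwap today; the flag name and choices are made up), it could be exposed the same way the test scripts already extend opt.parser:

# hypothetical flag, not part of SimSwap
opt.parser.add_argument('--face_enhancer', type=str, default='none',
                        choices=['none', 'gfpgan', 'gpen'],
                        help='optional face restoration applied to each swapped crop before pasting back')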

fitzgeraldja commented 2 years ago

@epicstar7 you can include GFPGAN cleaning as a step applied only to the masked results, for minimal overhead, by adding just a few cells to the Colab, at least for single-face swaps in a video (I haven't tried the multi-face case, but it should be relatively straightforward to extend). First, get all the necessary packages:

# Clone GFPGAN and enter the GFPGAN folder
%cd /content
!rm -rf GFPGAN
!git clone https://github.com/TencentARC/GFPGAN.git
%cd GFPGAN

# Set up the environment
# Install basicsr - https://github.com/xinntao/BasicSR
# We use BasicSR for both training and inference
!BASICSR_JIT='True' BASICSR_EXT=True pip install basicsr
# Install facexlib - https://github.com/xinntao/facexlib
# We use face detection and face restoration helper in the facexlib package
!pip install facexlib
# Install other dependencies
!pip install -r requirements.txt
!python setup.py develop
!pip install realesrgan  # used for enhancing the background (non-face) regions
# Download the pre-trained model
# !wget https://github.com/TencentARC/GFPGAN/releases/download/v0.2.0/GFPGANCleanv1-NoCE-C2.pth -P experiments/pretrained_models
# Now we use the V1.3 model for the demo
!wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth -P experiments/pretrained_models

Then write a new version of videoswap to use. The version below also saves frames as lossless PNGs rather than JPGs, which can improve the quality of the results at the cost of additional disk space:

%%writefile /content/SimSwap/util/videoswap_gfpgan.py
import os 
import torch
from torchvision.transforms.functional import normalize
from torchvision.transforms import Resize, ToTensor, Normalize, Compose
from basicsr.utils import tensor2img

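# Transforms for the GFPGAN pass: swapped face crops are resized to the 512x512
# input GFPGAN expects and normalized to [-1, 1]; gfpgan_downsampler() builds the
# matching transform that brings the enhanced face back down to crop_size.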
gfpgan_transform_upsample = Compose([
    Resize([int(512), int(512)]),
    # ToTensor(),
    Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])  

def gfpgan_downsampler(crop_size): 
  return Compose([
    Resize([int(crop_size), int(crop_size)]),
    # ToTensor(),
    # Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])  

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

os.chdir('/content/GFPGAN')
from gfpgan.archs.gfpganv1_clean_arch import GFPGANv1Clean

arch = 'clean'
channel_multiplier = 2
model_name = 'GFPGANv1.3'
model_path = os.path.join('experiments/pretrained_models', model_name + '.pth')

if arch == 'clean':
  gfpgan = GFPGANv1Clean(
      out_size=512,
      num_style_feat=512,
      channel_multiplier=channel_multiplier,
      decoder_load_path=None,
      fix_decoder=False,
      num_mlp=8,
      input_is_latent=True,
      different_w=True,
      narrow=1,
      sft_half=True)

loadnet = torch.load(model_path)
if 'params_ema' in loadnet:
    keyname = 'params_ema'
else:
    keyname = 'params'
gfpgan.load_state_dict(loadnet[keyname], strict=True)
gfpgan.eval()
gfpgan = gfpgan.to(device)

os.chdir('/content/SimSwap')

'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:52
Description: 
'''
import os 
import cv2
import glob
import torch
import shutil
import numpy as np
from tqdm import tqdm
from util.reverse2original import reverse2wholeimage
import moviepy.editor as mp
from moviepy.editor import AudioFileClip, VideoFileClip 
from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
import  time
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from parsing_model.model import BiSeNet

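# Same interface as the original util/videoswap.py video_swap(), but each swapped
# face is passed through gfpgan_enhance() before being pasted back, and temp frames
# are written as lossless PNGs instead of JPGs.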
def video_swap(video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results', crop_size=224, no_simswaplogo = False,use_mask =False):
    video_forcheck = VideoFileClip(video_path)
    if video_forcheck.audio is None:
        no_audio = True
    else:
        no_audio = False

    del video_forcheck

    if not no_audio:
        video_audio_clip = AudioFileClip(video_path)

    video = cv2.VideoCapture(video_path)
    logoclass = watermark_image('./simswaplogo/simswaplogo.png')
    ret = True
    frame_index = 0

    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

    # video_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))

    # video_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))

    fps = video.get(cv2.CAP_PROP_FPS)
    if  os.path.exists(temp_results_dir):
            shutil.rmtree(temp_results_dir)

    spNorm =SpecificNorm()
    if use_mask:
        n_classes = 19
        net = BiSeNet(n_classes=n_classes)
        net.cuda()
        save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
        net.load_state_dict(torch.load(save_pth))
        net.eval()
    else:
        net =None

    def _totensor(array):
        tensor = torch.from_numpy(array)
        img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
        return img.float().div(255)

    gfpgan_transform_downsample = gfpgan_downsampler(crop_size)

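    # Enhance one swapped face crop with GFPGAN: upsample to 512x512, run the model,
    # then downsample back to crop_size and return a [0, 1] float tensor that
    # reverse2wholeimage can paste back into the frame.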
    def gfpgan_enhance(img_tensor): 
      # gfp_t = normalize(img_tensor, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=False)
      # gfp_t = gfp_t.unsqueeze(0).to(device)
      img_tensor = gfpgan_transform_upsample(img_tensor.unsqueeze(0))
      output = gfpgan(img_tensor,return_rgb=True, weight=0.5)[0]
      # restored_face = tensor2img(output.squeeze(0), rgb2bgr=False, min_max=(-1, 1))
      down_img = tensor2img(gfpgan_transform_downsample(output).squeeze(0), rgb2bgr=True, min_max=(-1, 1))[:,:,[2,1,0]]
      return _totensor(down_img)

    # while ret:
    for frame_index in tqdm(range(frame_count)): 
        ret, frame = video.read()
        if  ret:
            detect_results = detect_model.get(frame,crop_size)

            if detect_results is not None:
                # print(frame_index)
                if not os.path.exists(temp_results_dir):
                        os.mkdir(temp_results_dir)
                frame_align_crop_list = detect_results[0]
                frame_mat_list = detect_results[1]
                swap_result_list = []
                frame_align_crop_tenor_list = []
                for frame_align_crop in frame_align_crop_list:

                    # BGR TO RGB
                    # frame_align_crop_RGB = frame_align_crop[...,::-1]

                    frame_align_crop_tenor = _totensor(cv2.cvtColor(frame_align_crop,cv2.COLOR_BGR2RGB))[None,...].cuda()

                    swap_result = swap_model(None, frame_align_crop_tenor, id_vetor, None, True)[0]
                    # print(swap_result.shape)
                    # input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
                    # img = cv2.resize(img, (512, 512))
                    # restore faces and background if necessary

                    # these steps I think are identical to before
                    # cropped_face_t = img2tensor(cropped_face / 255., bgr2rgb=True, float32=True)
                    # normalize(cropped_face_t, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
                    # cropped_face_t = cropped_face_t.unsqueeze(0).to(self.device)
                    # output = self.gfpgan(cropped_face_t, return_rgb=False, weight=weight)[0]
                    # # convert to image
                    # restored_face = tensor2img(output.squeeze(0), rgb2bgr=True, min_max=(-1, 1))
                    swap_result = gfpgan_enhance(swap_result)
                    # print(swap_result.shape)

                    cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
                    swap_result_list.append(swap_result)
                    frame_align_crop_tenor_list.append(frame_align_crop_tenor)

                reverse2wholeimage(frame_align_crop_tenor_list,swap_result_list, frame_mat_list, crop_size, frame, logoclass,\
                    os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)),no_simswaplogo,pasring_model =net,use_mask=use_mask, norm = spNorm)

            else:
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame = frame.astype(np.uint8)
                if not no_simswaplogo:
                    frame = logoclass.apply_frames(frame)
                cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
        else:
            break

    video.release()

    # image_filename_list = []
    path = os.path.join(temp_results_dir,'*.png')
    image_filenames = sorted(glob.glob(path))

    clips = ImageSequenceClip(image_filenames,fps = fps)

    if not no_audio:
        clips = clips.set_audio(video_audio_clip)

    clips.write_videofile(save_path,audio_codec='aac')

Then finally run a slightly modified version of the original script. This version also removes the watermark placed by default and lets you choose whether to use the 224 or 512 crop-size version of the model. (NB: if choosing 512, you will also need to add

!wget https://github.com/neuralchen/SimSwap/releases/download/512_beta/512.zip
!unzip ./512.zip -d ./checkpoints

above the original checkpoint download.)

%cd /content/SimSwap
import cv2
import torch
import fractions
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.videoswap_gfpgan import video_swap
import os

def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0

transformer = transforms.Compose([
        transforms.ToTensor(),
        #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

transformer_Arcface = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

opt = TestOptions()
opt.initialize()
opt.parser.add_argument('-f') ## dummy arg to avoid bug
opt = opt.parse()
opt.pic_a_path = '/path/to/desired/face.png' ## or replace it with image from your own google drive
opt.video_path = '/path/to/target/video.mp4' ## or replace it with video from your own google drive
opt.output_path = '/path/to/output/video.mp4'
opt.temp_path = './tmp'
opt.Arc_path = './arcface_model/arcface_checkpoint.tar'
opt.isTrain = False
opt.use_mask = True  ## new feature up-to-date
opt.no_simswaplogo = True

start_epoch, epoch_iter = 1, 0
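# Use crop_sizes[0] (224, the standard model) or crop_sizes[1] (512, the beta model);
# choosing 512 requires the extra 512 checkpoint download mentioned above.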
crop_sizes = [224,512]
opt.crop_size = crop_sizes[0]
crop_size = opt.crop_size

torch.nn.Module.dump_patches = True
if crop_size == 512:
    opt.which_epoch = 550000
    opt.name = '512'
    mode = 'ffhq'
else:
    mode = 'None'

model = create_model(opt)
model.eval()

app = Face_detect_crop(name='antelope', root='./insightface_func/models')
app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640),mode=mode)
with torch.no_grad():
    pic_a = opt.pic_a_path
    # img_a = Image.open(pic_a).convert('RGB')
    img_a_whole = cv2.imread(pic_a)
    img_a_align_crop, _ = app.get(img_a_whole,crop_size)
    img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0],cv2.COLOR_BGR2RGB)) 
    img_a = transformer_Arcface(img_a_align_crop_pil)
    img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

    # pic_b = opt.pic_b_path
    # img_b_whole = cv2.imread(pic_b)

    # img_b_align_crop, b_mat = app.get(img_b_whole,crop_size)
    # img_b_align_crop_pil = Image.fromarray(cv2.cvtColor(img_b_align_crop,cv2.COLOR_BGR2RGB)) 
    # img_b = transformer(img_b_align_crop_pil)
    # img_att = img_b.view(-1, img_b.shape[0], img_b.shape[1], img_b.shape[2])

    # convert numpy to tensor
    img_id = img_id.cuda()
    # img_att = img_att.cuda()

    #create latent id
    img_id_downsample = F.interpolate(img_id, size=(112,112))
    latend_id = model.netArc(img_id_downsample)
    latend_id = F.normalize(latend_id, p=2, dim=1)

    video_swap(opt.video_path, latend_id, model, app, opt.output_path,temp_results_dir=opt.temp_path,\
        no_simswaplogo=opt.no_simswaplogo,use_mask=opt.use_mask,crop_size=crop_size)

hope that helps!

DrBlou commented 2 years ago

@fitzgeraldja Oh my god, man! You're my hero, thank you

zecretaccount commented 1 year ago

@fitzgeraldja This is really awesome. I can't get this version to work on my PC though, only https://github.com/mike9251/simswap-inference-pytorch. Any chance you would adapt the code for that repo? :)

ziko2222 commented 1 year ago

It's quite simple: go to videoswap.py and, near the bottom where you see jpg, just replace it with png.

I did that, and then I got this error:

Traceback (most recent call last):
  File "test_video_swapsingle.py", line 86, in <module>
    no_simswaplogo=opt.no_simswaplogo, use_mask=opt.use_mask, crop_size=crop_size)
  File "C:\SimSwap\SimSwap-main\util\videoswap.py", line 115, in video_swap
    clips = ImageSequenceClip(image_filenames, fps=fps)
  File "C:\Users\amt\anaconda3\envs\simswap\lib\site-packages\moviepy\video\io\ImageSequenceClip.py", line 64, in __init__
    if isinstance(sequence[0], str):
IndexError: list index out of range

Solenyalyl commented 9 months ago

@ziko2222, have you solved it? I have run into the same problem.