neuralchen / SimSwap

An arbitrary face-swapping framework on images and videos with one single trained model!

I did some research and have suggestions for code compatibility with the new training model #292

Closed netrunner-exe closed 2 years ago

netrunner-exe commented 2 years ago

Hi all. I did a little research to make the test code compatible with the new training model. I really hope that @neuralchen or @NNNNAI will use it to adapt the code in the repository so that everything works perfectly! Many thanks to @boreas-l for the idea and hints on how to implement it. I could not get some parts to work, so please improve them as needed!

  1. Create a new option for compatibility with old checkpoints. I won't describe all the details; here are brief explanations and the finished code with the changes.

SimSwap/options/test_options.py

```python
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-23 17:08:08
Description: 
'''
import argparse

from .base_options import BaseOptions

def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')

class TestOptions(BaseOptions):
    def initialize(self):
        BaseOptions.initialize(self)
        self.parser.add_argument('--ntest', type=int, default=float("inf"), help='# of test examples.')
        self.parser.add_argument('--results_dir', type=str, default='./results/', help='saves results here.')
        self.parser.add_argument('--aspect_ratio', type=float, default=1.0, help='aspect ratio of result images')
        self.parser.add_argument('--phase', type=str, default='test', help='train, val, test, etc')
        self.parser.add_argument('--which_epoch', type=str, default='latest', help='which epoch to load? set to latest to use latest cached model')
        self.parser.add_argument('--how_many', type=int, default=50, help='how many test images to run')
        self.parser.add_argument('--cluster_path', type=str, default='features_clustered_010.npy', help='the path for clustered results of encoded features')
        self.parser.add_argument('--use_encoded_image', action='store_true', help='if specified, encode the real image to get the feature map')
        self.parser.add_argument("--export_onnx", type=str, help="export ONNX model to a given file")
        self.parser.add_argument("--engine", type=str, help="run serialized TRT engine")
        self.parser.add_argument("--onnx", type=str, help="run ONNX model via TRT")
        self.parser.add_argument("--Arc_path", type=str, default='models/BEST_checkpoint.tar', help="path to the Arcface checkpoint")
        self.parser.add_argument("--pic_a_path", type=str, default='./crop_224/gdg.jpg', help="Person who provides identity information")
        self.parser.add_argument("--pic_b_path", type=str, default='./crop_224/zrf.jpg', help="Person who provides information other than their identity")
        self.parser.add_argument("--pic_specific_path", type=str, default='./crop_224/zrf.jpg', help="The specific person to be swapped")
        self.parser.add_argument("--multisepcific_dir", type=str, default='./demo_file/multispecific', help="Dir for multi specific")
        self.parser.add_argument("--video_path", type=str, default='./demo_file/multi_people_1080p.mp4', help="path for the video to swap")
        self.parser.add_argument("--temp_path", type=str, default='./temp_results', help="path to save temporary images")
        self.parser.add_argument("--output_path", type=str, default='./output/', help="results path")
        self.parser.add_argument('--id_thres', type=float, default=0.03, help='identity similarity threshold for matching the specific face')
        self.parser.add_argument('--no_simswaplogo', action='store_true', help='Remove the watermark')
        self.parser.add_argument('--use_mask', action='store_true', help='Use mask for better result')
        self.parser.add_argument('--crop_size', type=int, default=224, help='Crop size of the input image')
        # new options: pass e.g. `--new_model True` to load a checkpoint trained with the new training code
        self.parser.add_argument('--new_model', type=str2bool, default=False, const=False, nargs='?', help='Use new pretrained model')
        self.parser.add_argument('--Gdeep', type=str2bool, default=False)
        self.isTrain = False
```
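
For reference, this is how the new boolean flags parse (a standalone sketch, assuming the `options/test_options.py` above is in place; the parser here is for illustration only):

```python
# Standalone sketch of the new boolean options' parsing behavior.
import argparse

from options.test_options import str2bool

parser = argparse.ArgumentParser()
parser.add_argument('--new_model', type=str2bool, default=False, const=False, nargs='?')
parser.add_argument('--Gdeep', type=str2bool, default=False)

print(parser.parse_args(['--new_model', 'True']).new_model)   # True ('yes', 't', 'y', '1' also work)
print(parser.parse_args(['--new_model', 'false']).new_model)  # False
print(parser.parse_args([]).new_model)                        # False (default)
```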
  2. Create a new file with the necessary functions: SimSwap/util/swap_new_model.py
```python
# -*- coding: utf-8 -*-
# @Author: netrunner-exe
# @Date:   2022-07-01 13:45:41
# @Last Modified by:   netrunner-exe
# @Last Modified time: 2022-07-01 13:47:06
import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

def img2tensor(imgs, bgr2rgb=True, float32=True):
    """Numpy array to tensor.
    Args:
        imgs (list[ndarray] | ndarray): Input images.
        bgr2rgb (bool): Whether to change bgr to rgb.
        float32 (bool): Whether to change to float32.
    Returns:
        list[tensor] | tensor: Tensor images. If returned results only have
            one element, just return tensor.
    """

    def _totensor(img, bgr2rgb, float32):
        if img.shape[2] == 3 and bgr2rgb:
            if img.dtype == 'float64':
                img = img.astype('float32')
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = torch.from_numpy(img.transpose(2, 0, 1))
        if float32:
            img = img.float()
        return img

    if isinstance(imgs, list):
        return [_totensor(img, bgr2rgb, float32) for img in imgs]
    else:
        return _totensor(imgs, bgr2rgb, float32)

def swap_result_new_model(face_align_crop, model, latend_id):
    img_align_crop = Image.fromarray(cv2.cvtColor(face_align_crop, cv2.COLOR_BGR2RGB))

    img_tensor = transforms.ToTensor()(img_align_crop)
    # PIL's .size is (width, height) while view expects (..., H, W); the crops
    # here are square so it makes no difference, but keep the order correct.
    img_tensor = img_tensor.view(-1, 3, img_align_crop.size[1], img_align_crop.size[0])

    # normalize with ImageNet statistics before feeding the generator
    mean = torch.tensor([0.485, 0.456, 0.406]).cuda().view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).cuda().view(1, 3, 1, 1)

    img_tensor = img_tensor.cuda(non_blocking=True)
    img_tensor = img_tensor.sub_(mean).div_(std)

    imagenet_std = torch.Tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    imagenet_mean = torch.Tensor([0.485, 0.456, 0.406]).view(3, 1, 1)

    # run the generator, then undo the normalization
    swap_res = model.netG(img_tensor, latend_id).cpu()
    swap_res = (swap_res * imagenet_std + imagenet_mean).numpy()
    swap_res = swap_res.squeeze(0).transpose((1, 2, 0))

    swap_result = np.clip(255 * swap_res, 0, 255)
    swap_result = img2tensor(swap_result / 255., bgr2rgb=False, float32=True)
    return swap_result
```
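
A quick usage sketch of `swap_result_new_model` (hypothetical; `model` and `latend_id` are built exactly as in the test scripts below, and a CUDA device is assumed):

```python
# Hypothetical standalone usage; the crop path is an example only.
import cv2
from util.swap_new_model import swap_result_new_model

face_align_crop = cv2.imread('./crop_224/example.jpg')  # an aligned BGR face crop
swap_result = swap_result_new_model(face_align_crop, model, latend_id)
# swap_result is a float CHW tensor in [0, 1], ready for reverse2wholeimage
```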
  3. Unfortunately, I could not get multispecific and swapspecific to work. I will take test_wholeimage_swapsingle.py as an example, making small changes so it works with the new model while staying compatible with the old ones. One note: if you are using the beta 512 model, you will need to add `--name 512` instead of only `--crop_size 512` to keep the beta 512 model working in the future.
    
```python
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:43
Description: 
'''
import cv2
import torch
import fractions
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from models.projected_model import fsModel
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.reverse2original import reverse2wholeimage
from util.swap_new_model import swap_result_new_model
import os
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from parsing_model.model import BiSeNet

def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0

transformer_Arcface = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

def _totensor(array):
    tensor = torch.from_numpy(array)
    img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
    return img.float().div(255)

if __name__ == '__main__':
    opt = TestOptions().parse()
    start_epoch, epoch_iter = 1, 0
    crop_size = opt.crop_size

    torch.nn.Module.dump_patches = True
    if crop_size == 512:
        if opt.name == str(512):
            opt.which_epoch = 550000
        else:
            opt.Gdeep = True
            opt.new_model = True
        mode = 'ffhq'
    else:
        mode = 'None'

    logoclass = watermark_image('./simswaplogo/simswaplogo.png')

    if opt.new_model == True:
        model = fsModel()
        model.initialize(opt)
        model.netG.eval()
    else:
        model = create_model(opt)
        model.eval()

    spNorm = SpecificNorm()
    app = Face_detect_crop(name='antelope', root='./insightface_func/models')
    app.prepare(ctx_id=0, det_thresh=0.6, det_size=(640, 640), mode=mode)

    with torch.no_grad():
        pic_a = opt.pic_a_path
        img_a_whole = cv2.imread(pic_a)
        img_a_align_crop, _ = app.get(img_a_whole, crop_size)
        img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0], cv2.COLOR_BGR2RGB))

        img_a = transformer_Arcface(img_a_align_crop_pil)
        img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

        # convert numpy to tensor
        img_id = img_id.cuda()

        # create latent id
        img_id_downsample = F.interpolate(img_id, size=(112, 112))
        latend_id = model.netArc(img_id_downsample)
        latend_id = F.normalize(latend_id, p=2, dim=1)

        ############## Forward Pass ######################

        pic_b = opt.pic_b_path
        img_b_whole = cv2.imread(pic_b)

        img_b_align_crop_list, b_mat_list = app.get(img_b_whole, crop_size)
        # detect_results = None

        swap_result_list = []
        b_align_crop_tenor_list = []

        for b_align_crop in img_b_align_crop_list:
            b_align_crop_tenor = _totensor(cv2.cvtColor(b_align_crop[0], cv2.COLOR_BGR2RGB))[None, ...].cuda()

            if opt.new_model == True:
                swap_result = swap_result_new_model(b_align_crop, model, latend_id)
            else:
                swap_result = model(None, b_align_crop_tenor, latend_id, None, True)[0]

            swap_result_list.append(swap_result)
            b_align_crop_tenor_list.append(b_align_crop_tenor)

        if opt.use_mask:
            n_classes = 19
            net = BiSeNet(n_classes=n_classes)
            net.cuda()
            save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
            net.load_state_dict(torch.load(save_pth))
            net.eval()
        else:
            net = None

        reverse2wholeimage(b_align_crop_tenor_list, swap_result_list, b_mat_list, crop_size, img_b_whole, logoclass,
            os.path.join(opt.output_path, 'result_whole_swapsingle.jpg'), opt.no_simswaplogo,
            pasring_model=net, use_mask=opt.use_mask, norm=spNorm)

        print(' ')
        print('************ Done ! ************')
```
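
Note how the branching at the top works: `--name 512` selects the old beta 512 architecture and forces `opt.which_epoch = 550000`, while `--crop_size 512` alone enables `opt.Gdeep` and `opt.new_model`, i.e. the new training code path. In both 512 cases the detector runs in 'ffhq' mode.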

  4. To work with video: `test_video_swapsingle.py`

```python
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:00:38
Description: 
'''
import cv2
import torch
import fractions
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from models.projected_model import fsModel
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.videoswap import video_swap
import os

def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0

transformer = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

transformer_Arcface = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

detransformer = transforms.Compose([
    transforms.Normalize([0, 0, 0], [1/0.229, 1/0.224, 1/0.225]),
    transforms.Normalize([-0.485, -0.456, -0.406], [1, 1, 1])
])

if __name__ == '__main__':
    opt = TestOptions().parse()
    start_epoch, epoch_iter = 1, 0
    crop_size = opt.crop_size

    torch.nn.Module.dump_patches = True
    if crop_size == 512:
        if opt.name == str(512):
            opt.which_epoch = 550000
        else:
            opt.Gdeep = True
            opt.new_model = True
        mode = 'ffhq'
    else:
        mode = 'None'

    if opt.new_model == True:
        model = fsModel()
        model.initialize(opt)
        model.netG.eval()
    else:
        model = create_model(opt)
        model.eval()

    app = Face_detect_crop(name='antelope', root='./insightface_func/models')
    app.prepare(ctx_id=0, det_thresh=0.6, det_size=(640, 640), mode=mode)

    with torch.no_grad():
        pic_a = opt.pic_a_path
        # img_a = Image.open(pic_a).convert('RGB')
        img_a_whole = cv2.imread(pic_a)
        img_a_align_crop, _ = app.get(img_a_whole, crop_size)
        img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0], cv2.COLOR_BGR2RGB))
        img_a = transformer_Arcface(img_a_align_crop_pil)
        img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

        # convert numpy to tensor
        img_id = img_id.cuda()

        # create latent id
        img_id_downsample = F.interpolate(img_id, size=(112, 112))
        latend_id = model.netArc(img_id_downsample)
        latend_id = F.normalize(latend_id, p=2, dim=1)

        video_swap(opt.video_path, latend_id, model, app, opt.output_path, temp_results_dir=opt.temp_path,
            no_simswaplogo=opt.no_simswaplogo, use_mask=opt.use_mask, crop_size=crop_size, new_model=opt.new_model)
```
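
Apart from the model-loading branch, the key change here is forwarding `new_model=opt.new_model` into `video_swap`, which then routes each aligned crop through `swap_result_new_model` instead of the old forward call.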

**and `videoswap.py`**

```python
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:52
Description: 
'''
import os
import cv2
import glob
import torch
import shutil
import numpy as np
from tqdm import tqdm
from util.reverse2original import reverse2wholeimage
import moviepy.editor as mp
from moviepy.editor import AudioFileClip, VideoFileClip
from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
import time
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from util.swap_new_model import swap_result_new_model
from parsing_model.model import BiSeNet

def _totensor(array):
    tensor = torch.from_numpy(array)
    img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
    return img.float().div(255)

def video_swap(video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results',
               crop_size=224, no_simswaplogo=False, use_mask=False, new_model=False):
    video_forcheck = VideoFileClip(video_path)
    if video_forcheck.audio is None:
        no_audio = True
    else:
        no_audio = False

    del video_forcheck

    if not no_audio:
        video_audio_clip = AudioFileClip(video_path)

    video = cv2.VideoCapture(video_path)
    logoclass = watermark_image('./simswaplogo/simswaplogo.png')
    ret = True
    frame_index = 0

    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    # video_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    # video_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = video.get(cv2.CAP_PROP_FPS)
    if os.path.exists(temp_results_dir):
        shutil.rmtree(temp_results_dir)

    spNorm = SpecificNorm()
    if use_mask:
        n_classes = 19
        net = BiSeNet(n_classes=n_classes)
        net.cuda()
        save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
        net.load_state_dict(torch.load(save_pth))
        net.eval()
    else:
        net = None

    # while ret:
    for frame_index in tqdm(range(frame_count)):
        ret, frame = video.read()
        if ret:
            detect_results = detect_model.get(frame, crop_size)

            if detect_results is not None:
                # print(frame_index)
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame_align_crop_list = detect_results[0]
                frame_mat_list = detect_results[1]
                swap_result_list = []
                frame_align_crop_tenor_list = []
                for frame_align_crop in frame_align_crop_list:

                    # BGR TO RGB
                    # frame_align_crop_RGB = frame_align_crop[...,::-1]

                    frame_align_crop_tenor = _totensor(cv2.cvtColor(frame_align_crop, cv2.COLOR_BGR2RGB))[None, ...].cuda()

                    if new_model == True:
                        swap_result = swap_result_new_model(frame_align_crop, swap_model, id_vetor)
                    else:
                        swap_result = swap_model(None, frame_align_crop_tenor, id_vetor, None, True)[0]

                    cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.jpg'.format(frame_index)), frame)
                    swap_result_list.append(swap_result)
                    frame_align_crop_tenor_list.append(frame_align_crop_tenor)

                reverse2wholeimage(frame_align_crop_tenor_list, swap_result_list, frame_mat_list, crop_size, frame, logoclass,
                    os.path.join(temp_results_dir, 'frame_{:0>7d}.jpg'.format(frame_index)), no_simswaplogo,
                    pasring_model=net, use_mask=use_mask, norm=spNorm)

            else:
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame = frame.astype(np.uint8)
                if not no_simswaplogo:
                    frame = logoclass.apply_frames(frame)
                cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.jpg'.format(frame_index)), frame)
        else:
            break

    video.release()

    # image_filename_list = []
    path = os.path.join(temp_results_dir, '*.jpg')
    image_filenames = sorted(glob.glob(path))

    clips = ImageSequenceClip(image_filenames, fps=fps)

    if not no_audio:
        clips = clips.set_audio(video_audio_clip)

    clips.write_videofile(save_path, audio_codec='aac')
```
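
As before, the swapped frames are written to `temp_results_dir` as numbered JPEGs and reassembled with moviepy's `ImageSequenceClip` at the source FPS; the original audio track is re-attached only if the source video has one.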


As a reference I took the 512 checkpoint posted by @mittalgovind [Link](https://github.com/neuralchen/SimSwap/issues/255#issuecomment-1118983049); it was trained for 390000 iterations. In the **checkpoints** folder I created a **simswap_512_test** folder and copied the necessary files into its root.

Full example of the command:
For video:
`python test_video_swapsingle.py --which_epoch 390000 --new_model True --checkpoints_dir './checkpoints/simswap_512_test' --isTrain false --crop_size 512 --Arc_path arcface_model/arcface_checkpoint.tar --pic_a_path ./demo_file/Iron_man.jpg --video_path ./demo_file/multi_people_1080p.mp4 --output_path ./output/multi_test_swapsingle.mp4 --temp_path ./temp_results --no_simswaplogo --use_mask`

For image:
`python test_wholeimage_swapsingle.py --which_epoch 390000 --new_model True --checkpoints_dir './checkpoints/simswap_512_test' --Arc_path arcface_model/arcface_checkpoint.tar --pic_a_path ./demo_file/Iron_man.jpg --pic_b_path ./demo_file/multi_people.jpg --output_path ./output  --isTrain false --crop_size 512 --use_mask --no_simswaplogo`
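
For the beta 512 model (old architecture), the equivalent command would presumably use `--name 512` instead of `--new_model True` (assuming the beta checkpoint lives in `./checkpoints/512`; `--which_epoch` is then forced to 550000 by the branching shown above):
`python test_wholeimage_swapsingle.py --name 512 --crop_size 512 --Arc_path arcface_model/arcface_checkpoint.tar --pic_a_path ./demo_file/Iron_man.jpg --pic_b_path ./demo_file/multi_people.jpg --output_path ./output --isTrain false --use_mask --no_simswaplogo`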

**All these explanations are for people who have at least a little experience modifying SimSwap files. Please check this code and the examples carefully; I may have made a typo somewhere.**

**Results:**
![res](https://user-images.githubusercontent.com/81887288/176904152-0e33fd03-5151-46ac-abb7-2314c388c1db.jpg)
Also, if you change `mode = 'ffhq'` to `mode = 'None'` in test_wholeimage_swapsingle and test_video_swapsingle, the result looks more natural:
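Concretely, that is this one assignment in the `crop_size == 512` branch shown earlier:

```python
if crop_size == 512:
    if opt.name == str(512):
        opt.which_epoch = 550000
    else:
        opt.Gdeep = True
        opt.new_model = True
    mode = 'None'  # was 'ffhq'; with this checkpoint the result looks more natural
else:
    mode = 'None'
```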
![res(1)](https://user-images.githubusercontent.com/81887288/176904391-e8a4aac5-d272-4e12-8e5c-a5f3b14a2263.jpg)
![result_whole_swapsingle(4)](https://user-images.githubusercontent.com/81887288/176904613-f856b0a8-bdb3-4157-a9a5-f3494d90b169.jpg)
![frame_0000000(1)](https://user-images.githubusercontent.com/81887288/176904666-0d80e847-db9e-433b-911d-6dfd35c652bd.jpg)
![frame_0000000](https://user-images.githubusercontent.com/81887288/176904710-c7f94791-5791-449b-9230-f855fbea9ff1.jpg)

https://user-images.githubusercontent.com/81887288/176904847-ba3b71f1-b1d3-4208-94b3-98361c0eaac2.mp4

https://user-images.githubusercontent.com/81887288/176905239-096e33a5-6d80-457a-903b-9e3fb92e302f.mp4
zwang970201 commented 2 years ago

My output looks like this; do you run into similar problems? result_whole_swapsingle

LAFLAMIE1024 commented 2 years ago

THANK YOU SO MUCH FOR PROVIDING THESE CODES !!

BbChip0103 commented 2 years ago

Thank you very much for your great work! It works well.

renmengyuan commented 1 year ago

Hi, as you said:

> Also if you change `mode = 'ffhq'` to `mode = 'None'` in test_wholeimage_swapsingle and test_video_swapsingle. It looks more natural.

I am confused: if ffhq_face_align was used when you trained the model, will arc_face_align be better than ffhq_face_align when you test the model?

netrunner-exe commented 1 year ago

> Hi, as you said: *Also if you change `mode = 'ffhq'` to `mode = 'None'` in test_wholeimage_swapsingle and test_video_swapsingle. It looks more natural.*
>
> I am confused: if ffhq_face_align was used when you trained the model, will arc_face_align be better than ffhq_face_align when you test the model?

I don't think I said anywhere that I used ffhq_face_aligned to train the model (as you say). Moreover, I didn't train the model at all; I used a model that another user posted for the test. In this case, 'None' or 'ffhq' means the mode: exactly how the face is cropped and aligned before being sent for further processing. Which mode to use depends on how the dataset the model was trained on was cropped and aligned.
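
In code terms, the mode is just the argument the test scripts pass to the detector's `prepare` call:

```python
app = Face_detect_crop(name='antelope', root='./insightface_func/models')
# 'ffhq' and 'None' select different crop/alignment behavior; pick whichever
# matches how the model's training dataset was cropped and aligned
app.prepare(ctx_id=0, det_thresh=0.6, det_size=(640, 640), mode='None')
```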

renmengyuan commented 1 year ago

> Hi, as you said: *Also if you change `mode = 'ffhq'` to `mode = 'None'` in test_wholeimage_swapsingle and test_video_swapsingle. It looks more natural.* I am confused: if ffhq_face_align was used when you trained the model, will arc_face_align be better than ffhq_face_align when you test the model?

> I don't think I said anywhere that I used ffhq_face_aligned to train the model (as you say). Moreover, I didn't train the model at all; I used a model that another user posted for the test. In this case, 'None' or 'ffhq' means the mode: exactly how the face is cropped and aligned before being sent for further processing. Which mode to use depends on how the dataset the model was trained on was cropped and aligned.

I see, maybe the model that another user posted was trained using arc_face_align.