Fibonacci134 opened 2 years ago
Interesting. How could you actually implement this? I mean the necessary code modifications to make use of PNG instead.
It's quite simple: go to videoswap.py and, at the bottom where you see jpg, just replace it with png. The difference is quite noticeable. You can also set up moviepy to not compress the video file by exporting it as a .mov file, which is also lossless, so no matter how many times you edit it, it will never lose quality. I will upload the altered files to my GitHub today and give some instructions on a few things, like what to do when the image is "not iterable" and how to get past that, and also how to use GFPGAN to quickly enhance the faces to HD and recompile the video at a higher resolution.
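To make it concrete, these are roughly the lines involved (a minimal sketch only; the exact code differs a bit between SimSwap versions, and the .mov export settings are just one lossless option moviepy documents):

# in util/videoswap.py the temp frames are written with cv2.imwrite,
# so switching the extension from .jpg to .png keeps them lossless
cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
# the glob that collects the frames back has to match the new extension
path = os.path.join(temp_results_dir, '*.png')
# optional: export to a .mov container using moviepy's lossless 'png' codec
# instead of the default compressed mp4 (files get much bigger)
clips.write_videofile(save_path.replace('.mp4', '.mov'), codec='png', audio_codec='aac')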
@Fibonacci134 yours would be a tremendous contribution and I will eagerly await it.
Many thanks in advance 👍.
No problem at all! I uploaded the videoswap.py, and I will be posting a tutorial on how to use GFPGAN to enhance the faces. I will also post a reverse2original.py that is a bit faster at inference; I'm still working on it, so it is a work in progress.
So yesterday after reading this post I downloaded GFPGAN to try it out. All I can say is... Wow. The results are incredible. Thanks for the tip.
@razr112 how does GFPGAN handle multiple faces in a picture?
Also, what if the face is not looking at the camera, but for example to the left or the right?
@Fibonacci134 once you have tested your improved output-quality code, and if you wouldn't mind sharing it, please kindly let us know 👍.
Yes, GFPGAN is amazing!! Quick tip: if you want to enhance just the faces (it's a lot quicker), you can add --bg_upsampler none and it'll only do the faces. You can also have it save only to "restored_imgs" by altering the inference.py file. To convert an entire video into sequential images:
ffmpeg -i (name of your video).mp4 -vf fps=25 out%d.png
Convert back to video once the images are enhanced:
ffmpeg -f image2 -framerate 25 -i out%d.png -vcodec libx264 -crf 22 video.mp4
Model v1.3 is really good, but I prefer v1.
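For reference, the enhancement pass on those extracted frames is just GFPGAN's stock inference script; a typical call looks something like this (paths are only placeholders, flags as in the current inference_gfpgan.py):

python inference_gfpgan.py -i ./frames -o ./frames_enhanced -v 1.3 -s 1 --bg_upsampler none

With the default script the enhanced full frames should land in frames_enhanced/restored_imgs, which is the folder you point the second ffmpeg command at.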
Where exactly do I place that code in the inference.py file?
@Fibonacci134
Hey, that code isn't meant to be put in inference.py; it's just a normal ffmpeg command. To make things simple, all you have to do is install FFmpeg and add the FFmpeg program to the Windows PATH using environment variables. If using Linux, there is no need to do that; it will work without any extra effort. The steps are as follows:
- Put the video in an empty folder
- Right-click inside the folder, open a terminal there, and run the ffmpeg command above
Now that you've mentioned it, I think I will add the script into the GFPGAN script to automate the process. Give me a couple of days lol
Ah okay. I already use FFmpeg commands to extract the frames. I just couldn't figure out how to implement it into the inference file.
Awesome! Looking forward to it. Implementing the script is way above my level of beginner coding knowledge lol.
@Fibonacci134
Ohh okay, awesome. Lol, it's just that there are generally more Windows users, and most are usually not too familiar with ffmpeg. It's great that you use it; it's such a handy little tool. And no worries bro, we're all beginners in the grand scheme of things lol. I'm guessing you're also self-taught, so you know there's never any structure; we just try to learn things as they come across 😂. Will hopefully get to work on some stuff this weekend.
I use ffmpeg and GFPGAN to get better output, and the difference is just legendary. But GFPGAN is a resource-intensive and slow algorithm, which is the only con I have found.
Implementing GFPGAN or GPEN into the main swap pipeline would be an amazing improvement to this repo. If this could be implemented in a similar way to dot, it would be easy to use as an option before swapping, --gpen_type 512 for instance.
Are there any plans to implement something like this in SimSwap?
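For what it's worth, I only mean something as small as this; the flag is purely hypothetical, nothing like it exists in the repo's test_options.py yet:

import argparse

# hypothetical option, shown only to illustrate the idea (not an existing SimSwap flag)
parser = argparse.ArgumentParser()
parser.add_argument('--gpen_type', type=int, default=0, choices=[0, 256, 512],
                    help='0 = off; 256 or 512 = enhance each swapped face crop at that size before pasting it back')
opt = parser.parse_args(['--gpen_type', '512'])
# videoswap could then branch on opt.gpen_type right before reverse2wholeimage is called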
@epicstar7 you can include GFPGAN cleaning as a step only on the masked results, for minimal overhead, by adding just a few cells to the colab, at least for single-face swaps in a video (I haven't tried the multi-face case, but it should be relatively straightforward to extend). First, get all the necessary packages:
# Clone GFPGAN and enter the GFPGAN folder
%cd /content
!rm -rf GFPGAN
!git clone https://github.com/TencentARC/GFPGAN.git
%cd GFPGAN
# Set up the environment
# Install basicsr - https://github.com/xinntao/BasicSR
# We use BasicSR for both training and inference
!BASICSR_JIT='True' BASICSR_EXT=True pip install basicsr
# Install facexlib - https://github.com/xinntao/facexlib
# We use face detection and face restoration helper in the facexlib package
!pip install facexlib
# Install other dependencies
!pip install -r requirements.txt
!python setup.py develop
!pip install realesrgan # used for enhancing the background (non-face) regions
# Download the pre-trained model
# !wget https://github.com/TencentARC/GFPGAN/releases/download/v0.2.0/GFPGANCleanv1-NoCE-C2.pth -P experiments/pretrained_models
# Now we use the V1.3 model for the demo
!wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth -P experiments/pretrained_models
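(Optional) before wiring anything into SimSwap you can sanity-check that the install and the weight download worked; this just uses GFPGAN's own high-level GFPGANer helper, which the cells below don't otherwise need:

import os
from gfpgan import GFPGANer  # installed by the cells above

model_path = 'experiments/pretrained_models/GFPGANv1.3.pth'
assert os.path.exists(model_path), 'weights missing - rerun the wget cell'
# upscale=1 and bg_upsampler=None: faces only, no background upsampling
restorer = GFPGANer(model_path=model_path, upscale=1, arch='clean',
                    channel_multiplier=2, bg_upsampler=None)
print('GFPGAN loaded OK')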
Then write a new version of videoswap to use. The version below also saves frames as lossless PNGs rather than JPGs, which can improve the quality of the results at the cost of extra disk space:
%%writefile /content/SimSwap/util/videoswap_gfpgan.py
import os
import torch
from torchvision.transforms.functional import normalize
from torchvision.transforms import Resize, ToTensor, Normalize, Compose
from basicsr.utils import tensor2img
gfpgan_transform_upsample = Compose([
    Resize([int(512), int(512)]),
    # ToTensor(),
    Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

def gfpgan_downsampler(crop_size):
    return Compose([
        Resize([int(crop_size), int(crop_size)]),
        # ToTensor(),
        # Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
os.chdir('/content/GFPGAN')
from gfpgan.archs.gfpganv1_clean_arch import GFPGANv1Clean
arch = 'clean'
channel_multiplier = 2
model_name = 'GFPGANv1.3'
model_path = os.path.join('experiments/pretrained_models', model_name + '.pth')
if arch == 'clean':
    gfpgan = GFPGANv1Clean(
        out_size=512,
        num_style_feat=512,
        channel_multiplier=channel_multiplier,
        decoder_load_path=None,
        fix_decoder=False,
        num_mlp=8,
        input_is_latent=True,
        different_w=True,
        narrow=1,
        sft_half=True)

loadnet = torch.load(model_path)
if 'params_ema' in loadnet:
    keyname = 'params_ema'
else:
    keyname = 'params'
gfpgan.load_state_dict(loadnet[keyname], strict=True)
gfpgan.eval()
gfpgan = gfpgan.to(device)
os.chdir('/content/SimSwap')
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:52
Description:
'''
import os
import cv2
import glob
import torch
import shutil
import numpy as np
from tqdm import tqdm
from util.reverse2original import reverse2wholeimage
import moviepy.editor as mp
from moviepy.editor import AudioFileClip, VideoFileClip
from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
import time
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from parsing_model.model import BiSeNet
def video_swap(video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results', crop_size=224, no_simswaplogo=False, use_mask=False):
    video_forcheck = VideoFileClip(video_path)
    if video_forcheck.audio is None:
        no_audio = True
    else:
        no_audio = False
    del video_forcheck

    if not no_audio:
        video_audio_clip = AudioFileClip(video_path)

    video = cv2.VideoCapture(video_path)
    logoclass = watermark_image('./simswaplogo/simswaplogo.png')
    ret = True
    frame_index = 0

    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    # video_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    # video_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = video.get(cv2.CAP_PROP_FPS)

    if os.path.exists(temp_results_dir):
        shutil.rmtree(temp_results_dir)

    spNorm = SpecificNorm()
    if use_mask:
        n_classes = 19
        net = BiSeNet(n_classes=n_classes)
        net.cuda()
        save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
        net.load_state_dict(torch.load(save_pth))
        net.eval()
    else:
        net = None

    def _totensor(array):
        tensor = torch.from_numpy(array)
        img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
        return img.float().div(255)

    gfpgan_transform_downsample = gfpgan_downsampler(crop_size)

    def gfpgan_enhance(img_tensor):
        # gfp_t = normalize(img_tensor, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=False)
        # gfp_t = gfp_t.unsqueeze(0).to(device)
        img_tensor = gfpgan_transform_upsample(img_tensor.unsqueeze(0))
        output = gfpgan(img_tensor, return_rgb=True, weight=0.5)[0]
        # restored_face = tensor2img(output.squeeze(0), rgb2bgr=False, min_max=(-1, 1))
        down_img = tensor2img(gfpgan_transform_downsample(output).squeeze(0), rgb2bgr=True, min_max=(-1, 1))[:, :, [2, 1, 0]]
        return _totensor(down_img)

    # while ret:
    for frame_index in tqdm(range(frame_count)):
        ret, frame = video.read()
        if ret:
            detect_results = detect_model.get(frame, crop_size)

            if detect_results is not None:
                # print(frame_index)
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame_align_crop_list = detect_results[0]
                frame_mat_list = detect_results[1]
                swap_result_list = []
                frame_align_crop_tenor_list = []
                for frame_align_crop in frame_align_crop_list:
                    # BGR TO RGB
                    # frame_align_crop_RGB = frame_align_crop[...,::-1]
                    frame_align_crop_tenor = _totensor(cv2.cvtColor(frame_align_crop, cv2.COLOR_BGR2RGB))[None, ...].cuda()

                    swap_result = swap_model(None, frame_align_crop_tenor, id_vetor, None, True)[0]
                    # print(swap_result.shape)

                    # input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
                    # img = cv2.resize(img, (512, 512))
                    # restore faces and background if necessary
                    # these steps I think are identical to before
                    # cropped_face_t = img2tensor(cropped_face / 255., bgr2rgb=True, float32=True)
                    # normalize(cropped_face_t, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
                    # cropped_face_t = cropped_face_t.unsqueeze(0).to(self.device)
                    # output = self.gfpgan(cropped_face_t, return_rgb=False, weight=weight)[0]
                    # # convert to image
                    # restored_face = tensor2img(output.squeeze(0), rgb2bgr=True, min_max=(-1, 1))
                    swap_result = gfpgan_enhance(swap_result)
                    # print(swap_result.shape)

                    cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
                    swap_result_list.append(swap_result)
                    frame_align_crop_tenor_list.append(frame_align_crop_tenor)

                reverse2wholeimage(frame_align_crop_tenor_list, swap_result_list, frame_mat_list, crop_size, frame, logoclass,\
                    os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), no_simswaplogo, pasring_model=net, use_mask=use_mask, norm=spNorm)
            else:
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame = frame.astype(np.uint8)
                if not no_simswaplogo:
                    frame = logoclass.apply_frames(frame)
                cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
        else:
            break

    video.release()

    # image_filename_list = []
    path = os.path.join(temp_results_dir, '*.png')
    image_filenames = sorted(glob.glob(path))

    clips = ImageSequenceClip(image_filenames, fps=fps)

    if not no_audio:
        clips = clips.set_audio(video_audio_clip)

    clips.write_videofile(save_path, audio_codec='aac')
Then finally run a slightly modified version of the original script. This version also removes the watermark placed by default, and lets you choose whether to use the 224 or 512 crop-size version of the model. (NB: if choosing 512, you will also need to add
!wget https://github.com/neuralchen/SimSwap/releases/download/512_beta/512.zip
!unzip ./512.zip -d ./checkpoints
above the original checkpoint download lines.)
%cd /content/SimSwap
import cv2
import torch
import fractions
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.videoswap_gfpgan import video_swap
import os
def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0
transformer = transforms.Compose([
    transforms.ToTensor(),
    # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

transformer_Arcface = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
opt = TestOptions()
opt.initialize()
opt.parser.add_argument('-f') ## dummy arg to avoid bug
opt = opt.parse()
opt.pic_a_path = '/path/to/desired/face.png' ## or replace it with image from your own google drive
opt.video_path = '/path/to/target/video.mp4' ## or replace it with video from your own google drive
opt.output_path = '/path/to/output/video.mp4'
opt.temp_path = './tmp'
opt.Arc_path = './arcface_model/arcface_checkpoint.tar'
opt.isTrain = False
opt.use_mask = True ## new feature up-to-date
opt.no_simswaplogo = True
start_epoch, epoch_iter = 1, 0
crop_sizes = [224,512]
opt.crop_size = crop_sizes[0]
crop_size = opt.crop_size
torch.nn.Module.dump_patches = True
if crop_size == 512:
    opt.which_epoch = 550000
    opt.name = '512'
    mode = 'ffhq'
else:
    mode = 'None'
model = create_model(opt)
model.eval()
app = Face_detect_crop(name='antelope', root='./insightface_func/models')
app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640),mode=mode)
with torch.no_grad():
    pic_a = opt.pic_a_path
    # img_a = Image.open(pic_a).convert('RGB')
    img_a_whole = cv2.imread(pic_a)
    img_a_align_crop, _ = app.get(img_a_whole, crop_size)
    img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0], cv2.COLOR_BGR2RGB))
    img_a = transformer_Arcface(img_a_align_crop_pil)
    img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

    # pic_b = opt.pic_b_path
    # img_b_whole = cv2.imread(pic_b)
    # img_b_align_crop, b_mat = app.get(img_b_whole,crop_size)
    # img_b_align_crop_pil = Image.fromarray(cv2.cvtColor(img_b_align_crop,cv2.COLOR_BGR2RGB))
    # img_b = transformer(img_b_align_crop_pil)
    # img_att = img_b.view(-1, img_b.shape[0], img_b.shape[1], img_b.shape[2])

    # convert numpy to tensor
    img_id = img_id.cuda()
    # img_att = img_att.cuda()

    # create latent id
    img_id_downsample = F.interpolate(img_id, size=(112, 112))
    latend_id = model.netArc(img_id_downsample)
    latend_id = F.normalize(latend_id, p=2, dim=1)

    video_swap(opt.video_path, latend_id, model, app, opt.output_path, temp_results_dir=opt.temp_path,\
        no_simswaplogo=opt.no_simswaplogo, use_mask=opt.use_mask, crop_size=crop_size)
hope that helps!
@fitzgeraldja Oh my god, man! You're my hero, thank you.
@fitzgeraldja This is really awesome. I can't get this version to work on my PC though, only https://github.com/mike9251/simswap-inference-pytorch. Any chance you would fix the code for that repo? :)
It's quite simple: go to videoswap.py and, at the bottom where you see jpg, just replace it with png.
I did that, and then I got this error:
Traceback (most recent call last):
File "test_video_swapsingle.py", line 86, in
@ziko2222, have you solved it? I have run into the same problem.
In case someone overlooked it, you can preserve quality a lot better by changing your temp_directory files to PNG, which is lossless, instead of JPG, which is designed for compression and definitely loses quality. The folder will quickly grow in size because the PNG files are much larger, but you can always delete the folder once you get your output video.