Open Aaronthecowboy opened 1 month ago
Live video seems to run below 10 fps on my Mac (M1 Pro).
I installed onnxruntime-silicon according to the instructions and ran the program with `python run.py --execution-provider coreml`, but the GPU was not used.
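A quick first check is which execution providers the installed onnxruntime build exposes at all. The sketch below uses `onnxruntime.get_available_providers()`, which is part of the onnxruntime Python API; the helper name `pick_providers` and the preference order are assumptions made only for illustration.

```python
def pick_providers(available, preferred=("CoreMLExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in preferred if name in available]
    # Assumption: falling back to the CPU provider is acceptable when nothing matches.
    return chosen or ["CPUExecutionProvider"]


def available_providers():
    """Ask onnxruntime which providers this build supports; empty list if it is not installed."""
    try:
        import onnxruntime  # imported lazily so pick_providers works without it
    except ImportError:
        return []
    return onnxruntime.get_available_providers()


if __name__ == "__main__":
    found = available_providers()
    print("available:", found)
    print("would use:", pick_providers(found))
```

A provider showing up in this list only means the build includes it. It does not by itself show that the model's nodes run on the GPU.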
Less than 1 fps here 🤣. We might need a smaller model.
Maybe getting the GPU to work would improve the frame rate 😂
Thank you for open-sourcing this, but how do I get the GPU to work? I'm using an M3 Pro and just tried this; it's also very slow, maybe 10 fps. I tried the --keep-fps parameter like this: `python run.py --keep-fps --execution-provider coreml`, but it doesn't help 😂
I will try to quantize the model. Is there any way to reduce the video to 540p to increase the fps?
I have a PC with an NVIDIA GPU. Would it be possible to capture the video with the Mac camera and do the computation on the NVIDIA PC?
In fact, I don't know 😂. Maybe GFPGANv1.4 and inswapper_128_fp16 need to support Core ML.
This could totally work. You got this, bro!
Sorry, I saw the author tag on your reply and thought you were the author of this project 😂.
No biggie, our chat might just grab the author's attention😊
Yeah, let's make this issue hot so the author gives us a glance, haha.
I changed the resolution to 360p, but it's still slow
I second the terrible performance on Apple silicon. I tried reducing the OpenCV capture resolution and fps, but it makes no difference to the bad performance. I guess it's model related. If you want to experiment, check lines 279-282 of models/ui.py
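For anyone who wants to repeat the resolution experiment, here is a minimal sketch of requesting a smaller capture size from OpenCV. `cv2.VideoCapture.set` with `CAP_PROP_FRAME_WIDTH` and `CAP_PROP_FRAME_HEIGHT` is standard OpenCV; the helper names `fit_height` and `open_camera`, the camera index 0, and the 540-pixel target are assumptions, and this is not the project's own code.

```python
def fit_height(width, height, target_height=540):
    """Scale (width, height) down to target_height, keeping the aspect ratio.

    Never upscales. The width is rounded to an even number.
    """
    if height <= target_height:
        return width, height
    scale = target_height / height
    new_width = int(round(width * scale / 2)) * 2
    return new_width, target_height


def open_camera(index=0, target_height=540):
    """Open a camera and request a reduced capture size (the driver may ignore the request)."""
    import cv2  # lazy import: only needed when a camera is actually opened

    cap = cv2.VideoCapture(index)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    new_width, new_height = fit_height(width, height, target_height)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, new_width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, new_height)
    return cap
```

The log posted later in this thread prints `inswapper-shape: [1, 3, 128, 128]`, so the swap model sees a fixed 128x128 face crop whatever the capture size. That would be consistent with a smaller frame not helping much.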
I don't think resolution is the key; performance depends mainly on the model size.
I've found an issue with similar symptoms. If anyone has used the onnxruntime Python API before, perhaps they could tell us how to add options and set COREML_FLAG_USE_CPU_ONLY = 0?
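On the question of passing options from Python: `onnxruntime.InferenceSession` accepts a `providers` list in which each entry is either a provider name or a `(name, options_dict)` tuple. The sketch below builds such a list. The option key `"MLComputeUnits"` and its values are an assumption based on recent ONNX Runtime documentation for the CoreML provider; older builds, possibly including the onnxruntime-silicon build used in this thread, may reject or ignore it. As far as I know, `COREML_FLAG_USE_CPU_ONLY` in the C API is one bit in a flags value rather than a setting that gets assigned 0. The helper names are hypothetical.

```python
def coreml_providers(compute_units="ALL"):
    """Build a providers list that asks the CoreML provider for the given compute units.

    ASSUMPTION: the "MLComputeUnits" key and these values match the installed
    onnxruntime version; check the documentation for your build.
    """
    allowed = {"ALL", "CPUOnly", "CPUAndGPU", "CPUAndNeuralEngine"}
    if compute_units not in allowed:
        raise ValueError(f"unknown compute_units: {compute_units!r}")
    return [
        ("CoreMLExecutionProvider", {"MLComputeUnits": compute_units}),
        "CPUExecutionProvider",  # fallback for nodes the CoreML provider does not take
    ]


def make_session(model_path, compute_units="ALL"):
    """Create an InferenceSession with the options above (needs onnxruntime installed)."""
    import onnxruntime

    session = onnxruntime.InferenceSession(
        model_path, providers=coreml_providers(compute_units)
    )
    # get_providers() reports what the session actually uses, which may differ from the request.
    print("applied providers:", session.get_providers())
    return session
```

Comparing the requested list with `session.get_providers()` is a quick way to see whether the CoreML provider was accepted at all.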
Performance is better (although still choppy) with the non-fp16 model.
Looks great. What fps are you getting now?
@RoversX I don't know how to measure them :)
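One simple way to measure it is to timestamp each processed frame and average over a short window. This is a minimal sketch, not code from the project; the class name `FpsMeter` and the 30-frame window are assumptions.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        self._stamps = deque(maxlen=window)
        self._clock = clock  # injectable so the meter can be tested without waiting

    def tick(self):
        """Call once per processed frame; returns the current FPS estimate."""
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0  # not enough samples yet
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `meter.tick()` right after each frame has been swapped and displayed, and printing the result every second or so, gives a number that can be compared across models and settings.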
Apple M2 - 1 FPS :( on coreml
@varna9000 did you change this, COREML_FLAG_USE_CPU_ONLY = 0? :)
@essssence no, I downloaded the other model, the one that is not fp16, from the same repo. Once it is downloaded into the models/ directory, just rename it to inswapper_128_fp16.onnx
Alternatively, just edit this line
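The rename step can be scripted so that the original fp16 file is kept as a backup. This is a sketch under assumptions: the non-fp16 file is assumed to be named `inswapper_128.onnx` (the thread does not state its name), and `use_fp32_model` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def use_fp32_model(models_dir, fp32_name="inswapper_128.onnx",
                   expected_name="inswapper_128_fp16.onnx"):
    """Copy the non-fp16 model over the filename the program loads.

    The existing file is moved to '<expected_name>.bak' first so the change is easy to undo.
    """
    models_dir = Path(models_dir)
    source = models_dir / fp32_name
    target = models_dir / expected_name
    if not source.is_file():
        raise FileNotFoundError(f"download the non-fp16 model to {source} first")
    if target.exists():
        shutil.move(str(target), str(target.with_name(target.name + ".bak")))
    shutil.copy2(source, target)
    return target
```

Copying rather than renaming keeps the downloaded file under its own name, which makes it easier to tell later which weights are actually in use.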
@varna9000 tried this on a GTX 1060 6GB. A bit better, but still around 3 FPS :D
You are right! A bit faster
I have tried the non-fp16 model on M3 Pro Max; it's much better than the original model.
The non-fp16 model took it from 1 fps to 15 fps.
This method really worked: I got a noticeable boost in FPS (though it's still around 10, which feels like the laggy days of 10 years ago).
It works on my M3 Max. Thank you all :)
https://github.com/gongzhang/Deep-Live-Cam/blob/main/README.macos.md
There is indeed an improvement in the frame rate, but it is still not smooth. Why is the 32-bit version of this model faster than the 16-bit one? It may be down to the compute architecture of the Apple M1, which might bring more system resources to bear at 32-bit precision. To really solve this problem, the model still needs to run on the GPU.
I have the same problem. Without Face Enhancer the FPS is about 15, but with Face Enhancer it is about 1.
Does anyone know how to solve this?
Converting the ONNX model back to a PyTorch model and running it in mps mode is a bit faster on my Mac (M3 Pro). It actually uses the GPU. One swap inference went from around 600 ms to 150 ms in my case. A bit better, but still not smooth. As for onnxruntime, I don't think it supports the Core ML GPU in Python yet.
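To put those latencies in frame-rate terms, here is a small back-of-the-envelope calculation. It assumes one face per frame, strictly sequential processing, and no time spent on detection, enhancement, or display, so the results are upper bounds rather than measured rates.

```python
def latency_to_fps(latency_ms):
    """Upper bound on frames per second if each frame costs latency_ms."""
    return 1000.0 / latency_ms


before_ms, after_ms = 600.0, 150.0  # swap latencies reported in the comment above
print(f"speedup: {before_ms / after_ms:.1f}x")
print(f"swap-only ceiling: {latency_to_fps(before_ms):.2f} fps -> {latency_to_fps(after_ms):.2f} fps")
```

Under those assumptions the swap stage alone caps the rate at roughly 1.7 fps before and 6.7 fps after, which fits the description of "better but still not smooth".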
@jiafanwu could you share how you did it?
This repo leverages insightface to handle the face swap. Here is the code that interacts with insightface to get the model. Under the hood it uses inswapper to swap the face.

For a local setup, a quick and dirty way is to inject the model conversion in `__init__`.

    import torch
    from onnx2torch import convert

    class INSwapper():
        def __init__(self, model_file=None, session=None):
            # other part of original code
            device = torch.device("mps")
            self.device = device
            self.pt_model = convert(self.model_file)
            self.pt_model.to(device)
            self.pt_model.eval()

Then inside `get` you would want to swap out the inference code for something like this:

    img_tensor = torch.from_numpy(blob).to(torch.float32).to(self.device)  # Convert numpy array to tensor
    latent_tensor = torch.from_numpy(latent).to(torch.float32).to(self.device)  # Convert numpy array to tensor
    with torch.no_grad():
        pred = self.pt_model(img_tensor, latent_tensor)  # Forward pass with PyTorch model
    pred2_np = pred.squeeze(0).unsqueeze(0).cpu().numpy()  # Convert back to numpy array

The same thing applies to `forward`.

The package used for the conversion is onnx2torch.
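The snippet above hardcodes `torch.device("mps")`, which only works where the MPS backend is available. A hedged sketch of a fallback is shown below. `torch.backends.mps.is_available()` and `torch.cuda.is_available()` are real PyTorch calls; the helper names `pick_device_name` and `pick_device` are hypothetical.

```python
def pick_device_name(mps_available, cuda_available=False):
    """Prefer Apple's MPS backend, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def pick_device():
    """Return a torch.device chosen from what this machine supports (needs torch installed)."""
    import torch

    return torch.device(
        pick_device_name(torch.backends.mps.is_available(), torch.cuda.is_available())
    )
```

With this in place, `device = pick_device()` could replace the hardcoded line, so the same patch also runs, more slowly, on machines without MPS.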
I'm trying to modify this part of the source code, but I encountered some errors. The following is my code:

    import time
    import numpy as np
    import torch
    import torch.nn.functional as F
    import cv2
    from ..utils import face_align
    import onnxruntime
    import onnx
    from onnx import numpy_helper
    from onnx2torch import convert

    class INSwapper():
        def __init__(self, model_file=None, session=None):
            self.model_file = model_file
            self.session = session
            model = onnx.load(self.model_file)
            graph = model.graph
            self.emap = numpy_helper.to_array(graph.initializer[-1])
            self.input_mean = 0.0
            self.input_std = 255.0
            # print('input mean and std:', model_file, self.input_mean, self.input_std)
            if self.session is None:
                self.session = onnxruntime.InferenceSession(self.model_file, None)
            inputs = self.session.get_inputs()
            self.input_names = []
            for inp in inputs:
                self.input_names.append(inp.name)
            outputs = self.session.get_outputs()
            output_names = []
            for out in outputs:
                output_names.append(out.name)
            self.output_names = output_names
            assert len(self.output_names) == 1
            output_shape = outputs[0].shape
            input_cfg = inputs[0]
            input_shape = input_cfg.shape
            self.input_shape = input_shape
            print('inswapper-shape:', self.input_shape)
            self.input_size = tuple(input_shape[2:4][::-1])
            self.device = torch.device("mps")
            self.model = convert(model)
            self.model.to(self.device)
            self.model.eval()
            print(f'INSwapper initialized on {self.device}')

        def forward(self, img, latent):
            img = (img - self.input_mean) / self.input_std
            img_tensor = torch.from_numpy(img).to(torch.float32).to(self.device)
            latent_tensor = torch.from_numpy(latent).to(torch.float32).to(self.device)
            with torch.no_grad():
                pred = self.model(img_tensor, latent_tensor)  # Forward pass with PyTorch model
            return pred.squeeze(0).unsqueeze(0).cpu().numpy()  # Convert back to numpy array

        def get(self, img, target_face, source_face, paste_back=True):
            aimg, M = face_align.norm_crop2(img, target_face.kps, self.input_size[0])
            blob = cv2.dnn.blobFromImage(aimg, 1.0 / self.input_std, self.input_size,
                                         (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
            latent = source_face.normed_embedding.reshape((1, -1))
            latent = np.dot(latent, self.emap)
            latent /= np.linalg.norm(latent)
            # Run the forward pass
            pred = self.forward(blob, latent)
            # Post-processing (similar to original code)
            img_fake = pred.transpose((0, 2, 3, 1))[0]
            bgr_fake = np.clip(255 * img_fake, 0, 255).astype(np.uint8)[:, :, ::-1]
            if not paste_back:
                return bgr_fake, M
            else:
                target_img = img
                fake_diff = bgr_fake.astype(np.float32) - aimg.astype(np.float32)
                fake_diff = np.abs(fake_diff).mean(axis=2)
                fake_diff[:2, :] = 0
                fake_diff[-2:, :] = 0
                fake_diff[:, :2] = 0
                fake_diff[:, -2:] = 0
                IM = cv2.invertAffineTransform(M)
                img_white = np.full((aimg.shape[0], aimg.shape[1]), 255, dtype=np.float32)
                bgr_fake = cv2.warpAffine(bgr_fake, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
                img_white = cv2.warpAffine(img_white, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
                fake_diff = cv2.warpAffine(fake_diff, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
                img_white[img_white > 20] = 255
                fthresh = 10
                fake_diff[fake_diff < fthresh] = 0
                fake_diff[fake_diff >= fthresh] = 255
                img_mask = img_white
                mask_h_inds, mask_w_inds = np.where(img_mask == 255)
                mask_h = np.max(mask_h_inds) - np.min(mask_h_inds)
                mask_w = np.max(mask_w_inds) - np.min(mask_w_inds)
                mask_size = int(np.sqrt(mask_h * mask_w))
                k = max(mask_size // 10, 10)
                kernel = np.ones((k, k), np.uint8)
                img_mask = cv2.erode(img_mask, kernel, iterations=1)
                kernel = np.ones((2, 2), np.uint8)
                fake_diff = cv2.dilate(fake_diff, kernel, iterations=1)
                k = max(mask_size // 20, 5)
                kernel_size = (k, k)
                blur_size = tuple(2 * i + 1 for i in kernel_size)
                img_mask = cv2.GaussianBlur(img_mask, blur_size, 0)
                k = 5
                kernel_size = (k, k)
                blur_size = tuple(2 * i + 1 for i in kernel_size)
                fake_diff = cv2.GaussianBlur(fake_diff, blur_size, 0)
                img_mask /= 255
                fake_diff /= 255
                img_mask = np.reshape(img_mask, [img_mask.shape[0], img_mask.shape[1], 1])
                fake_merged = img_mask * bgr_fake + (1 - img_mask) * target_img.astype(np.float32)
                fake_merged = fake_merged.astype(np.uint8)
                return fake_merged

After running it, I get the following error:
2024-09-06 09:59:13.660 python[10883:3211202] +[IMKClient subclass]: chose IMKClient_Legacy
2024-09-06 09:59:13.660 python[10883:3211202] +[IMKInputSession subclass]: chose IMKInputSession_Legacy
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
inswapper-shape: [1, 3, 128, 128]
INSwapper initialized on mps
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'device', 'emap', 'forward', 'get', 'input_mean', 'input_names', 'input_shape', 'input_size', 'input_std', 'model', 'model_file', 'output_names', 'session']
Backend TkAgg is interactive backend. Turning interactive mode on.
2024-09-06 09:59:39.634 python[10883:3213123] failed assertion _status < MTLCommandBufferStatusCommitted at line 322 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
At present, I don't know how to solve it, and I'm still trying.
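One thing that may be worth ruling out, and this is a guess that nothing in this thread confirms: the Metal command-buffer assertion could come from several threads submitting work to the same MPS device at once. If the application calls `get` from more than one thread, serializing inference behind a lock is a cheap experiment. `SerializedModel` is a hypothetical wrapper name, not part of insightface or PyTorch.

```python
import threading


class SerializedModel:
    """Wrap a callable so that only one thread runs it at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other threads block here until the current call has finished.
        with self._lock:
            return self._model(*args, **kwargs)
```

In the code posted above, this would mean wrapping the converted model once, after `.to(...)` and `.eval()` have been called, for example `self.model = SerializedModel(self.model)`. If the crash persists with the lock in place, concurrency is probably not the cause.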