hacksider / Deep-Live-Cam

real time face swap and one-click video deepfake with only a single image

Low fps on Mac M1 Pro #120

Open Aaronthecowboy opened 1 month ago

Aaronthecowboy commented 1 month ago

Live video seems to be below 10 fps on my Mac M1 Pro. I installed onnxruntime-silicon according to the instructions and ran the program with `python run.py --execution-provider coreml`, but the GPU does not seem to be used.
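For reference, a quick check that onnxruntime can even see the CoreML provider (if this prints only CPUExecutionProvider, the coreml option will just fall back to the CPU):

```python
import onnxruntime

# On a working onnxruntime-silicon install this list should include
# 'CoreMLExecutionProvider'; if only 'CPUExecutionProvider' shows up,
# --execution-provider coreml has nothing to run on.
print(onnxruntime.get_available_providers())
```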

RoversX commented 1 month ago

> Live video seems to be below 10 fps on my Mac M1 Pro. I installed onnxruntime-silicon according to the instructions and ran the program with `python run.py --execution-provider coreml`, but the GPU does not seem to be used.

Less than 1 fps here 🤣. We might need a smaller model.

Aaronthecowboy commented 1 month ago

> Less than 1 fps here 🤣. We might need a smaller model.

Maybe getting the GPU to work would improve the frame rate 😂

1057437122 commented 1 month ago

> Maybe getting the GPU to work would improve the frame rate 😂

Thank you for open-sourcing this, but how do we make the GPU work? I'm using an M3 Pro. I just tried this and it's also very slow, maybe 10 fps. I tried the --keep-fps parameter like this: `python run.py --keep-fps --execution-provider coreml`, but it doesn't help 😂

RoversX commented 1 month ago

> Maybe getting the GPU to work would improve the frame rate 😂

I will try to quantize the model. In the meantime, is there any way to change the video to 540p to increase the fps?

RoversX commented 1 month ago

I have a PC with an NVIDIA GPU. If I use the Mac camera to capture the video and do the computation on the NVIDIA PC, would that be possible?

Aaronthecowboy commented 1 month ago

> Thank you for open-sourcing this, but how do we make the GPU work? I'm using an M3 Pro. I just tried this and it's also very slow, maybe 10 fps. I tried the --keep-fps parameter like this: `python run.py --keep-fps --execution-provider coreml`, but it doesn't help 😂

Honestly, I don't know 😂. Maybe GFPGANv1.4 and inswapper_128_fp16 need Core ML support.

Aaronthecowboy commented 1 month ago

> I will try to quantize the model. In the meantime, is there any way to change the video to 540p to increase the fps?

This could totally work. You got this, bro!

1057437122 commented 1 month ago

> Honestly, I don't know 😂. Maybe GFPGANv1.4 and inswapper_128_fp16 need Core ML support.

Sorry, I saw the author tag on your reply and thought you were the author of this project 😂.

Aaronthecowboy commented 1 month ago

> Sorry, I saw the author tag on your reply and thought you were the author of this project 😂.

No biggie, our chat might just grab the author's attention😊

1057437122 commented 1 month ago

Yeah, let's make this issue hot so the author will give us a glance haha


RoversX commented 1 month ago

> I will try to quantize the model. In the meantime, is there any way to change the video to 540p to increase the fps?

I changed the resolution to 360p, but it's still slow

varna9000 commented 1 month ago

I second the terrible performance on Apple silicon. I tried reducing the resolution and fps of the OpenCV capture, but it doesn't influence the bad performance, so I guess it's model related. If you want to experiment, check lines 279-282 of models/ui.py.
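For anyone who wants to try the same experiment, a minimal sketch of what reducing the capture resolution looks like with plain OpenCV (the width/height values are just examples):

```python
import cv2

# Ask the webcam for a smaller frame. Drivers treat these as requests,
# not guarantees, so read one frame back to confirm what you actually got.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 960)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 540)
ok, frame = cap.read()
print(frame.shape if ok else "capture failed")
cap.release()
```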

FNsi commented 1 month ago

I don't think resolution is the key; performance mostly depends on the model size.

varna9000 commented 1 month ago

I've found an issue with similar symptoms. If anyone has used the onnxruntime Python API before, perhaps they could tell us how to add the session options and set COREML_FLAG_USE_CPU_ONLY = 0?
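The closest thing I've seen in the Python API is passing provider options when creating the session; a sketch (I'm not sure these option keys exist in every onnxruntime build, so treat them as assumptions):

```python
import onnxruntime as ort

# Provider options are passed as (name, options-dict) pairs. 'MLComputeUnits'
# is the knob newer builds expose for CPU/GPU/ANE selection; older builds may
# not recognize this key at all.
session = ort.InferenceSession(
    "inswapper_128.onnx",  # hypothetical path
    providers=[
        ("CoreMLExecutionProvider", {"MLComputeUnits": "ALL"}),
        "CPUExecutionProvider",  # fallback for ops CoreML can't run
    ],
)
print(session.get_providers())
```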

varna9000 commented 1 month ago

The performance is better (although still choppy) with the non-fp16 model.

RoversX commented 1 month ago

> The performance is better (although still choppy) with the non-fp16 model.

Looks great! What fps are you getting now?

varna9000 commented 1 month ago

@RoversX I don't know how to measure it :)
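A crude counter around the per-frame loop would do it; `process_one_frame` here is just a stand-in for whatever does the swap and display:

```python
import time

frames, t0 = 0, time.time()
while True:
    process_one_frame()  # placeholder: swap + display one frame
    frames += 1
    elapsed = time.time() - t0
    if elapsed >= 1.0:
        print(f"{frames / elapsed:.1f} fps")
        frames, t0 = 0, time.time()
```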

essssence commented 1 month ago

Apple M2: 1 fps on CoreML :(

essssence commented 1 month ago

@varna9000 did you change COREML_FLAG_USE_CPU_ONLY = 0? :)

varna9000 commented 1 month ago

@essssence no, I downloaded the other model (the one that is not fp16) from the same repo. Once it's in the models/ directory, just rename it to inswapper_128_fp16.onnx.

varna9000 commented 1 month ago

Alternatively, just edit this line
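In current checkouts that line lives in the face-swapper processor; pointing it at the fp32 model looks roughly like this (the exact file and helper names are assumptions from my checkout):

```python
# modules/processors/frame/face_swapper.py (location and helper names may
# differ between versions; verify against your own checkout)
model_path = resolve_relative_path('../models/inswapper_128.onnx')
```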

essssence commented 1 month ago

@varna9000 tried this on a GTX 1060 6GB: a bit better, but still only around 3 FPS :D

RoversX commented 1 month ago

> The performance is better (although still choppy) with the non-fp16 model.

You are right! A bit faster

lhuanyu commented 1 month ago

> The performance is better (although still choppy) with the non-fp16 model.

I have tried the non-fp16 model on my M3 Pro Max; it's much better than the original model.

frozencap commented 1 month ago

The non-fp16 model made it go from 1 fps to 15 fps.

Aaronthecowboy commented 1 month ago

> The performance is better (although still choppy) with the non-fp16 model.

This method really worked: I got a noticeable boost in FPS (though it's still around 10, which kinda feels like the laggy days from 10 years ago).

gongzhang commented 1 month ago

> This method really worked: I got a noticeable boost in FPS (though it's still around 10, which kinda feels like the laggy days from 10 years ago).

It works on my M3 Max. Thank you all :)

https://github.com/gongzhang/Deep-Live-Cam/blob/main/README.macos.md

solstice-gao commented 4 weeks ago

> The performance is better (although still choppy) with the non-fp16 model.

There is indeed an improvement in the frame rate, but it still doesn't reach a smooth speed. Why is the fp32 version of this model faster than the fp16 version? It may be down to the Apple M1's compute architecture, which can bring more of the system's resources to bear at fp32 precision. To really solve this, the model also needs to be moved onto the GPU.

109km commented 3 weeks ago

I have the same problem: without the Face Enhancer the FPS is about 15, but with the Face Enhancer it drops to about 1. Does anyone know how to solve this?
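If your build still exposes the upstream `--frame-processor` option (check `python run.py -h`), you can at least confirm the enhancer is the bottleneck by running with only the swapper enabled, e.g. `python run.py --frame-processor face_swapper --execution-provider coreml`. Treat the flag name as an assumption; it comes from the roop lineage this project is based on.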

jiafanwu commented 3 weeks ago

Converting the onnx model back to a pytorch model and running it in mps mode is a bit faster on my Mac M3 Pro, and it actually uses the GPU. One inference for the swap went from around 600 ms down to 150 ms in my case. Better, but still not smooth. As for onnxruntime, I don't think it supports the CoreML GPU from Python yet.

neo2478 commented 1 week ago

> Converting the onnx model back to a pytorch model and running it in mps mode is a bit faster on my Mac M3 Pro, and it actually uses the GPU. One inference for the swap went from around 600 ms down to 150 ms in my case. Better, but still not smooth. As for onnxruntime, I don't think it supports the CoreML GPU from Python yet.

@jiafanwu could you share how you did it?

jiafanwu commented 1 week ago

This repo leverages insightface to handle the face swap. Here is the code that interacts with insightface to get the model; under the hood it uses inswapper to swap the face.

For a local setup, a quick and dirty way is to just inject the model conversion in `__init__`:

```python
import torch
from onnx2torch import convert

class INSwapper():
    def __init__(self, model_file=None, session=None):

        # ... other parts of the original code ...
        device = torch.device("mps")
        self.device = device
        self.pt_model = convert(self.model_file)  # ONNX graph -> torch.nn.Module
        self.pt_model.to(device)
        self.pt_model.eval()
```

Then inside `get` you would want to swap the inference code out for something like this:

```python
        img_tensor = torch.from_numpy(blob).to(torch.float32).to(self.device)       # numpy array -> tensor
        latent_tensor = torch.from_numpy(latent).to(torch.float32).to(self.device)  # numpy array -> tensor

        with torch.no_grad():
            pred = self.pt_model(img_tensor, latent_tensor)  # forward pass with the PyTorch model
            pred2_np = pred.squeeze(0).unsqueeze(0).cpu().numpy()  # back to a numpy array
```

The same applies to `forward`. The package used for the conversion is onnx2torch (available on PyPI).
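A quick sanity check that the converted module really landed on the GPU (a sketch; the file name is just an example):

```python
import torch
from onnx2torch import convert

# convert() accepts a path (or an already-loaded onnx.ModelProto).
model = convert("inswapper_128.onnx").to("mps").eval()
print(next(model.parameters()).device)  # expect: mps:0
```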

solstice-gao commented 1 week ago

> This repo leverages insightface to handle the face swap. [...] The same applies to `forward`. The package used for the conversion is onnx2torch.

I'm trying to modify this part of the source code, but I encountered some errors. The following is my code:

```python
import time
import numpy as np
import torch
import torch.nn.functional as F
import cv2
from ..utils import face_align

import onnxruntime
import onnx
from onnx import numpy_helper

from onnx2torch import convert

class INSwapper():
    def __init__(self, model_file=None, session=None):
        self.model_file = model_file
        self.session = session
        model = onnx.load(self.model_file)
        graph = model.graph
        self.emap = numpy_helper.to_array(graph.initializer[-1])
        self.input_mean = 0.0
        self.input_std = 255.0
        # print('input mean and std:', model_file, self.input_mean, self.input_std)
        if self.session is None:
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        inputs = self.session.get_inputs()
        self.input_names = []
        for inp in inputs:
            self.input_names.append(inp.name)
        outputs = self.session.get_outputs()
        output_names = []
        for out in outputs:
            output_names.append(out.name)
        self.output_names = output_names
        assert len(self.output_names) == 1
        output_shape = outputs[0].shape
        input_cfg = inputs[0]
        input_shape = input_cfg.shape
        self.input_shape = input_shape
        print('inswapper-shape:', self.input_shape)
        self.input_size = tuple(input_shape[2:4][::-1])

        self.device = torch.device("mps")
        self.model = convert(model)
        self.model.to(self.device)
        self.model.eval()
        print(f'INSwapper initialized on {self.device}')

    def forward(self, img, latent):
        img = (img - self.input_mean) / self.input_std

        img_tensor = torch.from_numpy(img).to(torch.float32).to(self.device)
        latent_tensor = torch.from_numpy(latent).to(torch.float32).to(self.device)

        with torch.no_grad():
            pred = self.model(img_tensor, latent_tensor) # Forward pass with PyTorch model
            return pred.squeeze(0).unsqueeze(0).cpu().numpy()  # Convert back to numpy array

    def get(self, img, target_face, source_face, paste_back=True):
        aimg, M = face_align.norm_crop2(img, target_face.kps, self.input_size[0])
        blob = cv2.dnn.blobFromImage(aimg, 1.0 / self.input_std, self.input_size,
                                     (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        latent = source_face.normed_embedding.reshape((1, -1))
        latent = np.dot(latent, self.emap)
        latent /= np.linalg.norm(latent)

        # Run the forward pass
        pred = self.forward(blob, latent)

        # Post-processing (similar to original code)
        img_fake = pred.transpose((0, 2, 3, 1))[0]
        bgr_fake = np.clip(255 * img_fake, 0, 255).astype(np.uint8)[:, :, ::-1]

        if not paste_back:
            return bgr_fake, M
        else:
            target_img = img
            fake_diff = bgr_fake.astype(np.float32) - aimg.astype(np.float32)
            fake_diff = np.abs(fake_diff).mean(axis=2)
            fake_diff[:2, :] = 0
            fake_diff[-2:, :] = 0
            fake_diff[:, :2] = 0
            fake_diff[:, -2:] = 0
            IM = cv2.invertAffineTransform(M)
            img_white = np.full((aimg.shape[0], aimg.shape[1]), 255, dtype=np.float32)
            bgr_fake = cv2.warpAffine(bgr_fake, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
            img_white = cv2.warpAffine(img_white, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
            fake_diff = cv2.warpAffine(fake_diff, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
            img_white[img_white > 20] = 255
            fthresh = 10
            fake_diff[fake_diff < fthresh] = 0
            fake_diff[fake_diff >= fthresh] = 255
            img_mask = img_white
            mask_h_inds, mask_w_inds = np.where(img_mask == 255)
            mask_h = np.max(mask_h_inds) - np.min(mask_h_inds)
            mask_w = np.max(mask_w_inds) - np.min(mask_w_inds)
            mask_size = int(np.sqrt(mask_h * mask_w))
            k = max(mask_size // 10, 10)
            kernel = np.ones((k, k), np.uint8)
            img_mask = cv2.erode(img_mask, kernel, iterations=1)
            kernel = np.ones((2, 2), np.uint8)
            fake_diff = cv2.dilate(fake_diff, kernel, iterations=1)
            k = max(mask_size // 20, 5)
            kernel_size = (k, k)
            blur_size = tuple(2 * i + 1 for i in kernel_size)
            img_mask = cv2.GaussianBlur(img_mask, blur_size, 0)
            k = 5
            kernel_size = (k, k)
            blur_size = tuple(2 * i + 1 for i in kernel_size)
            fake_diff = cv2.GaussianBlur(fake_diff, blur_size, 0)
            img_mask /= 255
            fake_diff /= 255
            img_mask = np.reshape(img_mask, [img_mask.shape[0], img_mask.shape[1], 1])
            fake_merged = img_mask * bgr_fake + (1 - img_mask) * target_img.astype(np.float32)
            fake_merged = fake_merged.astype(np.uint8)
            return fake_merged
```

After running it, I get the following error:


```
2024-09-06 09:59:13.660 python[10883:3211202] +[IMKClient subclass]: chose IMKClient_Legacy
2024-09-06 09:59:13.660 python[10883:3211202] +[IMKInputSession subclass]: chose IMKInputSession_Legacy
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/ga666666/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
inswapper-shape: [1, 3, 128, 128]
INSwapper initialized on mps
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'device', 'emap', 'forward', 'get', 'input_mean', 'input_names', 'input_shape', 'input_size', 'input_std', 'model', 'model_file', 'output_names', 'session']
Backend TkAgg is interactive backend. Turning interactive mode on.
2024-09-06 09:59:39.634 python[10883:3213123] failed assertion _status < MTLCommandBufferStatusCommitted at line 322 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
```

At present, I don't know how to solve it, and I'm still trying.
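One guess about the `MTLCommandBufferStatusCommitted` assertion: if frames are being processed on a thread pool (the video path in this repo uses one), hitting a single MPS model from several threads at once is a known way to trip Metal's command-buffer checks. Serializing the forward pass is a cheap way to test that theory (the lock below is diagnostic scaffolding I added, not part of the original code):

```python
import threading

import torch

_mps_lock = threading.Lock()  # allow one GPU submission at a time (diagnostic only)

def forward(self, img, latent):
    img = (img - self.input_mean) / self.input_std
    img_tensor = torch.from_numpy(img).to(torch.float32).to(self.device)
    latent_tensor = torch.from_numpy(latent).to(torch.float32).to(self.device)
    with _mps_lock, torch.no_grad():
        pred = self.model(img_tensor, latent_tensor)
    return pred.squeeze(0).unsqueeze(0).cpu().numpy()
```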