Open zachysaur opened 1 month ago
It looks like an incompatibility between your xformers build and your torch version is causing the problem. You can try not using xformers by commenting out a few lines of inference.py.
(venv) D:\talkingface\V-Express>python inference.py --reference_image_path "./test_samples/short_case/tys/ref.jpg" --audio_path "./test_samples/short_case/tys/aud.mp3" --output_path "./output/short_case/talk_tys_fix_face.mp4" --retarget_strategy "fix_face" --num_inference_steps 25
D:\talkingface\V-Express\venv\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
Traceback (most recent call last):
File "D:\talkingface\V-Express\inference.py", line 275, in <module>
(venv) D:\talkingface\V-Express>
I see that your previous log contains the line (you have 2.0.1+cpu). I think you don't have the GPU version of torch installed. You can try setting the device to cpu, as follows.
python inference.py \
--reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
--audio_path "./test_samples/short_case/AOC/chattts.mp3" \
--output_path "./output/short_case/talk_AOC_chattts_fix_face.mp4" \
--retarget_strategy "fix_face" \
--num_inference_steps 25 \
--device "cpu"
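To confirm which build is installed before choosing a device, a small check like the one below may help. It is only a sketch: the helper names are made up for illustration, and it assumes the CPU-only wheel keeps reporting a "+cpu" suffix in its version string, as in the log line quoted above.

```python
import importlib.util


def is_cpu_only_build(version: str) -> bool:
    """Return True when a torch version string carries the '+cpu' local tag."""
    # "2.0.1+cpu" -> ("2.0.1", "+", "cpu"); a string without '+' yields "".
    return version.partition("+")[2] == "cpu"


def describe_torch() -> str:
    """Report the installed torch build, or note that torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the helper above works without torch

    return (
        f"torch {torch.__version__}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    # Version strings taken from the xFormers warning in this thread.
    print(is_cpu_only_build("2.0.1+cpu"))
    print(is_cpu_only_build("2.0.1+cu118"))
    print(describe_torch())
```

If the second helper reports that CUDA is not available, passing --device "cpu" as shown above is the safer choice until a CUDA build of torch is installed.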
I followed your instructions for installing the packages with pip. How can I install them for CUDA?
(venv) D:\talkingface\V-Express>python inference.py --reference_image_path "./test_samples/short_case/tys/ref.jpg" --audio_path "./test_samples/short_case/tys/aud.mp3" --output_path "./output/short_case/talk_tys_fix_face.mp4" --retarget_strategy "fix_face" --num_inference_steps 25 --device "cpu"
Some weights of the model checkpoint at ./model_ckpts/wav2vec2-base-960h/ were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
from_config. If you were trying to load a model, please use <class 'modules.unet_2d_condition.UNet2DConditionModel'>.load_config(...) followed by <class 'modules.unet_2d_condition.UNet2DConditionModel'>.from_config(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Loaded weights of Reference Net from ./model_ckpts/v-express/reference_net.pth.
Loaded weights of Denoising U-Net from ./model_ckpts/v-express/denoising_unet.pth.
Loaded weights of Denoising U-Net Motion Module from ./model_ckpts/v-express/motion_module.pth.
Loaded weights of V-Kps Guider from ./model_ckpts/v-express/v_kps_guider.pth.
Loaded weights of Audio Projection from ./model_ckpts/v-express/audio_projection.pth.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (512, 512)
D:\talkingface\V-Express\venv\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Length of audio is 64512 with the sampling rate of 16000.
The corresponding video length is 120.
D:\talkingface\V-Express\inference.py:216: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.)
kps_sequence = torch.tensor(torch.load(args.kps_path)) # [len, 3, 2]
The original length of kps sequence is 137.
The interpolated length of kps sequence is 120.
D:\talkingface\V-Express\pipelines\v_express_pipeline.py:516: FutureWarning: Accessing config attribute in_channels directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
Traceback (most recent call last):
File "D:\talkingface\V-Express\inference.py", line 275, in <module>
(venv) D:\talkingface\V-Express>
Try adding --dtype fp32. Mind you, the CPU will be running very, very slowly.
At least it will work; then I will know that the only thing I need is pytorch, not torch.
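The float16 warning in the earlier log is the reason fp32 is suggested for CPU runs. A rough sketch of that rule follows. The function names are hypothetical, "fp32" is the value mentioned in this thread, and "fp16" is assumed here to be the value that matches the torch.float16 default shown in the log.

```python
from typing import List


def pick_dtype_flag(device: str) -> str:
    """Choose a --dtype value: fp32 on CPU, fp16 (assumed name) elsewhere."""
    # float16 pipelines are reported as unsupported on CPU in the log,
    # so CPU runs fall back to full precision.
    return "fp32" if device.strip().lower() == "cpu" else "fp16"


def device_args(device: str) -> List[str]:
    """Build the device-related part of the inference.py command line."""
    return ["--device", device, "--dtype", pick_dtype_flag(device)]


if __name__ == "__main__":
    print(" ".join(device_args("cpu")))
    print(" ".join(device_args("cuda")))
```

The returned list can be appended to the rest of the inference.py arguments; on CPU it produces the same --device "cpu" --dtype fp32 combination used in the next run.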
what is gpu version of torch?
(venv) D:\Talking Pictures\V-Express>python inference.py --reference_image_path "./test_samples/short_case/10/ref.jpg" --audio_path "./test_samples/short_case/10/aud.mp3" --output_path "./output/short_case/talk_AOC_chattts_fix_face.mp4" --retarget_strategy "fix_face" --num_inference_steps 25 --device "cpu" --dtype fp32
D:\Talking Pictures\V-Express\venv\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
Python 3.10.11 (you have 3.10.11)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Some weights of the model checkpoint at ./model_ckpts/wav2vec2-base-960h/ were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
from_config. If you were trying to load a model, please use <class 'modules.unet_2d_condition.UNet2DConditionModel'>.load_config(...) followed by <class 'modules.unet_2d_condition.UNet2DConditionModel'>.from_config(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Loaded weights of Reference Net from ./model_ckpts/v-express/reference_net.pth.
Loaded weights of Denoising U-Net from ./model_ckpts/v-express/denoising_unet.pth.
Loaded weights of Denoising U-Net Motion Module from ./model_ckpts/v-express/motion_module.pth.
Loaded weights of V-Kps Guider from ./model_ckpts/v-express/v_kps_guider.pth.
Loaded weights of Audio Projection from ./model_ckpts/v-express/audio_projection.pth.
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
EP Error 'providers' and 'provider_options' should be the same length if both are given. when using ['CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./model_ckpts/insightface_models/models\buffalo_l\w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (512, 512)
D:\Talking Pictures\V-Express\venv\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Length of audio is 70656 with the sampling rate of 16000.
The corresponding video length is 132.
D:\Talking Pictures\V-Express\inference.py:216: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.)
kps_sequence = torch.tensor(torch.load(args.kps_path)) # [len, 3, 2]
The original length of kps sequence is 137.
The interpolated length of kps sequence is 132.
D:\Talking Pictures\V-Express\pipelines\v_express_pipeline.py:516: FutureWarning: Accessing config attribute in_channels directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
0%| | 0/25 [00:11<?, ?it/s]
Traceback (most recent call last):
File "D:\Talking Pictures\V-Express\inference.py", line 275, in <module>
(venv) D:\Talking Pictures\V-Express>
I made a full tutorial in case you still can't get it working.
It works with Python 3.10, CUDA 11.8, and a venv.
what is gpu version of torch?
You can find information about it here.
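In short, the "GPU version" is a torch wheel built against CUDA, so its version string ends in something like +cu118 instead of +cpu. The sketch below only builds the pip command and does not run it. The function name is hypothetical, the version and CUDA tag are taken from the logs in this thread (2.0.1 and cu118), and the index URL follows the pattern used by the official PyTorch wheel index. Treat the defaults as assumptions and check the PyTorch install page for your setup.

```python
import shlex
from typing import List


def torch_install_command(
    torch_version: str = "2.0.1", cuda_tag: str = "cu118"
) -> List[str]:
    """Build (but do not run) a pip command for a CUDA build of torch."""
    index_url = f"https://download.pytorch.org/whl/{cuda_tag}"
    return [
        "python", "-m", "pip", "install",
        f"torch=={torch_version}",
        "--index-url", index_url,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it in the venv.
    print(shlex.join(torch_install_command()))
```

Uninstalling the existing CPU-only torch from the venv first avoids pip keeping the +cpu wheel in place.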
@zachysaur
Here is a completely free tutorial for Windows: https://youtu.be/OFt6a2rR8GY
Let me know if you are still facing issues.
(venv) D:\talkingface\V-Express>python inference.py --reference_image_path "./test_samples/short_case/10/ref.jpg" --audio_path "./test_samples/short_case/10/aud.mp3" --kps_path "./test_samples/short_case/10/kps.pth" --output_path "./output/short_case/talk_10_no_retarget.mp4" --retarget_strategy "no_retarget" --num_inference_steps 25
D:\talkingface\V-Express\venv\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.0.1+cpu)
Python 3.10.11 (you have 3.10.9)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Traceback (most recent call last):
File "D:\talkingface\V-Express\venv\lib\site-packages\diffusers\utils\import_utils.py", line 710, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "D:\talkingface\V-Express\python\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "D:\talkingface\V-Express\venv\lib\site-packages\diffusers\models\autoencoder_kl.py", line 22, in <module>
from .attention_processor import (
File "D:\talkingface\V-Express\venv\lib\site-packages\diffusers\models\attention_processor.py", line 31, in <module>
import xformers
File "D:\talkingface\V-Express\venv\lib\site-packages\xformers\__init__.py", line 12, in <module>
from .checkpoint import ( # noqa: E402, F401
File "D:\talkingface\V-Express\venv\lib\site-packages\xformers\checkpoint.py", line 464, in <module>
class SelectiveCheckpointWrapper(ActivationWrapper):
File "D:\talkingface\V-Express\venv\lib\site-packages\xformers\checkpoint.py", line 481, in SelectiveCheckpointWrapper
@torch.compiler.disable
AttributeError: module 'torch' has no attribute 'compiler'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\talkingface\V-Express\inference.py", line 10, in <module>
from diffusers import AutoencoderKL, DDIMScheduler
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "D:\talkingface\V-Express\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__
value = getattr(module, name)
File "D:\talkingface\V-Express\venv\lib\site-packages\diffusers\utils\import_utils.py", line 700, in __getattr__
module = self._get_module(self._class_to_module[name])
File "D:\talkingface\V-Express\venv\lib\site-packages\diffusers\utils\import_utils.py", line 712, in _get_module
raise RuntimeError(
RuntimeError: Failed to import diffusers.models.autoencoder_kl because of the following error (look up to see its traceback):
module 'torch' has no attribute 'compiler'
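This last failure looks consistent with the mismatch reported in the xFormers warning: the installed xformers was built for a newer torch than the 2.0.1+cpu build in the venv, and torch.compiler appears to be missing from that older torch. As a rough aid for reading such logs, the sketch below pulls the two version strings out of the warning text. The regular expression and function name are illustrative assumptions based only on the wording of the warning shown in this thread.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.0.1+cpu)".
XFORMERS_PATTERN = re.compile(
    r"PyTorch (?P<built>\S+) with CUDA \d+ \(you have (?P<installed>[^)]+)\)"
)


def parse_xformers_warning(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (built_for, installed) torch versions, or None if not found."""
    match = XFORMERS_PATTERN.search(log_text)
    if match is None:
        return None
    return match.group("built"), match.group("installed")


if __name__ == "__main__":
    warning = (
        "xFormers was built for: PyTorch 2.3.0+cu121 with CUDA 1201 "
        "(you have 2.0.1+cpu)"
    )
    versions = parse_xformers_warning(warning)
    if versions is not None:
        built, installed = versions
        print(f"xformers built for torch {built}, installed torch {installed}")
        print("match" if built == installed else "mismatch")
```

When the two strings differ, reinstalling torch and xformers as a matching pair, as the warning itself suggests, is the likely fix.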