intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

bark model on intel gpu takes 60 seconds #11698

Open SlyRebula opened 1 month ago

SlyRebula commented 1 month ago

Hello, I am trying to generate text-to-speech audio with Bark on an Intel Arc A770, but it takes around 60 seconds to generate the audio. Is that normal? Is there a way to make it faster, so that it takes only a few seconds? https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/bark

(phytia2) C:\phytia\Phytia>python ./synthesize_speech.py --text "IPEX-LLM is a library for running large language model on Intel XPU with very low latency."
C:\Users\SlyRebula\miniconda3\envs\phytia2\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\SlyRebula\miniconda3\envs\Phytia2\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-07-31 13:47:18,476 - INFO - intel_extension_for_pytorch auto imported
C:\Users\SlyRebula\miniconda3\envs\phytia2\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
C:\Users\SlyRebula\miniconda3\envs\phytia2\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-07-31 13:47:22,731 - INFO - Converting the current model to sym_int4 format......
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
Inference time: 54.660537242889404 s

lzivan commented 1 month ago

Hi, we are trying to reproduce your issue.

lzivan commented 1 month ago

Hi @SlyRebula, we tried several times but couldn't reproduce your problem. Our inference time was around 11 s.

(bark) arda@arda-arc01:~/zijie/bark$ python ./synthesize_speech.py --repo-id-or-model-path /mnt/disk1/models/bark-small --text 'IPEX-LLM is a library for running large language model on Intel XPU with very low latency.'
/home/arda/miniforge3/envs/bark/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-08-01 14:28:44,913 - INFO - intel_extension_for_pytorch auto imported
/home/arda/miniforge3/envs/bark/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-08-01 14:28:45,742 - INFO - Converting the current model to sym_int4 format......
/home/arda/miniforge3/envs/bark/lib/python3.11/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
Inference time: 11.526452779769897 s

Please check your environment and make sure no other processes are running. Feel free to reach out if you still have problems.
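When comparing numbers like the 54.7 s and 11.5 s above, it can help to time the same call several times and discard the first run, since a single measurement may include one-off setup cost. Below is a minimal sketch of such a timing helper. The name `time_inference` and its parameters are hypothetical and not part of ipex-llm or the Bark example, and the workload in the demo is a stand-in for the real generation call. The optional `sync` argument is there because GPU work can be asynchronous; passing something like `torch.xpu.synchronize` is an assumption about your environment, not something taken from the example script.

```python
import time
from statistics import median
from typing import Callable, List, Optional


def time_inference(
    fn: Callable[[], object],
    warmup: int = 1,
    runs: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Return wall-clock timings (seconds) for `runs` calls of `fn`.

    `warmup` untimed calls are made first. `sync`, if given, is called
    around each timed call so pending device work is included in the
    measurement (assumption: e.g. torch.xpu.synchronize on Intel GPUs).
    """
    # Untimed warm-up calls absorb one-off setup cost.
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()

    timings: List[float] = []
    for _ in range(runs):
        if sync is not None:
            sync()  # make sure earlier work is finished before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this call's work to finish
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; replace with the real generation call.
    def fake_workload() -> int:
        return sum(range(100_000))

    results = time_inference(fake_workload, warmup=1, runs=5)
    print(f"median: {median(results):.6f} s over {len(results)} runs")
```

If the timed runs are still close to 60 seconds after a warm-up, the slowdown is probably not a first-run effect, and checking the environment as suggested above is the next step.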