intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

SpeechT5 on XPU (Intel Arc A770 GPU) takes 8 seconds while CPU takes 3 seconds? #10942

Open shailesh837 opened 6 months ago

shailesh837 commented 6 months ago

I am following an older ipex-llm issue about TTS (text-to-speech), and I have two issues:

a) TTS takes 8 seconds on XPU but only 3 seconds on CPU. Why?

b) Every time I run the code below, it converts the model to int4. Why can't it convert once and save the result locally? I tried that as well, but it fails:

2024-05-06 23:56:05,898 - INFO - intel_extension_for_pytorch auto imported
2024-05-06 23:56:05,898 - INFO - Converting the current model to sym_int4 format.....

conda create -n speecht5-test python=3.9
conda activate speecht5-test

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install datasets soundfile

Runtime configuration: following here.
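
(For reference, the runtime configuration linked above typically starts by initializing the oneAPI environment before launching Python; a minimal sketch, assuming a default Linux oneAPI install location:)

source /opt/intel/oneapi/setvars.sh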

Code:

import torch
from transformers import SpeechT5Processor, SpeechT5HifiGan, SpeechT5ForTextToSpeech
from datasets import load_dataset
import soundfile as sf
import time

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

from bigdl.llm import optimize_model
model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                      "speech_decoder_postnet.prob_out"]) 
model = model.to('xpu')
vocoder = vocoder.to('xpu')

text = "Alright, listen up. Tyres are still a bit cold, but they're getting there. Keep the pace steady and focus on getting them up to temp. We need those pressures closer to 30 psi, so keep an eye on that. Once the tyres are ready, we'll be good to go. Now get out there and give it everything you've got."
inputs = processor(text=text, return_tensors="pt").to('xpu')

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors",
                                  split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0).to('xpu')

with torch.inference_mode():
  # warmup run (the first call includes one-time setup and compilation costs)
  st = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  print(f'Warmup time: {time.perf_counter() - st}')

  st1 = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  torch.xpu.synchronize()
  st2 = time.perf_counter()
  print(f"Inference time: {st2-st1}")

sf.write("speech_bigdl_llm.wav", speech.to('cpu').numpy(), samplerate=16000)
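
(A note on the timing methodology: XPU kernels launch asynchronously, so a reliable measurement needs torch.xpu.synchronize() both before starting and after stopping the timer. A minimal sketch, reusing the model, inputs, speaker_embeddings, and vocoder defined above; the helper name is just for illustration:)

import time
import torch

def time_xpu_generate(model, inputs, speaker_embeddings, vocoder, runs=3):
    torch.xpu.synchronize()  # make sure earlier work (e.g. warmup) has finished
    timings = []
    with torch.inference_mode():
        for _ in range(runs):
            st = time.perf_counter()
            model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
            torch.xpu.synchronize()  # wait for the asynchronous kernels to complete
            timings.append(time.perf_counter() - st)
    return timings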

Saving to a local folder didn't work either; I got an error:

import os
from transformers import SpeechT5ForTextToSpeech
from bigdl.llm import optimize_model

# Check if the optimized model is already saved on disk
optimized_model_path = "speecht5_tts_optimized"
if os.path.exists(optimized_model_path):
    # Load the optimized model directly
    model = SpeechT5ForTextToSpeech.from_pretrained(optimized_model_path, ignore_mismatched_sizes=True)
else:
    # Load and optimize the model, then save it
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    # Save the optimized model to disk
    model.save_pretrained(optimized_model_path)

Error when running the saving code:

warn(
2024-05-07 00:11:10,686 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "/home/spandey2/tts/speechT5.py", line 20, in <module>
    model= SpeechT5ForTextToSpeech.from_pretrained(optimized_model_path)
  File "/home/spandey2/miniconda3/envs/speecht5-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/spandey2/miniconda3/envs/speecht5-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3310, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for SpeechT5ForTextToSpeech:
    size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.k_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.v_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.q_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.out_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for speecht5.encoder.wrapped_encoder.layers.0.feed_forward.intermediate_dense.weight: copying a param with shape torch.Size([1253376]) from checkpoint, the shape in current model is torch.Size([3072, 768]).

sgwhat commented 6 months ago

Hi @shailesh837, we are working on reproducing your issue.

sgwhat commented 6 months ago
  1. For your first question, please set environment variables for optimal performance as below (before running your program):

    export USE_XETLA=OFF
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
    export SYCL_CACHE_PERSISTENT=1
  2. For your second issue, from_pretrained cannot load a checkpoint produced by save_pretrained on an optimized model: the int4 weights are stored as flattened packed buffers, which is why the shapes in your error no longer match the fp32 module shapes. Please modify your code to save/load the optimized model as below:

    # Load and optimize the model, then save it in low-bit format
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    from ipex_llm import optimize_model
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    # Save the optimized model to disk
    optimized_model_path = "speecht5_tts_optimized"
    model.save_low_bit(optimized_model_path)

    Then load the optimized model as below:

    from ipex_llm.optimize import load_low_bit
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    model = load_low_bit(model, optimized_model_path)

Note: We have migrated bigdl-llm into ipex-llm, so please use ipex-llm instead; see the ipex-llm installation guide for more details.
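
(For completeness, the two snippets above can be combined into the load-or-convert flow the original question asked for. A minimal sketch, using only the ipex-llm calls shown in this thread; the path name is just an example:)

import os
from transformers import SpeechT5ForTextToSpeech
from ipex_llm import optimize_model
from ipex_llm.optimize import load_low_bit

optimized_model_path = "speecht5_tts_optimized"  # example path

# Instantiate the architecture first; low-bit weights are then either
# loaded from disk or produced by a one-time conversion.
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
if os.path.exists(optimized_model_path):
    # Reuse the previously converted int4 weights
    model = load_low_bit(model, optimized_model_path)
else:
    # One-time conversion, then persist the low-bit weights for next time
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    model.save_low_bit(optimized_model_path)
model = model.to('xpu')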