intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Unable to run Speech T5 on XPU #10025

Open nedo99 opened 5 months ago

nedo99 commented 5 months ago

Hello,

I am trying to run SpeechT5 on XPU but am unable to. The model is https://huggingface.co/microsoft/speecht5_tts and here is my code:

from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq, AutoModelForCausalLM
import intel_extension_for_pytorch as ipex
from bigdl.llm import optimize_model

model = AutoModelForCausalLM.from_pretrained("microsoft/speecht5_tts",
                                                    torch_dtype="auto",
                                                    trust_remote_code=True,
                                                    low_cpu_mem_usage=True
                                                    )
model = optimize_model(model)
model = model.to('xpu')

and I am getting the following error:

raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.speecht5.configuration_speecht5.SpeechT5Config'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MusicgenConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.

Is there support for text-to-speech in BigDL? Or am I missing something?

Regards, Nedim

shane-huang commented 5 months ago

About the ValueError: Unrecognized configuration class

This error is not really related to bigdl's support. The error message indicates that AutoModelForCausalLM does not support loading the SpeechT5 model. It could be that you are using the wrong auto class, that your transformers version is outdated, or that transformers does not support loading this model through auto classes at all.
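For example, in plain transformers (independent of bigdl — a minimal sketch, with class names taken from the transformers SpeechT5 docs):

from transformers import AutoModelForCausalLM, SpeechT5ForTextToSpeech

# SpeechT5 has dedicated model classes, so this works:
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")

# whereas the causal-LM auto class has no mapping for SpeechT5Config,
# so this raises exactly the ValueError above:
# model = AutoModelForCausalLM.from_pretrained("microsoft/speecht5_tts")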

About bigdl-llm usage

Regarding bigdl-llm usage: 1) the auto classes in bigdl.llm.transformers and 2) optimize_model are two separate sets of APIs, and you can choose either of them (not both) to optimize a model.

  1. Use the auto classes from bigdl.llm.transformers if the model supports auto-class loading (taking AutoModelForCausalLM as an example):

    from bigdl.llm.transformers import AutoModelForCausalLM
    # specify load_in_4bit=True to load the model directly in 4-bit
    bigdl_model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
  2. Use optimize_model for arbitrary PyTorch models:

    # normal way of loading a model using a model loader class XXXModel, e.g. `SpeechT5Model`, 
    from transformers import XXXModel
    original_model = XXXModel.from_pretrained(...) 
    
    from bigdl.llm import optimize_model
    bigdl_model = optimize_model(original_model) 

SpeechT5 using optimize_model

According to the transformers SpeechT5 docs, you can use SpeechT5Model to load the model, so you may try loading it with optimize_model as follows.

from transformers import SpeechT5Model
...
model = SpeechT5Model.from_pretrained("microsoft/speecht5_tts")

from bigdl.llm import optimize_model
bigdl_model = optimize_model(model) 
bigdl_model.to('xpu')
shane-huang commented 5 months ago

The SpeechT5 model can be successfully loaded with bigdl using AutoModelForSpeechSeq2Seq instead of AutoModelForCausalLM. The code below works in our test (using transformers 4.31.0 and bigdl 2.5.0b20240124).

from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq
...
model = AutoModelForSpeechSeq2Seq.from_pretrained("microsoft/speecht5_tts", load_in_4bit=True)
model = model.to("xpu")
...
nedo99 commented 5 months ago

> The SpeechT5 model can be successfully loaded with bigdl using AutoModelForSpeechSeq2Seq instead of AutoModelForCausalLM. The code below works in our test (using transformers 4.31.0 and bigdl 2.5.0b20240124).
>
> use optimize_model
>
>     from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq
>     ...
>     model = AutoModelForSpeechSeq2Seq.from_pretrained("microsoft/speecht5_tts", load_in_4bit=True)
>     model = model.to("xpu")
>     ...

I updated my bigdl version, but now I am getting a segfault. Here is the backtrace:

in xpu::dpcpp::initGlobalDevicePoolState() () from /envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so
(gdb) bt
#0  0x00007fff17d3a9cb in xpu::dpcpp::initGlobalDevicePoolState() ()
   from envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so
#1  0x00007ffff7c99ee8 in __pthread_once_slow (once_control=0x7fff2a0888e0 <xpu::dpcpp::init_device_flag>, 
    init_routine=0x7fffc24dac90 <std::__once_proxy()>) at ./nptl/pthread_once.c:116
#2  0x00007fff17d37291 in xpu::dpcpp::dpcppGetDeviceCount(int*) ()
   from envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so
#3  0x00007fff17cf0912 in xpu::dpcpp::device_count()::{lambda()#1}::operator()() const ()
   from envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so
#4  0x00007fff17cf08d8 in xpu::dpcpp::device_count() ()

According to the backtrace, it seems like an issue with finding the GPU. sycl-ls shows a discrete GPU. I am using oneAPI 2023.2 and kernel 5.19.0-41, and this is working for other models. Could the issue be the oneAPI version?

jason-dai commented 5 months ago

> According to the backtrace, it seems like an issue with finding the GPU. sycl-ls shows a discrete GPU. I am using oneAPI 2023.2 and kernel 5.19.0-41, and this is working for other models. Could the issue be the oneAPI version?

Yes, the default BigDL-LLM has been upgraded to PyTorch 2.1/oneAPI 2024.0, and you will need to upgrade your oneAPI.

Alternatively, you may continue to install the PyTorch 2.0 version of BigDL-LLM, which is compatible with oneAPI 2023.2 (see https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux).
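(As a quick check, here is a minimal sketch to see which build you have installed; the version pairing follows the note above:)

import torch
import intel_extension_for_pytorch as ipex

# ipex 2.1.x+xpu builds pair with oneAPI 2024.0;
# ipex 2.0.x+xpu builds pair with oneAPI 2023.2
print(torch.__version__, ipex.__version__)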

nedo99 commented 5 months ago

> According to the backtrace, it seems like an issue with finding the GPU. sycl-ls shows a discrete GPU. I am using oneAPI 2023.2 and kernel 5.19.0-41, and this is working for other models. Could the issue be the oneAPI version?
>
> Yes, the default BigDL-LLM has been upgraded to PyTorch 2.1/oneAPI 2024.0, and you will need to upgrade your oneAPI.
>
> Alternatively, you may continue to install the PyTorch 2.0 version of BigDL-LLM, which is compatible with oneAPI 2023.2 (see https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux).

But with oneAPI 2023.2 this does not work; it segfaults as mentioned in the previous comment. With oneAPI 2024 I still did not get anything working, since I am getting an error message: ImportError: libsycl.so.6: cannot open shared object file: No such file or directory. I tried both oneAPI 2024.0.0.49564 and 2024.0.2-49895 with PyTorch 2.1.

shane-huang commented 5 months ago

> According to the backtrace, it seems like an issue with finding the GPU. sycl-ls shows a discrete GPU. I am using oneAPI 2023.2 and kernel 5.19.0-41, and this is working for other models. Could the issue be the oneAPI version?
>
> Yes, the default BigDL-LLM has been upgraded to PyTorch 2.1/oneAPI 2024.0, and you will need to upgrade your oneAPI. Alternatively, you may continue to install the PyTorch 2.0 version of BigDL-LLM, which is compatible with oneAPI 2023.2 (see https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux).
>
> But with oneAPI 2023.2 this does not work; it segfaults as mentioned in the previous comment. With oneAPI 2024 I still did not get anything working, since I am getting an error message: ImportError: libsycl.so.6: cannot open shared object file: No such file or directory. I tried both oneAPI 2024.0.0.49564 and 2024.0.2-49895 with PyTorch 2.1.

Did you correctly configure the oneAPI env variables (refer to the instructions here)? Also pay attention to the runtime configuration instructions here, which may prevent a lot of runtime issues.
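(A minimal sanity-check sketch you could run after sourcing the env; it assumes libsycl.so.7 is the oneAPI 2024.x runtime soname and libsycl.so.6 the 2023.x one, and that intel_extension_for_pytorch registers the torch.xpu namespace:)

import ctypes
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

# check which SYCL runtime the dynamic loader can see;
# if neither loads, setvars.sh is probably not sourced in this shell
for soname in ("libsycl.so.7", "libsycl.so.6"):
    try:
        ctypes.CDLL(soname)
        print(f"{soname}: found")
    except OSError:
        print(f"{soname}: not found")

print("XPU available:", torch.xpu.is_available())
print("XPU device count:", torch.xpu.device_count())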

nedo99 commented 5 months ago

> According to the backtrace, it seems like an issue with finding the GPU. sycl-ls shows a discrete GPU. I am using oneAPI 2023.2 and kernel 5.19.0-41, and this is working for other models. Could the issue be the oneAPI version?
>
> Yes, the default BigDL-LLM has been upgraded to PyTorch 2.1/oneAPI 2024.0, and you will need to upgrade your oneAPI. Alternatively, you may continue to install the PyTorch 2.0 version of BigDL-LLM, which is compatible with oneAPI 2023.2 (see https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux).
>
> But with oneAPI 2023.2 this does not work; it segfaults as mentioned in the previous comment. With oneAPI 2024 I still did not get anything working, since I am getting an error message: ImportError: libsycl.so.6: cannot open shared object file: No such file or directory. I tried both oneAPI 2024.0.0.49564 and 2024.0.2-49895 with PyTorch 2.1.
>
> Did you correctly configure the oneAPI env variables (refer to the instructions here)? Also pay attention to the runtime configuration instructions here, which may prevent a lot of runtime issues.

Yes and yes, but the issue is still there.

shane-huang commented 5 months ago

Could you provide the OS, kernel, and Python version?

Oscilloscope98 commented 5 months ago

> With oneAPI 2024 I still did not get anything working, since I am getting an error message: ImportError: libsycl.so.6: cannot open shared object file: No such file or directory. I tried both oneAPI 2024.0.0.49564 and 2024.0.2-49895 with PyTorch 2.1.

To resolve this problem and use oneAPI 2024.0, it is recommended to create a new conda env:

conda create -n new-llm-env python=3.9
conda activate new-llm-env

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

Or, if you would like to use BigDL-LLM with oneAPI 2024.0 in your old conda environment, you could:

pip uninstall bigdl-core-xe
pip uninstall bigdl-core-xe-21
pip uninstall bigdl-core-xe-esimd
pip uninstall bigdl-core-xe-esimd-21
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

Note that bigdl-llm, bigdl-core-xe-21, and bigdl-core-xe-esimd-21 should all have the same version if bigdl-llm has been correctly upgraded to the oneAPI 2024.0/PyTorch 2.1 build.
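(A minimal sketch of such a check, assuming all three packages are installed in the active env:)

from importlib.metadata import version

# the three packages should report the same nightly version string
pkgs = ["bigdl-llm", "bigdl-core-xe-21", "bigdl-core-xe-esimd-21"]
vers = {p: version(p) for p in pkgs}
print(vers)
assert len(set(vers.values())) == 1, f"version mismatch: {vers}"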

nedo99 commented 5 months ago

> Could you provide the OS, kernel, and Python version?

OS: Ubuntu 22.04
kernel: 5.19.0-41-generic
Python: 3.9.18
oneAPI: 2024.0.0.49564
bigdl-llm: 2.5.0b20240201
bigdl_core_xe: 2.5.0b20240201
bigdl_core_xe_esimd: 2.5.0b20240201
intel_extension_for_pytorch: 2.1.10+xpu

Oscilloscope98 commented 4 months ago

Hi @nedo99,

For bigdl-llm>=2.5.0b20240204, you can run SpeechT5 with BigDL-LLM optimization as below :)

Env (PyTorch 2.1 with oneAPI 2024.0):

conda create -n speecht5-test python=3.9
conda activate speecht5-test

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install datasets soundfile 

Runtime configuration: follow the instructions here

Code:

import torch
from transformers import SpeechT5Processor, SpeechT5HifiGan, SpeechT5ForTextToSpeech
from datasets import load_dataset
import soundfile as sf
import time

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

from bigdl.llm import optimize_model
# exclude the speech decoder postnet's output projections from low-bit conversion
model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                      "speech_decoder_postnet.prob_out"])
model = model.to('xpu')
vocoder = vocoder.to('xpu')

text = "On a cold winter night, a lonely traveler found a shimmering stone in the snow, unaware that it would lead him to a world full of wonders."
inputs = processor(text=text, return_tensors="pt").to('xpu')

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors",
                                  split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0).to('xpu')

with torch.inference_mode():
  # warmup
  st = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  print(f'Warmup time: {time.perf_counter() - st}')

  st1 = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  torch.xpu.synchronize()
  st2 = time.perf_counter()
  print(f"Inference time: {st2-st1}")

sf.write("speech_bigdl_llm.wav", speech.to('cpu').numpy(), samplerate=16000)

Please let us know for any further problems :)

Oscilloscope98 commented 4 months ago

If you are also interested in other TTS models we support, you can run Bark with BigDL-LLM optimization as follows :)

Env (PyTorch 2.1 with oneAPI 2024.0):

conda create -n bark-test python=3.9
conda activate bark-test

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install scipy

Runtime configuration: follow the instructions here

Code:

from transformers import AutoProcessor, BarkModel
import torch
import time

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

from bigdl.llm import optimize_model
model = optimize_model(model).to('xpu')

voice_preset = "v2/en_speaker_6"

text = "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
inputs = processor(text, voice_preset=voice_preset).to('xpu')

# warmup
st = time.time()
with torch.inference_mode():
    model.generate(**inputs)
torch.xpu.synchronize()
print(f"Warmup time: {time.time() - st}")

st = time.time()
with torch.inference_mode():
    audio_array = model.generate(**inputs)
torch.xpu.synchronize()
print(f"Inference time: {time.time() - st}")

audio_array = audio_array.cpu().numpy().squeeze()

from scipy.io.wavfile import write as write_wav
import os
os.makedirs("output", exist_ok=True)  # ensure the output directory exists
sample_rate = model.generation_config.sample_rate
write_wav("output/bark_generation_bigdl_llm.wav", sample_rate, audio_array)
nedo99 commented 4 months ago

Speech T5 sample works.

Bark does not work. It segfaults and has the same backtrace as posted in one of the previous comments.

Oscilloscope98 commented 4 months ago

> Speech T5 sample works.
>
> Bark does not work. It segfaults and has the same backtrace as posted in one of the previous comments.

Hi @nedo99 ,

Could you let me know your test env for Bark?

> Could you provide the OS, kernel, and Python version?
>
> OS: Ubuntu 22.04
> kernel: 5.19.0-41-generic
> Python: 3.9.18
> oneAPI: 2024.0.0.49564
> bigdl-llm: 2.5.0b20240201
> bigdl_core_xe: 2.5.0b20240201
> bigdl_core_xe_esimd: 2.5.0b20240201
> intel_extension_for_pytorch: 2.1.10+xpu

What you show here does not seem to be a correct PyTorch 2.1 env to me :) bigdl_core_xe and bigdl_core_xe_esimd are the PyTorch 2.0 packages; the PyTorch 2.1 build installs bigdl-core-xe-21 and bigdl-core-xe-esimd-21. You could try the steps here for a correct PyTorch 2.1 + oneAPI 2024.0 env for bigdl-llm: https://github.com/intel-analytics/BigDL/issues/10025#issuecomment-1920570410

nedo99 commented 4 months ago

> Speech T5 sample works. Bark does not work. It segfaults and has the same backtrace as posted in one of the previous comments.
>
> Hi @nedo99,
>
> Could you let me know your test env for Bark?
>
> Could you provide the OS, kernel, and Python version?
>
> OS: Ubuntu 22.04
> kernel: 5.19.0-41-generic
> Python: 3.9.18
> oneAPI: 2024.0.0.49564
> bigdl-llm: 2.5.0b20240201
> bigdl_core_xe: 2.5.0b20240201
> bigdl_core_xe_esimd: 2.5.0b20240201
> intel_extension_for_pytorch: 2.1.10+xpu
>
> What you show here does not seem to be a correct PyTorch 2.1 env to me :) bigdl_core_xe and bigdl_core_xe_esimd are the PyTorch 2.0 packages; the PyTorch 2.1 build installs bigdl-core-xe-21 and bigdl-core-xe-esimd-21. You could try the steps here for a correct PyTorch 2.1 + oneAPI 2024.0 env for bigdl-llm: #10025 (comment)

Here is the updated environment:

pip list | grep bigdl
bigdl-core-xe-21            2.5.0b20240206
bigdl-core-xe-esimd-21      2.5.0b20240206
bigdl-llm                   2.5.0b20240206
Name: intel-extension-for-pytorch
Version: 2.1.10+xpu
oneAPI 2024