myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
MIT License

Error when running install script example: `Placeholder storage has not been allocated on MPS device!` #202

Open jaanli opened 1 month ago

jaanli commented 1 month ago

I ran the following example:

from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'zh.wav'
model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)

The example comes from the install instructions: https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md

This error results:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 12
      9 speaker_ids = model.hps.data.spk2id
     11 output_path = 'zh.wav'
---> 12 model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)

File ~/projects/MeloTTS/melo/api.py:100, in TTS.tts_to_file(self, text, speaker_id, output_path, sdp_ratio, noise_scale, noise_scale_w, speed, pbar, format, position, quiet)
     98     t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
     99 device = self.device
--> 100 bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
    101 with torch.no_grad():
    102     x_tst = phones.to(device).unsqueeze(0)

File ~/projects/MeloTTS/melo/utils.py:38, in get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id)
     36     ja_bert = torch.zeros(768, len(phone))
     37 else:
---> 38     bert = get_bert(norm_text, word2ph, language_str, device)
     39     del word2ph
     40     assert bert.shape[-1] == len(phone), phone

File ~/projects/MeloTTS/melo/text/__init__.py:34, in get_bert(norm_text, word2ph, language, device)
     30 from .korean import get_bert_feature as kr_bert
     32 lang_bert_func_map = {"ZH": zh_bert, "EN": en_bert, "JP": jp_bert, 'ZH_MIX_EN': zh_mix_en_bert,
     33                       'FR': fr_bert, 'SP': sp_bert, 'ES': sp_bert, "KR": kr_bert}
...
   2235     # remove once script supports set_grad_enabled
   2236     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2237 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Placeholder storage has not been allocated on MPS device!

Does anyone else experience this? Any tips on debugging?
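
One way to narrow this down, as a rough sketch rather than a confirmed diagnosis: the script asks for `device = 'cpu'`, yet the error mentions MPS, and in PyTorch this message typically appears when a module's weights are on one device while an input tensor is on another. The helper below compares where a module's parameters live with where its inputs live. The names `device_type` and `find_device_mismatches` are hypothetical and not part of MeloTTS or PyTorch; the code only assumes each object exposes a `.device` attribute, as PyTorch tensors do.

```python
from collections import Counter


def device_type(obj):
    """Return the device type ('cpu', 'mps', 'cuda', ...) of a tensor-like object."""
    dev = obj.device
    # torch.device exposes .type; fall back to parsing strings such as "cuda:0".
    return getattr(dev, "type", str(dev).split(":")[0])


def find_device_mismatches(parameters, inputs):
    """Report inputs that live on a different device than the module's weights.

    parameters: iterable of tensor-like objects, e.g. module.parameters()
    inputs:     dict of name -> tensor-like object, e.g. a tokenizer's output
    Returns (model_device, {input_name: input_device}) for mismatched inputs.
    """
    counts = Counter(device_type(p) for p in parameters)
    if not counts:
        # A module without parameters has no device to compare against.
        return None, {}
    # Treat the device holding most parameters as the module's device.
    model_device = counts.most_common(1)[0][0]
    mismatched = {
        name: device_type(tensor)
        for name, tensor in inputs.items()
        if device_type(tensor) != model_device
    }
    return model_device, mismatched
```

Called just before the failing forward pass, for example as `find_device_mismatches(bert_model.parameters(), tokenizer_output)` (both argument names are placeholders for whatever the BERT feature code actually uses), a non-empty second return value would point at the tensors that need a `.to(device)` call.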