(tts) ➜ MeloTTS git:(main) ✗ melo "Text to read" output.wav
Text split to sentences.
Text to read
0%| | 0/1 [00:00<?, ?it/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
0%| | 0/1 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/bin/melo", line 33, in <module>
    sys.exit(load_entry_point('melotts', 'console_scripts', 'melo')())
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/main.py", line 36, in main
    model.tts_to_file(text, spkr, output_path, speed=speed)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/api.py", line 110, in tts_to_file
    audio = self.model.infer(
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/models.py", line 994, in infer
    x, m_p, logs_p, x_mask = self.enc_p(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/models.py", line 377, in forward
    x = self.encoder(x * x_mask, x_mask, g=g)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/attentions.py", line 107, in forward
    y = self.attn_layers[i](x, x, attn_mask)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/attentions.py", line 263, in forward
    x, self.attn = self.attention(q, k, v, mask=attn_mask)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/attentions.py", line 280, in attention
    key_relative_embeddings = self._get_relative_embeddings(self.emb_rel_k, t_s)
  File "/Users/zhuchenjie/vscode-workspaces/MeloTTS/melo/attentions.py", line 344, in _get_relative_embeddings
    padded_relative_embeddings = F.pad(
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
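The arguments to `F.pad` are cut off in the traceback above, so the following is only a sketch under stated assumptions. It assumes, as in VITS-derived attention code, that `_get_relative_embeddings` pads a 3-D tensor of shape `(1, 2 * window_size + 1, channels)` along its middle dimension, using a nested pad list that is flattened last-dimension-first before being passed to `F.pad`. The helpers `flatten_pad_shape` and `pad_relative_embeddings` and the sizes below are hypothetical and are not MeloTTS's own code. Running the script in the same `tts` environment should show whether the installed torch build accepts a 6-value pad on a 3-D tensor. The script tries MPS first because the `/opt/homebrew` prefix points to an Apple Silicon Mac. Whether MeloTTS itself selected MPS is an assumption.

```python
import torch
import torch.nn.functional as F


def flatten_pad_shape(pad_shape):
    """Hypothetical helper: turn [[before, after], ...] (first dim first) into
    the flat list F.pad expects, which lists the LAST dimension's padding first."""
    return [amount for pair in pad_shape[::-1] for amount in pair]


def pad_relative_embeddings(relative_embeddings, length, window_size):
    """Hypothetical standalone version of the padding step: pad a
    (1, 2*window_size+1, channels) tensor along dim 1 so it covers the
    2*length-1 relative positions needed for a sequence of `length` tokens."""
    pad_length = max(length - (window_size + 1), 0)
    if pad_length == 0:
        # Short sequences need no padding; return the input unchanged.
        return relative_embeddings
    pad = flatten_pad_shape([[0, 0], [pad_length, pad_length], [0, 0]])
    return F.pad(relative_embeddings, pad)  # 6 pad values for a 3-D tensor


def pick_device():
    # Assumption: on Apple Silicon the failing run may have used the MPS backend.
    return "mps" if torch.backends.mps.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("torch", torch.__version__, "device", device)
    window_size, channels, length = 4, 192, 10  # hypothetical sizes
    emb = torch.randn(1, 2 * window_size + 1, channels, device=device)
    padded = pad_relative_embeddings(emb, length, window_size)
    print(tuple(padded.shape))  # (1, 19, 192) when the pad succeeds
```

If this standalone call raises the same `IndexError`, the installed torch build (or the device it runs on) is the likelier culprit, not the input text. If it succeeds, compare `torch.__version__` and the device with what the `melo` run actually used.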