lenML / ChatTTS-Forge

🍦 ChatTTS-Forge is a project developed around a TTS generation model, implementing an API server and a Gradio-based WebUI.
https://huggingface.co/spaces/lenML/ChatTTS-Forge
GNU Affero General Public License v3.0
657 stars 82 forks

[BUG:API] Created a speaker with /v1/speaker/create; calling /v1/audio/speech then raises an error #114

Closed DYHouse closed 1 month ago

DYHouse commented 1 month ago

Checklist

Forge commit or tag

master

Python version

3.10

PyTorch version

2.3.3

Operating system

ubuntu 12.4

Bug description

I created a speaker with /v1/speaker/create, and calling the /v1/audio/speech endpoint with that speaker raises an error.

Bug endpoint
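
The two calls can be sketched with the Python standard library as below. This is a minimal illustration, not the project's client code: the endpoint paths and JSON bodies are the ones given later in this report, while `BASE_URL`, the `build_post` helper, and the use of POST with a JSON body are assumptions made for the example. The requests are only built, not sent.

```python
import json
from urllib import request

# Assumption: placeholder address of a running Forge API server (hypothetical).
BASE_URL = "http://localhost:7870"


def build_post(path: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request. Hypothetical helper."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: both endpoints accept POST
    )


# Step 1: create the speaker (payload copied from the report).
create_req = build_post(
    "/v1/speaker/create",
    {
        "name": "Mabtoic",
        "gender": "female",
        "describe": "Mabtoic-女",
        "seed": 2322536682,
    },
)

# Step 2: request speech with the newly created voice (payload from the report).
speech_req = build_post(
    "/v1/audio/speech",
    {"voice": "Mabtoic", "input": "ChatTTS-Forge是一个伟大的项目"},
)

if __name__ == "__main__":
    # Sending needs a live server, e.g. request.urlopen(create_req).
    for req in (create_req, speech_req):
        print(req.get_method(), req.full_url)
```

To actually reproduce the error, each request would be passed to `request.urlopen` in order against a running server.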

/v1/audio/speech

Reproduction parameters

Step 1: /v1/speaker/create
{ "name":"Mabtoic", "gender":"female", "describe":"Mabtoic-女", "seed":2322536682 }
The call succeeds and the speaker is created.

Step 2: /v1/audio/speech
{ "voice":"Mabtoic", "input":"ChatTTS-Forge是一个伟大的项目" }
The call fails with the following error:

Exception in thread Thread-7 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/ChatTTS-Forge/modules/core/pipeline/generate/BatchGenerate.py", line 47, in generate
    self.generate_batch(batch)
  File "/root/ChatTTS-Forge/modules/core/pipeline/generate/BatchGenerate.py", line 59, in generate_batch
    results = model.generate_batch(segments=segments, context=self.context)
  File "/root/ChatTTS-Forge/modules/core/models/tts/ChatTtsModel.py", line 44, in generate_batch
    return self.generate_batch_base(segments, context, stream=False)
  File "/root/ChatTTS-Forge/modules/core/models/tts/ChatTtsModel.py", line 166, in generate_batch_base
    results = infer.generate_audio(
  File "/root/ChatTTS-Forge/modules/core/models/zoo/ChatTTSInfer.py", line 320, in generate_audio
    data = self._generate_audio(
  File "/root/ChatTTS-Forge/modules/core/models/zoo/ChatTTSInfer.py", line 297, in _generate_audio
    return self.infer(
  File "/root/ChatTTS-Forge/modules/core/models/zoo/ChatTTSInfer.py", line 88, in infer
    return next(res_gen)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
  File "/root/ChatTTS-Forge/modules/core/models/zoo/ChatTTSInfer.py", line 136, in _infer
    for result in self.instance._infer_code(
  File "/root/ChatTTS-Forge/modules/ChatTTS/ChatTTS/core.py", line 624, in _infer_code
    self._apply_spk_emb(emb, params.spk_emb, input_ids, len(text))
  File "/root/ChatTTS-Forge/modules/ChatTTS/ChatTTS/core.py", line 564, in _apply_spk_emb
    .expand(emb.shape)
RuntimeError: The expanded size of the tensor (768) must match the existing size (384) at non-singleton dimension 2. Target sizes: [1, 10, 768]. Tensor sizes: [1, 1, 384]
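
The RuntimeError follows from how `expand` works in PyTorch: a dimension can only be stretched if its current size is 1, and any other dimension must already match the target. The pure-Python sketch below mimics that rule for the shapes in the error message. It is an illustration only, not PyTorch's implementation; `expand_error` is a hypothetical helper, and it ignores the `-1` placeholder that the real `expand` accepts.

```python
from typing import Optional, Sequence


def expand_error(src: Sequence[int], target: Sequence[int]) -> Optional[str]:
    """Return None if shape `src` could be expanded to `target`,
    otherwise a message naming the first dimension that blocks it."""
    if len(target) < len(src):
        return "target has fewer dimensions than source"
    # Shapes are aligned on their trailing dimensions.
    offset = len(target) - len(src)
    for i, size in enumerate(src):
        want = target[offset + i]
        # Only singleton dimensions may grow; others must match exactly.
        if size != want and size != 1:
            return f"size {size} cannot expand to {want} at dimension {offset + i}"
    return None


if __name__ == "__main__":
    # Shapes from the error: tensor [1, 1, 384], target [1, 10, 768].
    print(expand_error((1, 1, 384), (1, 10, 768)))
    # A width of 768 would expand cleanly along dimension 1.
    print(expand_error((1, 1, 768), (1, 10, 768)))
```

Read this way, the trace suggests the speaker embedding passed to `_apply_spk_emb` has width 384 while the text embedding it is expanded against has width 768. The trace alone does not show why the created speaker ended up with that width.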

Expected result

Speech is generated

Actual result

Speech is generated

Error message

No response

zhzLuke96 commented 1 month ago

Fixed in c0a13734ee331d5541c9a027d13adbbaa4fc5210