mindspore-lab / mindnlp

Easy-to-use and high-performance NLP and LLM framework based on MindSpore, compatible with models and datasets of 🤗Huggingface.
https://mindnlp.cqu.ai/
Apache License 2.0

mms-tts-eng model inference fails to generate audio. #1808

Open yegoling opened 2 weeks ago

yegoling commented 2 weeks ago

Describe the bug (Mandatory): The mms-tts-eng model raises errors during inference.

Hardware environment: CPU

To Reproduce (Mandatory)

Run the inference code:

import scipy.io.wavfile

from mindnlp.transformers import VitsModel, AutoTokenizer

# Load the locally downloaded PyTorch checkpoint and convert it to MindSpore
model = VitsModel.from_pretrained("./model/mms-tts-eng", from_pt=True)
tokenizer = AutoTokenizer.from_pretrained("./model/mms-tts-eng", from_pt=True)

text = "some example text in the English language"

inputs = tokenizer(text, return_tensors="ms")
output = model(**inputs).waveform

# output is a MindSpore Tensor of shape (batch, samples); convert it to a
# 1-D numpy array before writing the wav file
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output.asnumpy().squeeze())
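The warnings in the log below mention output shapes containing 0 (e.g. {0, 1}), which suggests the model is operating on a zero-length input. A minimal check, not part of the original report, is to print the tokenized input and waveform shapes right after the forward pass to narrow down whether the tokenizer or the model produces the empty tensor:

print(inputs["input_ids"].shape)  # expected (1, sequence_length) with sequence_length > 0
print(output.shape)               # expected (1, num_samples); a 0 here matches the warnings below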

Expected behavior (Mandatory): An audio file is generated.

Screenshots / Logs (Mandatory)

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\23974\AppData\Local\Temp\jieba.cache
Loading model cost 0.572 seconds.
Prefix dict has been built successfully.
Some weights of VitsModel were not initialized from the model checkpoint at ./model/mms-tts-eng and are newly initialized: ['flow.flows.0.wavenet.in_layers.0.weight', 'flow.flows.0.wavenet.in_layers.1.weight', 'flow.flows.0.wavenet.in_layers.2.weight', 'flow.flows.0.wavenet.in_layers.3.weight', 'flow.flows.0.wavenet.res_skip_layers.0.weight', 'flow.flows.0.wavenet.res_skip_layers.1.weight', 'flow.flows.0.wavenet.res_skip_layers.2.weight', 'flow.flows.0.wavenet.res_skip_layers.3.weight', 'flow.flows.1.wavenet.in_layers.0.weight', 'flow.flows.1.wavenet.in_layers.1.weight', 'flow.flows.1.wavenet.in_layers.2.weight', 'flow.flows.1.wavenet.in_layers.3.weight', 'flow.flows.1.wavenet.res_skip_layers.0.weight', 'flow.flows.1.wavenet.res_skip_layers.1.weight', 'flow.flows.1.wavenet.res_skip_layers.2.weight', 'flow.flows.1.wavenet.res_skip_layers.3.weight', 'flow.flows.2.wavenet.in_layers.0.weight', 'flow.flows.2.wavenet.in_layers.1.weight', 'flow.flows.2.wavenet.in_layers.2.weight', 'flow.flows.2.wavenet.in_layers.3.weight', 'flow.flows.2.wavenet.res_skip_layers.0.weight', 'flow.flows.2.wavenet.res_skip_layers.1.weight', 'flow.flows.2.wavenet.res_skip_layers.2.weight', 'flow.flows.2.wavenet.res_skip_layers.3.weight', 'flow.flows.3.wavenet.in_layers.0.weight', 'flow.flows.3.wavenet.in_layers.1.weight', 'flow.flows.3.wavenet.in_layers.2.weight', 'flow.flows.3.wavenet.in_layers.3.weight', 'flow.flows.3.wavenet.res_skip_layers.0.weight', 'flow.flows.3.wavenet.res_skip_layers.1.weight', 'flow.flows.3.wavenet.res_skip_layers.2.weight', 'flow.flows.3.wavenet.res_skip_layers.3.weight', 'posterior_encoder.wavenet.in_layers.0.weight', 'posterior_encoder.wavenet.in_layers.1.weight', 'posterior_encoder.wavenet.in_layers.10.weight', 'posterior_encoder.wavenet.in_layers.11.weight', 'posterior_encoder.wavenet.in_layers.12.weight', 'posterior_encoder.wavenet.in_layers.13.weight', 'posterior_encoder.wavenet.in_layers.14.weight', 'posterior_encoder.wavenet.in_layers.15.weight', 'posterior_encoder.wavenet.in_layers.2.weight', 'posterior_encoder.wavenet.in_layers.3.weight', 'posterior_encoder.wavenet.in_layers.4.weight', 'posterior_encoder.wavenet.in_layers.5.weight', 'posterior_encoder.wavenet.in_layers.6.weight', 'posterior_encoder.wavenet.in_layers.7.weight', 'posterior_encoder.wavenet.in_layers.8.weight', 'posterior_encoder.wavenet.in_layers.9.weight', 'posterior_encoder.wavenet.res_skip_layers.0.weight', 'posterior_encoder.wavenet.res_skip_layers.1.weight', 'posterior_encoder.wavenet.res_skip_layers.10.weight', 'posterior_encoder.wavenet.res_skip_layers.11.weight', 'posterior_encoder.wavenet.res_skip_layers.12.weight', 'posterior_encoder.wavenet.res_skip_layers.13.weight', 'posterior_encoder.wavenet.res_skip_layers.14.weight', 'posterior_encoder.wavenet.res_skip_layers.15.weight', 'posterior_encoder.wavenet.res_skip_layers.2.weight', 'posterior_encoder.wavenet.res_skip_layers.3.weight', 'posterior_encoder.wavenet.res_skip_layers.4.weight', 'posterior_encoder.wavenet.res_skip_layers.5.weight', 'posterior_encoder.wavenet.res_skip_layers.6.weight', 'posterior_encoder.wavenet.res_skip_layers.7.weight', 'posterior_encoder.wavenet.res_skip_layers.8.weight', 'posterior_encoder.wavenet.res_skip_layers.9.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING] KERNEL(,4408,?):2024-11-9 15:52:50 [mindspore\ccsrc\kernel/kernel.h:916] mindspore::kernel::CheckShapeNull] For 'ReduceMin', the shape of input cannot contain zero, but got [const vector]{0}
[WARNING] KERNEL(,4408,?):2024-11-9 15:52:50 [mindspore\ccsrc\kernel/kernel.h:916] mindspore::kernel::CheckShapeNull] For 'ReduceMax', the shape of input cannot contain zero, but got [const vector]{0}
[WARNING] KERNEL(,4408,?):2024-11-9 15:52:50 [mindspore\ops\kernel\cpu\arithmetic_cpu_kernel.cc:171] mindspore::kernel::`anonymous-namespace'::ArithmeticCpuTypeFunc<float>::RunFunc] Mul output shape contain 0, output_shape: [const vector]{0, 1}
[WARNING] KERNEL(,4408,?):2024-11-9 15:52:50 [mindspore\ops\kernel\cpu\arithmetic_cpu_kernel.cc:171] mindspore::kernel::`anonymous-namespace'::ArithmeticCpuTypeFunc<float>::RunFunc] Add output shape contain 0, output_shape: [const vector]{0, 1}
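Note that the "newly initialized" weights listed above are all wavenet in_layers / res_skip_layers weights. In the Hugging Face PyTorch implementation of VITS these layers are wrapped with weight normalization, so the PyTorch checkpoint typically stores weight_g / weight_v pairs rather than a plain weight. A hedged diagnostic sketch (assuming the local directory contains a pytorch_model.bin and PyTorch is installed) to check whether that mismatch is what the from_pt conversion fails to handle:

import torch

# Load only the checkpoint's state dict and list the keys of one wavenet layer.
state_dict = torch.load("./model/mms-tts-eng/pytorch_model.bin", map_location="cpu")
print([k for k in state_dict if "flow.flows.0.wavenet.in_layers.0" in k])
# Keys like '...in_layers.0.weight_g' and '...in_layers.0.weight_v' instead of
# '...in_layers.0.weight' would indicate the weight-norm parameters are not being
# fused during the PyTorch-to-MindSpore conversion, matching the warning above.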

Additional context (Optional): 代码.md (attachment)