myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

MIT License

Why does the volume keep fluctuating between loud and soft? #45

Open skyliwq opened 8 months ago

skyliwq commented 8 months ago

Why does the volume of generated Chinese speech fluctuate between loud and soft, especially with long text?

[Image: 微信图片_20240302180653]

skyliwq commented 8 months ago

At its quietest, the speech is almost inaudible.

skyliwq commented 8 months ago

To the maintainers: can the fluctuating volume be fixed?

Zengyi-Qin commented 8 months ago

Please just amplify the volume or use a post-processing normalization technique.

skyliwq commented 8 months ago

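I need to connect this to a large language model. The speech volume keeps fluctuating, which makes for a poor experience. I hope this can be improved.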

luobotaxinghu commented 8 months ago

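Judging from the logs, the text is split into sentences and each sentence is synthesized separately, so the volume of the individual sentences inevitably does not line up. @Zengyi-Qin's reply suggests post-processing such as normalization, but that work should really be handled inside the project rather than left to the user. Once the user receives the full concatenated audio, they can no longer process it sentence by sentence. The intermediate code needs to be changed so that the audio of each sentence is processed before it is joined.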
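A minimal sketch of the per-sentence processing suggested above, assuming each sentence's audio is available as a float NumPy array before concatenation. The names `rms_dbfs`, `normalize_segment` and `concat_normalized`, the -20 dBFS target and the -60 dBFS silence floor are hypothetical and are not part of MeloTTS. RMS level is used here as a simple stand-in for a proper loudness measure.

```python
import numpy as np


def rms_dbfs(segment: np.ndarray) -> float:
    """Return the RMS level of a float waveform in dB relative to full scale."""
    rms = np.sqrt(np.mean(segment.astype(np.float64) ** 2))
    return float(20.0 * np.log10(max(rms, 1e-10)))


def normalize_segment(segment: np.ndarray,
                      target_dbfs: float = -20.0,
                      silence_floor_dbfs: float = -60.0,
                      peak_limit: float = 0.99) -> np.ndarray:
    """Scale one sentence so its RMS level matches target_dbfs."""
    if segment.size == 0:
        return segment
    level = rms_dbfs(segment)
    if level < silence_floor_dbfs:
        # Treat the segment as silence; do not amplify background noise.
        return segment
    gain = 10.0 ** ((target_dbfs - level) / 20.0)
    out = segment.astype(np.float64) * gain
    peak = np.max(np.abs(out))
    if peak > peak_limit:
        # Back off the gain so the segment does not clip.
        out *= peak_limit / peak
    return out.astype(np.float32)


def concat_normalized(segments, target_dbfs: float = -20.0) -> np.ndarray:
    """Normalize every sentence separately, then join them."""
    return np.concatenate([normalize_segment(s, target_dbfs) for s in segments])
```

Because each sentence is scaled before joining, a quiet sentence and a loud one end up at the same RMS level, which a single gain applied to the finished file cannot achieve.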

skyliwq commented 8 months ago

@Zengyi-Qin, yes. If that is the case, the library loses its practical value in some application scenarios.

MissingTwins commented 8 months ago

pip install ffmpeg-normalize
ffmpeg-normalize input.wav -c:a libopus -b:a 128k -o output.oga -f

WARNING: Input file had loudness range of 10.1. This is larger than the loudness range target (7.0). Normalization will revert to dynamic mode.

Well, normalization does not solve the issue. The dynamic range remains too wide, with the volume fluctuating randomly between loud and soft.

andyweiqiu commented 8 months ago

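Processing the finished file directly definitely won't work: the volume of the whole clip is raised or lowered together, so it sounds no different from the original output. The processing has to happen at the point where the individual segments are output.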
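To illustrate the point, here is a small sketch showing that one uniform gain applied to the joined audio leaves the level gap between a quiet and a loud sentence unchanged. The helper names `level_db` and `level_gap_db` and the synthetic sine "sentences" are made up for this demonstration. It models only a constant gain, not a dynamic, time-varying normalizer.

```python
import numpy as np


def level_db(x: np.ndarray) -> float:
    """RMS level of a float waveform in dB relative to full scale."""
    rms = np.sqrt(np.mean(x.astype(np.float64) ** 2))
    return float(20.0 * np.log10(max(rms, 1e-10)))


def level_gap_db(quiet: np.ndarray, loud: np.ndarray, gain: float = 1.0) -> float:
    """Level difference (dB) between two sentences after one uniform gain."""
    whole = np.concatenate([quiet, loud]) * gain  # gain applied to the joined audio
    n = len(quiet)
    return level_db(whole[n:]) - level_db(whole[:n])


# Two synthetic "sentences": the second is ten times louder than the first.
t = np.arange(24000) / 24000.0
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)
loud = 0.5 * np.sin(2 * np.pi * 220 * t)

gap_before = level_gap_db(quiet, loud, gain=1.0)
gap_after = level_gap_db(quiet, loud, gain=1.5)
print(f"gap before: {gap_before:.2f} dB, gap after: {gap_after:.2f} dB")
```

Both gaps come out at about 20 dB, so the listener still hears the same jump between sentences. Only a gain chosen per segment can close it.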

v3ucn commented 6 months ago

pip install pyloudnorm

import soundfile as sf
import pyloudnorm as pyln

# Load the audio file
data, rate = sf.read(r"D:\Downloads\output_v2_zh.wav")

# Peak-normalize to -1 dB
peak_normalized_audio = pyln.normalize.peak(data, -1.0)

# Measure the integrated loudness
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)

# Loudness-normalize to -12 dB LUFS
loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -12.0)

sf.write("./normalized_audio.wav", loudness_normalized_audio, rate)