Open skyliwq opened 8 months ago
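At its quietest, the audio is almost inaudible.
Could the maintainers fix the problem of the volume fluctuating between loud and soft?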
Please just amplify the volume or use a post-processing normalization technique.
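I need to hook this up to a large language model, and the speech volume keeps fluctuating between loud and soft. The experience is not great; I hope this can be improved.
Judging from the logs, the text is split and synthesized sentence by sentence, so the volume of each sentence inevitably ends up unaligned. @Zengyi-Qin's reply suggests post-processing such as normalization, but this should be handled inside the project rather than left to the user. Once users have the full concatenated audio they can no longer fix it; the intermediate code that processes the per-sentence audio needs to change.
@Zengyi-Qin, yes. If that is the case, it loses its practical value in some application scenarios.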
pip install ffmpeg-normalize
ffmpeg-normalize input.wav -c:a libopus -b:a 128k -o output.oga -f
WARNING: Input file had loudness range of 10.1. This is larger than the loudness range target (7.0). Normalization will revert to dynamic mode.
Well, normalization does not solve the issue. The dynamic range remains too wide, with the volume fluctuating randomly between loud and soft.
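One way to put a number on the fluctuation described above is to measure the level of each short window and look at the spread. The sketch below is only illustrative: it assumes mono float audio in the range [-1, 1], uses plain RMS in dBFS rather than LUFS, and the helper names (`rms_dbfs`, `windowed_levels`) and the synthetic two-sentence signal are made up for the example. It also shows that a single gain applied to the whole file leaves the loud/quiet spread unchanged; it makes no claim about what ffmpeg-normalize's dynamic mode does internally.

```python
import numpy as np

def rms_dbfs(x, eps=1e-12):
    """RMS level of a float signal in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + eps)

def windowed_levels(audio, rate, window_s=1.0):
    """RMS level (dBFS) of each consecutive, non-overlapping window."""
    hop = int(rate * window_s)
    return [rms_dbfs(audio[i:i + hop])
            for i in range(0, len(audio) - hop + 1, hop)]

# Synthetic stand-in for a two-sentence output: one loud, one quiet.
rate = 16000
t = np.arange(rate) / rate
loud = 0.5 * np.sin(2 * np.pi * 220 * t)
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)
audio = np.concatenate([loud, quiet])

before = windowed_levels(audio, rate)
after = windowed_levels(audio * 1.5, rate)  # one gain for the whole file

# Spread between the loudest and quietest window, in dB.
spread_before = max(before) - min(before)
spread_after = max(after) - min(after)
print(f"spread before: {spread_before:.1f} dB, after: {spread_after:.1f} dB")
```

On real output, the window length would need tuning so that each window roughly covers one sentence.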
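Processing the whole file directly definitely won't work: the volume of the entire clip is raised or lowered together, so it sounds no different from the original output. The processing has to happen where the per-segment audio is produced.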
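A minimal sketch of what per-segment processing could look like, assuming each sentence is available as a mono float NumPy array before the segments are concatenated. This is not the project's actual code: `normalize_segment`, `join_segments`, `target_rms` and `peak_limit` are hypothetical names, and RMS is used as a simple stand-in for a proper loudness measure.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a float signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def normalize_segment(segment, target_rms=0.1, peak_limit=0.99, eps=1e-9):
    """Scale one sentence to target_rms, backing off if it would clip."""
    level = rms(segment)
    if level < eps:  # (near-)silent segment: leave it untouched
        return segment
    gain = target_rms / level
    peak = float(np.max(np.abs(segment))) * gain
    if peak > peak_limit:  # reduce the gain so the peak stays in range
        gain *= peak_limit / peak
    return segment * gain

def join_segments(segments, target_rms=0.1):
    """Normalize every sentence on its own, then concatenate."""
    return np.concatenate([normalize_segment(s, target_rms) for s in segments])

# Synthetic stand-ins for two synthesized sentences at different volumes.
rate = 16000
t = np.arange(rate) / rate
segments = [0.5 * np.sin(2 * np.pi * 220 * t),
            0.05 * np.sin(2 * np.pi * 220 * t)]

normalized = [normalize_segment(s) for s in segments]
joined = join_segments(segments)
print([round(rms(s), 3) for s in normalized])
```

A loudness meter such as pyloudnorm's could be applied per sentence instead of RMS, though integrated-loudness measurement needs a minimum clip length, so very short sentences may need a fallback.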
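pip install pyloudnorm soundfile
import soundfile as sf
import pyloudnorm as pyln
data, rate = sf.read(r"D:\Downloads\output_v2_zh.wav")
# Peak-normalize to -1 dB
peak_normalized_audio = pyln.normalize.peak(data, -1.0)
# Measure the integrated loudness of the whole file
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)
# Loudness-normalize to -12 dB LUFS and save the result
loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -12.0)
sf.write("./normalized_audio.wav", loudness_normalized_audio, rate)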
What causes the volume of the generated Chinese speech to fluctuate between loud and soft, especially with long texts?