lenML / Speech-AI-Forge

🍦 Speech-AI-Forge is a project built around TTS generation models, providing an API server and a Gradio-based WebUI.
https://huggingface.co/spaces/lenML/ChatTTS-Forge
GNU Affero General Public License v3.0
710 stars, 87 forks

[ISSUE] How to apply prosody (apply_prosody) #145

Closed (closed by cpken 1 month ago)

cpken commented 1 month ago

Checklist

Your issue

https://github.com/bmcfee/pyrubberband

modules/core/pipeline/processors/Adjuster.py

import numpy as np
import pyrubberband as pyrb

def apply_prosody_to_audio_data(
    audio_data: np.ndarray,
    rate: float = 1,
    volume: float = 0,
    pitch: float = 0,
    sr: int = 24000,
) -> np.ndarray:
    if audio_data.dtype == np.int16:
        # NOTE: arguably this should raise an error instead...
        audio_data = audio_data.astype(np.float32) / 32768.0
    elif audio_data.dtype == np.float16:
        audio_data = audio_data.astype(np.float32)

    if rate != 1:
        audio_data = pyrb.time_stretch(audio_data, sr=sr, rate=rate)
    if volume != 0:
        volume = max(min(volume, 6), -20)
        gain = 10 ** (volume / 20)
        audio_data = audio_data * gain
    if pitch != 0:
        audio_data = pyrb.pitch_shift(audio_data, sr=sr, n_steps=pitch)
    return audio_data
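
The volume step in the function above treats `volume` as an offset in decibels: it clamps the value to the range [-20, 6] and converts it to a linear amplitude factor with `10 ** (volume / 20)`. A minimal sketch of just that conversion follows. The helper name `db_to_gain` and its `lo`/`hi` parameters are hypothetical and not part of Speech-AI-Forge; they are only there to make the arithmetic easy to try in isolation.

```python
def db_to_gain(volume_db: float, lo: float = -20.0, hi: float = 6.0) -> float:
    """Clamp a dB offset to [lo, hi] and convert it to a linear amplitude factor.

    Hypothetical helper mirroring the volume branch of
    apply_prosody_to_audio_data; not part of the project itself.
    """
    clamped = max(min(volume_db, hi), lo)
    return 10 ** (clamped / 20)


# Print the gain for a few dB offsets, including values outside the clamp range.
for db in (-40, -20, -6, 0, 6, 12):
    print(f"{db:+4d} dB -> gain {db_to_gain(db):.3f}")
```

With these limits, +6 dB is close to doubling the amplitude (about 1.995x) and -20 dB scales it to 0.1x; requests beyond either limit are silently clamped rather than rejected. For the other two parameters, pyrubberband interprets `rate` as a speed factor (values above 1 shorten the audio) and `n_steps` as a pitch shift in semitones, so `pitch` in the function above is presumably a semitone count.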