twardoch / audiostretchy

AudioStretchy is a Python wrapper around the `audio-stretch` C library, which performs fast, high-quality time-stretching of WAV/MP3 files without changing their pitch. Works well for speech, can time-stretch silence separately.
https://pypi.org/project/audiostretchy/
BSD 3-Clause "New" or "Revised" License

silence in the audios #6

Open Thomcle opened 1 year ago

Thomcle commented 1 year ago

When I stretch audio with a ratio in the interval ]0; 1[ other than 0.5, the audio is indeed sped up, but the duration of the file doesn't change. For example, when I do this in Python:

stretch_audio("input.wav", "output.wav", ratio=0.7)

The output.wav file has the same duration as input.wav. The beginning is correctly sped up, and the rest of the file is padded with silence.

I plotted a graph with the ratio on the x-axis (in steps of 0.1) and the output duration on the y-axis. You can see that there are steps, and that the duration doesn't change for certain "special" values:

[Figure: output duration of a 47-second audio file as a function of ratio]
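
The mismatch described above can be measured by comparing the output file's actual duration with the expected one (input duration times ratio). This is a minimal sketch that assumes WAV files and uses only the standard-library `wave` module. The helper names `wav_duration` and `duration_mismatch` are illustrative and not part of audiostretchy.

```python
import wave


def wav_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_mismatch(input_path: str, output_path: str, ratio: float) -> float:
    """Return measured minus expected output duration, in seconds.

    The expected duration is the input duration multiplied by the ratio.
    """
    expected = wav_duration(input_path) * ratio
    return wav_duration(output_path) - expected
```

A result close to 0 means the output has the expected length. A clearly positive result, such as roughly 0.3 times the input duration for `ratio=0.7`, would be consistent with the padding reported in this issue.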

Madjid513 commented 9 months ago

The same error occurred for me. Is there any solution?

Thomcle commented 9 months ago

What I did was cut off the end of the audio, working out the expected length from the initial duration and the ratio. It's not really a problem.
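
A minimal sketch of this kind of trimming workaround, assuming WAV files and only the standard-library `wave` module. The function name `trim_to_expected_length` and its parameters are illustrative and not part of audiostretchy. It keeps the first `input duration * ratio` seconds of the stretched file and discards the rest.

```python
import wave


def trim_to_expected_length(
    input_path: str, stretched_path: str, trimmed_path: str, ratio: float
) -> int:
    """Write a copy of stretched_path cut to the expected length.

    Returns the number of frames written to trimmed_path.
    """
    # Duration of the original file, in seconds.
    with wave.open(input_path, "rb") as src:
        input_seconds = src.getnframes() / src.getframerate()

    with wave.open(stretched_path, "rb") as stretched:
        params = stretched.getparams()
        # Expected length in frames at the stretched file's own sample rate.
        expected_frames = round(input_seconds * ratio * stretched.getframerate())
        # Never ask for more frames than the file actually contains.
        keep = min(expected_frames, stretched.getnframes())
        frames = stretched.readframes(keep)

    with wave.open(trimmed_path, "wb") as out:
        out.setparams(params)
        # The wave module corrects the frame count in the header on close.
        out.writeframes(frames)
    return keep
```

Because `readframes` counts whole frames, the cut falls on a frame boundary for mono and multi-channel files alike. If the stretched file is already at or below the expected length, it is copied unchanged.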

donghyunismyname commented 5 months ago

My fix for this problem is as follows. I added a single line.

def stretch_audio(
    input_path: str,
    output_path: str,
    ratio: float = 1.0,
    gap_ratio: float = 0.0,
    upper_freq: int = 333,
    lower_freq: int = 55,
    buffer_ms: float = 25,
    threshold_gap_db: float = -40,
    double_range: bool = False,
    fast_detection: bool = False,
    normal_detection: bool = False,
    sample_rate: int = 0,
):
    """Stretches the input audio file and saves the result to the output path.

    Args:
        input_path (str): The path to the input WAV or MP3 audio file.
        output_path (str): The path to save the stretched WAV or MP3 audio file.
        ratio (float, optional): The stretch ratio, where values greater than 1.0 will extend the audio and values less than 1.0 will shorten the audio. From 0.5 to 2.0, or with `-d` from 0.25 to 4.0. Default is 1.0 = no stretching.
        gap_ratio (float, optional): The stretch ratio for gaps (silence) in the audio. Default is 0.0 = uses ratio.
        upper_freq (int, optional): The upper frequency limit for period detection in Hz. Default is 333 Hz.
        lower_freq (int, optional): The lower frequency limit. Default is 55 Hz.
        buffer_ms (float, optional): The buffer size in milliseconds for processing the audio in chunks (useful with `-g`). Default is 25 ms.
        threshold_gap_db (float, optional): The threshold level in dB to determine if a section of audio is considered a gap (for `-g`). Default is -40 dB.
        double_range (bool, optional): If set, doubles the min/max range of stretching from 0.5-2.0 to 0.25-4.0.
        fast_detection (bool, optional): If set, enables fast period detection, which may speed up processing but reduce the quality of the stretched audio.
        normal_detection (bool, optional): If set, forces the algorithm to use normal period detection instead of fast period detection.
        sample_rate (int, optional): The target sample rate for resampling the stretched audio in Hz (if installed with `[all]`). Default is 0 = use sample rate of the input audio.
    """
    audio_stretch = AudioStretch()
    audio_stretch.open(input_path)
    audio_stretch.stretch(
        ratio,
        gap_ratio,
        upper_freq,
        lower_freq,
        buffer_ms,
        threshold_gap_db,
        double_range,
        fast_detection,
        normal_detection,
    )
    audio_stretch.resample(sample_rate)
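    # Fix: drop the trailing silence by keeping only the expected number of samples.
    # Note: this count assumes one sample per frame at the input sample rate.
    audio_stretch.samples = audio_stretch.samples[:int(audio_stretch.nframes * ratio)]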
    audio_stretch.save(output_path)