nii-yamagishilab / mos-finetune-ssl

BSD 3-Clause "New" or "Revised" License

normalize #10

Open chenyang399 opened 1 week ago

chenyang399 commented 1 week ago

How do I do the sv56 normalization? Can I use this function and set loudness_headroom_db=56?

import torch
import torchaudio

def normalize_loudness(wav: torch.Tensor, sample_rate: int, loudness_headroom_db: float = 14,
                       loudness_compressor: bool = False, energy_floor: float = 2e-3):
    """Normalize an input signal to a user loudness in dB LKFS.
    Audio loudness is defined according to the ITU-R BS.1770-4 recommendation.

    Args:
        wav (torch.Tensor): Input multichannel audio data.
        sample_rate (int): Sample rate.
        loudness_headroom_db (float): Headroom in dB; the output is scaled to a
            target loudness of -loudness_headroom_db dB LUFS.
        loudness_compressor (bool): Uses tanh for soft clipping.
        energy_floor (float): anything below that RMS level will not be rescaled.
    Returns:
        torch.Tensor: Loudness normalized output data.
    """
    energy = wav.pow(2).mean().sqrt().item()
    if energy < energy_floor:
        return wav
    transform = torchaudio.transforms.Loudness(sample_rate)
    input_loudness_db = transform(wav).item()
    # calculate the gain needed to scale to the desired loudness level
    delta_loudness = -loudness_headroom_db - input_loudness_db
    gain = 10.0 ** (delta_loudness / 20.0)
    output = gain * wav
    if loudness_compressor:
        output = torch.tanh(output)
    assert output.isfinite().all(), (input_loudness_db, wav.pow(2).mean().sqrt())
    return output
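
For reference, the gain arithmetic in the function above can be isolated in a small sketch. The helper name `loudness_gain` and the -23 dB LUFS input are hypothetical, chosen only for illustration. Under those assumptions, setting loudness_headroom_db=56 would target -56 dB LUFS, which is a strong attenuation rather than an sv56-style normalization. As far as I know, the "56" in sv56 refers to the ITU-T P.56 speech level measurement, not to a dB value.

```python
def loudness_gain(input_loudness_db: float, loudness_headroom_db: float) -> float:
    """Linear gain that the function above would apply for a measured loudness.

    Hypothetical helper: it only repeats the gain arithmetic and does not
    measure loudness itself.
    """
    # Same arithmetic as above: the target level is -loudness_headroom_db dB LUFS.
    delta_loudness = -loudness_headroom_db - input_loudness_db
    return 10.0 ** (delta_loudness / 20.0)


# Assumed example: a clip measured at -23 dB LUFS (illustrative value only).
measured = -23.0
gain_default = loudness_gain(measured, 14.0)  # target -14 dB LUFS, delta = +9 dB
gain_56 = loudness_gain(measured, 56.0)       # target -56 dB LUFS, delta = -33 dB

print(f"headroom 14 dB: gain = {gain_default:.3f}")
print(f"headroom 56 dB: gain = {gain_56:.4f}")
```

With the default headroom the clip is amplified (gain above 1), while a headroom of 56 scales it down to a small fraction of its original amplitude.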

ecooper7 commented 8 hours ago

Hi, the scripts used for sv56 normalization for this project are included in the BVCC dataset which can be found here: https://zenodo.org/records/6572573

However, rather than downloading that entire dataset just to get the scripts, let me point you instead to a different project repo that also uses sv56 normalization -- see the directory here: https://github.com/nii-yamagishilab/ZMM-TTS/tree/b0381a4b70f831b3b171318c2db75e37031019f1/scripts/sv56scripts

The script install_sv56.sh downloads and installs the sv56 code from the ITU repo. Then, you can run batch_normRMSE.sh as in the example in ZMM-TTS/scripts/norm_wav.sh