microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Creative Commons Attribution 4.0 International

DNSMOS skips windows on longer audio samples #130

Open stiansel opened 2 years ago

When running DNSMOS on longer audio samples, the window slicing in dnsmos_local.py can hit a floating-point rounding error when computing the end index of a window. For instance, with a hop idx of 7, the current expression int((idx + INPUT_LENGTH) * hop_len_samples) may evaluate to int(256159.99999999997) on certain systems. The resulting stop index of 256159 makes the cropped segment 1 sample too short. The code then drops these shorter windows, so parts of the audio are skipped in the evaluation.

A simple reproducible example, with the relevant part of the code extracted:


from collections import Counter

INPUT_LENGTH = 9.01
num_hops = 30  # just for this demo
hop_len_samples = 16000
len_samples = int(INPUT_LENGTH * hop_len_samples)

seq_lengths = []
seq_lengths_new = []
for idx in range(num_hops):
    # Current window slicing:
    #   audio_seg = audio[int(idx * hop_len_samples): int((idx + INPUT_LENGTH) * hop_len_samples)]
    # Possible fix:
    #   audio_seg = audio[int(idx * hop_len_samples): int(idx * hop_len_samples) + len_samples]
    slice_start = int(idx * hop_len_samples)
    slice_end = int((idx + INPUT_LENGTH) * hop_len_samples)
    slice_end_new = slice_start + len_samples
    if slice_end != slice_end_new:
        print(f'idx {idx} - slice [{slice_start}:{slice_end}] vs [{slice_start}:{slice_end_new}] - length {slice_end-slice_start} vs {slice_end_new-slice_start}')
    seq_lengths.append(slice_end-slice_start)
    seq_lengths_new.append(slice_end_new-slice_start)

    # if len(audio_seg) < len_samples:
    #    Note: The shorter windows would normally be skipped here.
    #    Also: I think this check isn't needed, since the sample is first padded to ensure it is always long enough.

print('Sequence lengths (current):', Counter(seq_lengths))
print('Sequence lengths (fixed):', Counter(seq_lengths_new))

The output of the above script:

idx 7 - slice [112000:256159] vs [112000:256160] - length 144159 vs 144160
idx 8 - slice [128000:272159] vs [128000:272160] - length 144159 vs 144160
idx 9 - slice [144000:288159] vs [144000:288160] - length 144159 vs 144160
...
idx 22 - slice [352000:496159] vs [352000:496160] - length 144159 vs 144160
idx 23 - slice [368000:512159] vs [368000:512160] - length 144159 vs 144160
Sequence lengths (current): Counter({144159: 17, 144160: 13})
Sequence lengths (fixed): Counter({144160: 30})
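
For illustration, the possible fix could be wrapped in a small helper along the following lines. This is only a sketch, not the code in dnsmos_local.py: the function name iter_windows, the constant FS, and the hop-count formula are hypothetical, and using int(round(...)) for the window length is an extra guard I am assuming is acceptable. The point is that the window length is computed once as an integer, so every slice end is start + len_samples and no per-window float arithmetic is involved.

```python
import numpy as np

FS = 16000            # sample rate in Hz; also used as the hop length (1 s), as in the demo
INPUT_LENGTH = 9.01   # window length in seconds, as in the demo


def iter_windows(audio, fs=FS, input_length=INPUT_LENGTH):
    """Yield (start, segment) pairs; every segment has exactly len_samples samples.

    Hypothetical helper for illustration, not part of dnsmos_local.py.
    """
    hop_len_samples = fs
    # Compute the window length once, as an integer (round() is an assumed extra guard).
    len_samples = int(round(input_length * fs))
    # Assumed hop count: every window that fits entirely inside the audio.
    num_hops = max(0, (len(audio) - len_samples) // hop_len_samples + 1)
    for idx in range(num_hops):
        start = idx * hop_len_samples  # integer arithmetic only
        yield start, audio[start:start + len_samples]


# 40 s of silence as a stand-in for a longer recording.
audio = np.zeros(40 * FS, dtype=np.float32)
lengths = {len(seg) for _, seg in iter_windows(audio)}
n_windows = sum(1 for _ in iter_windows(audio))
print('Distinct window lengths:', lengths)
print('Number of windows:', n_windows)
```

With slices built this way, the len(audio_seg) < len_samples check can never trigger for a window that fits inside the audio, so no window is dropped because of rounding.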