DNSMOS skips windows on longer audio samples #130

Open stiansel opened 2 years ago

When running DNSMOS on longer audio samples, the current window slicing can hit a floating-point error on the slice end index. For instance, with a hop idx of 7, the current expression int((idx + INPUT_LENGTH) * hop_len_samples) may evaluate to int(256159.99999999997) on certain systems. This produces a stop slice index of 256159, which in turn makes the cropped audio 1 sample too short. The code then drops these shorter windows, so parts of the audio are skipped in the evaluation.

Simple reproducible example where the relevant part of the code has been extracted:

from collections import Counter

# Window length in seconds. Assumed value, implied by the 144160-sample
# windows in the output (144160 / 16000 = 9.01).
INPUT_LENGTH = 9.01
num_hops = 30  # just for this demo
hop_len_samples = 16000
len_samples = int(INPUT_LENGTH * hop_len_samples)

seq_lengths = []
seq_lengths_new = []
for idx in range(num_hops):
    # Current window slicing:
    #   audio_seg = audio[int(idx * hop_len_samples): int((idx + INPUT_LENGTH) * hop_len_samples)]
    # Possible fix:
    #   audio_seg = audio[int(idx * hop_len_samples): int(idx * hop_len_samples) + len_samples]
    slice_start = int(idx * hop_len_samples)
    slice_end = int((idx + INPUT_LENGTH) * hop_len_samples)
    slice_end_new = slice_start + len_samples
    # Record both window lengths so they can be counted after the loop.
    seq_lengths.append(slice_end - slice_start)
    seq_lengths_new.append(slice_end_new - slice_start)
    if slice_end != slice_end_new:
        print(f'idx {idx} - slice [{slice_start}:{slice_end}] vs [{slice_start}:{slice_end_new}] - length {slice_end-slice_start} vs {slice_end_new-slice_start}')

    # if len(audio_seg) < len_samples:
    #    Note: The shorter windows would normally be skipped here.
    #    Also: I think this check isn't needed, as the sample is first padded to ensure it is always long enough.

print('Sequence lengths (current):', Counter(seq_lengths))
print('Sequence lengths (fixed):', Counter(seq_lengths_new))

The output of the above script:

idx 7 - slice [112000:256159] vs [112000:256160] - length 144159 vs 144160
idx 8 - slice [128000:272159] vs [128000:272160] - length 144159 vs 144160
idx 9 - slice [144000:288159] vs [144000:288160] - length 144159 vs 144160
idx 22 - slice [352000:496159] vs [352000:496160] - length 144159 vs 144160
idx 23 - slice [368000:512159] vs [368000:512160] - length 144159 vs 144160
Sequence lengths (current): Counter({144159: 17, 144160: 13})
Sequence lengths (fixed): Counter({144160: 30})
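
For reference, here is a minimal sketch of how the possible fix could look as a standalone helper. The name split_windows, its parameters, and the zero-filled test signal are hypothetical and not taken from the DNSMOS code, and INPUT_LENGTH = 9.01 is an assumption based on the 144160-sample window length above. The end index is computed from the integer start index plus a fixed integer window length, so every window has the same length as long as the audio has been padded to cover the last window.

```python
INPUT_LENGTH = 9.01      # assumed window length in seconds
HOP_LEN_SAMPLES = 16000  # hop length used in the demo above


def split_windows(audio, num_hops, hop_len_samples=HOP_LEN_SAMPLES,
                  input_length=INPUT_LENGTH):
    """Hypothetical helper: cut `audio` into fixed-length windows."""
    # Compute the window length once, as an integer number of samples.
    len_samples = int(input_length * hop_len_samples)
    windows = []
    for idx in range(num_hops):
        start = idx * hop_len_samples  # integer arithmetic only
        # End index = integer start + integer length, no float rounding.
        windows.append(audio[start:start + len_samples])
    return windows


num_hops = 30
len_samples = int(INPUT_LENGTH * HOP_LEN_SAMPLES)
# A plain list of zeros stands in for the padded audio array; it is just
# long enough to cover the last window.
audio = [0.0] * ((num_hops - 1) * HOP_LEN_SAMPLES + len_samples)

windows = split_windows(audio, num_hops)
lengths = {len(w) for w in windows}
print('Distinct window lengths:', lengths)
```

With this slicing, the len(audio_seg) < len_samples check should only trigger if the input was not padded far enough, not because of rounding.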