When running DNSMOS on longer audio samples, the window slicing in dnsmos_local.py can hit a floating-point rounding error when computing the end of the slice. For instance, with a hop idx of 7, the current expression int((idx + INPUT_LENGTH) * hop_len_samples) may evaluate to int(256159.99999999997) on certain systems. This gives a stop index of 256159, so the cropped audio is 1 sample too short. The code then drops these shorter windows, which means parts of the audio are skipped in the evaluation.
A simple reproducible example, with the relevant part of the code extracted:
import numpy as np
from collections import Counter

INPUT_LENGTH = 9.01
num_hops = 30  # just for this demo
hop_len_samples = 16000
len_samples = int(INPUT_LENGTH * hop_len_samples)

seq_lengths = []
seq_lengths_new = []
for idx in range(num_hops):
    # Current window slicing:
    # audio_seg = audio[int(idx * hop_len_samples): int((idx + INPUT_LENGTH) * hop_len_samples)]
    # Possible fix:
    # audio_seg = audio[int(idx * hop_len_samples): int(idx * hop_len_samples) + len_samples]
    slice_start = int(idx * hop_len_samples)
    slice_end = int((idx + INPUT_LENGTH) * hop_len_samples)
    slice_end_new = slice_start + len_samples
    if slice_end != slice_end_new:
        print(f'idx {idx} - slice [{slice_start}:{slice_end}] vs [{slice_start}:{slice_end_new}] - length {slice_end-slice_start} vs {slice_end_new-slice_start}')
    seq_lengths.append(slice_end - slice_start)
    seq_lengths_new.append(slice_end_new - slice_start)
    # if len(audio_seg) < len_samples:
    # Note: The shorter windows would normally be skipped here.
    # Also: I think this check isn't needed, as the sample is first padded to ensure it is always long enough.

print('Sequence lengths (current):', Counter(seq_lengths))
print('Sequence lengths (fixed):', Counter(seq_lengths_new))
The output of the above script:
idx 7 - slice [112000:256159] vs [112000:256160] - length 144159 vs 144160
idx 8 - slice [128000:272159] vs [128000:272160] - length 144159 vs 144160
idx 9 - slice [144000:288159] vs [144000:288160] - length 144159 vs 144160
...
idx 22 - slice [352000:496159] vs [352000:496160] - length 144159 vs 144160
idx 23 - slice [368000:512159] vs [368000:512160] - length 144159 vs 144160
Sequence lengths (current): Counter({144159: 17, 144160: 13})
Sequence lengths (fixed): Counter({144160: 30})
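For illustration, here is a minimal sketch of the proposed fix as a standalone helper. The name `iter_windows` and its signature are hypothetical and are not part of dnsmos_local.py. The window count here is derived from the audio length, which may differ from how the script computes its number of hops. The point is only that the stop index is the integer start plus a window length computed once, so every window has the same size.

```python
import numpy as np

INPUT_LENGTH = 9.01       # window length in seconds (value from the example above)
HOP_LEN_SAMPLES = 16000   # hop length in samples (value from the example above)


def iter_windows(audio, input_length=INPUT_LENGTH, hop_len_samples=HOP_LEN_SAMPLES):
    """Hypothetical helper: yield (start, segment) pairs of identical length."""
    # Convert the window length to samples once, outside the loop.
    len_samples = int(input_length * hop_len_samples)
    start = 0
    # Start and stop stay integers, so no per-window float rounding can occur.
    while start + len_samples <= len(audio):
        yield start, audio[start:start + len_samples]
        start += hop_len_samples


if __name__ == "__main__":
    audio = np.zeros(16000 * 40)  # 40 seconds of silence, just for this demo
    lengths = {len(seg) for _, seg in iter_windows(audio)}
    print("Distinct window lengths:", lengths)
```

With this approach the `len(audio_seg) < len_samples` check should never trigger for a window that fits inside the audio.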