Unexpected clean and noisy data

My config:

[noisy_speech]

sampling_rate: 48000
audioformat: *.wav
audio_length: 10 
silence_length: 0.2
total_hours: 500
snr_lower: 0
snr_upper: 40
randomize_snr: True
target_level_lower: -35
target_level_upper: -15
total_snrlevels: 5 
clean_activity_threshold: 0.6
noise_activity_threshold: 0.0
fileindex_start: None
fileindex_end: None
is_test_set: False
noise_dir: datasets/noise
speech_dir: datasets/clean
noisy_destination: noisy
clean_destination: clean
noise_destination: noise
log_dir: logs

# Unit tests config
snr_test: True
norm_test: True
sampling_rate_test: True
clipping_test: True

unit_tests_log_dir: ./unittests_logs
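Since `sampling_rate` is set to 48000 above, one quick sanity check is whether every file under `speech_dir` is actually stored at that rate. The following is a minimal sketch and not part of the DNS-Challenge scripts: the helper names `load_target_rate` and `find_rate_mismatches` are hypothetical, the config file name in the usage part is an assumption, and the standard-library `wave` module only reads uncompressed PCM WAV files.

```python
import configparser
import wave
from pathlib import Path


def load_target_rate(cfg_path, section="noisy_speech"):
    """Read `sampling_rate` from the [noisy_speech] section of the config."""
    parser = configparser.ConfigParser()
    with open(cfg_path) as f:
        parser.read_file(f)
    return parser.getint(section, "sampling_rate")


def find_rate_mismatches(speech_dir, target_rate):
    """Return {path: rate} for WAV files whose sample rate differs from target_rate."""
    mismatches = {}
    for path in sorted(Path(speech_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != target_rate:
            mismatches[str(path)] = rate
    return mismatches


if __name__ == "__main__":
    # Assumed file name; point this at whichever .cfg file holds the config above.
    cfg_file = Path("noisyspeech_synthesizer.cfg")
    if cfg_file.exists():
        target = load_target_rate(cfg_file)
        for name, rate in find_rate_mismatches("datasets/clean", target).items():
            print(f"{name}: {rate} Hz (config expects {target} Hz)")
```

Any file listed by this check would have to be resampled somewhere before it can be written at 48 kHz, which makes it a candidate for closer inspection.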

For this config, I've noticed some unexpected behavior in one (clean, noisy) pair:

noisy file: book_04362_chp_0020_reader_11254_76_colv997yUuw_snr10_fileid_62459.wav
clean file (generated): clean_fileid_62459.wav
main clean file: book_04362_chp_0020_reader_11254_76.wav

It seems that the generated clean files also contain noise. I suspect this is caused by upsampling from 16 kHz to 48 kHz, but I don't know the exact reason. What is the main cause, and how can I prevent it?
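One way to test the upsampling suspicion is to measure how much energy a file has above 8 kHz, the Nyquist limit of a 16 kHz recording. Speech that was originally sampled at 16 kHz should have almost nothing in that band after clean resampling to 48 kHz, so a noticeably larger share in the generated clean file than in the source would point at something added along the way. This is a rough diagnostic sketch assuming NumPy is available. The function name `high_band_energy_ratio` is hypothetical, and loading the samples is left to whichever audio reader is in use.

```python
import numpy as np


def high_band_energy_ratio(samples, sample_rate, cutoff_hz=8000.0):
    """Return the fraction of signal energy at frequencies above cutoff_hz."""
    samples = np.asarray(samples, dtype=np.float64)
    # Power spectrum of the whole signal (real-input FFT).
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0.0:
        return 0.0  # silent input: no energy in any band
    return float(spectrum[freqs > cutoff_hz].sum() / total)


if __name__ == "__main__":
    # Synthetic check at 48 kHz: a 1 kHz tone sits entirely below the cutoff,
    # a 12 kHz tone entirely above it.
    sr = 48000
    t = np.arange(sr) / sr
    low_tone = np.sin(2 * np.pi * 1000 * t)
    high_tone = np.sin(2 * np.pi * 12000 * t)
    print(high_band_energy_ratio(low_tone, sr))
    print(high_band_energy_ratio(high_tone, sr))
```

Running this on both the main clean file and the generated clean file, each with its own sample rate, gives two numbers that can be compared directly.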

microsoft/DNS-Challenge, issue #118