microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Creative Commons Attribution 4.0 International

Multiple talkers in same file in personalized dns data (Italian) #134

Open DamRsn opened 2 years ago

DamRsn commented 2 years ago

Hello,

While listening informally to audio data from the PDNS track, I noticed that the Italian speech files contain multiple speakers, whereas the ICASSP 2022 Deep Noise Suppression Challenge paper states: "PDNS track has clean speech where each audio clip is concatenations of all audio clips belonging to a talker."

This issue makes the enrollment files and the "clean" files incompatible. I had to remove all Italian files from my training dataset.
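For anyone applying the same workaround, here is a minimal sketch of dropping the Italian clips from a list of training file paths. It assumes the language appears as a directory component of each path (as in `pdns_training_set/raw/clean/italian/...`). The function name `filter_out_language` is hypothetical and not part of the challenge scripts.

```python
from pathlib import PurePosixPath


def filter_out_language(paths, language="italian"):
    """Return the paths that have no directory component equal to `language`.

    Assumption: the language is encoded as a folder name in the path,
    e.g. ".../clean/italian/<file>.wav". Only directory components are
    checked, so a file whose *name* contains the word is not dropped.
    """
    kept = []
    for p in paths:
        directories = PurePosixPath(p).parts[:-1]  # everything but the file name
        if language not in directories:
            kept.append(p)
    return kept
```

Matching on whole directory components rather than substrings avoids accidentally discarding files from other languages whose names happen to contain "italian".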

I suggest either removing the Italian data from the PDNS dataset or providing the correct clean files, if that is possible.

I noticed the issue only with the Italian data, and I don't know whether it also occurs in the other languages.

One example is pdns_training_set/raw/clean/italian/complete_italian_novelle_per_un_anno_02.wav, where the speakers at the beginning and at 1:39:00 are clearly different. The beginning of that audio file is also exactly the same speech sample as the corresponding enrollment.
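To reproduce the comparison, short excerpts at the two positions can be cut out and listened to side by side. The sketch below uses only Python's standard `wave` module and assumes the clean files are uncompressed PCM WAV. The helper name `extract_excerpt` and the output file names are hypothetical.

```python
import wave


def extract_excerpt(src_path, dst_path, start_s, duration_s):
    """Copy `duration_s` seconds starting at `start_s` from a PCM WAV file.

    Returns the number of audio frames written. Assumes an uncompressed
    PCM file, which is what the standard `wave` module can read.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so setpos() never exceeds the file length.
        src.setpos(min(int(start_s * rate), src.getnframes()))
        frames = src.readframes(int(duration_s * rate))

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width and rate
        dst.writeframes(frames)  # header frame count is patched on close

    return len(frames) // (params.sampwidth * params.nchannels)
```

If 1:39:00 is read as 1 h 39 min, the second excerpt starts at 5940 s, so calling `extract_excerpt(path, "start.wav", 0, 20)` and `extract_excerpt(path, "later.wav", 5940, 20)` on the file above would give two 20-second clips to compare.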

hsiaohan0827 commented 2 years ago

The issue also occurs in part of the english/librivox speech. For example, english/librivox/enrol_read_speech_complete_reader_00381.wav contains multi-speaker conversations. The speakers switch frequently, which may affect the training of personalized noise suppression models.