microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Creative Commons Attribution 4.0 International

VocalSet samples have a wrong sampling rate #106

Open Rikorose opened 2 years ago

Rikorose commented 2 years ago

The samples provided in datasets_fullband/clean_fullband/VocalSet_48kHz_mono have a reported sampling rate of 48kHz.

The real sampling rate, however, is 16 kHz, so the files play back three times too fast, producing a Mickey Mouse-like voice (i.e. a pitch three times higher than the original).

To reproduce with sox:

$ soxi vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav

Input File     : 'vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:00:27.07 = 1299432 samples ~ 2030.36 CDDA sectors
File Size      : 2.60M
Bit Rate       : 768k
Sample Encoding: 16-bit Signed Integer PCM
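
The factor of three can be checked from the numbers in the soxi output above. The following is a minimal sketch of that arithmetic, assuming the true rate is 16 kHz as reported in this issue; the variable names are illustrative only.

```python
# Values taken from the soxi output above.
num_samples = 1_299_432
declared_rate_hz = 48_000  # rate written in the WAV header
actual_rate_hz = 16_000    # rate this issue reports as the real one (assumption)

# Duration is sample count divided by sample rate.
declared_duration_s = num_samples / declared_rate_hz
actual_duration_s = num_samples / actual_rate_hz

# Playing 16 kHz audio as if it were 48 kHz speeds it up by this factor,
# which raises the pitch by the same factor.
speedup = declared_rate_hz / actual_rate_hz

print(f"duration at declared rate: {declared_duration_s:.2f} s")
print(f"duration at actual rate:   {actual_duration_s:.2f} s")
print(f"speed and pitch factor:    {speedup:.0f}x")
```

The duration at the declared rate matches the 27.07 s that soxi reports, while the same samples at 16 kHz last three times as long.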

# This sounds awful:
$ play vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav
# With a sampling rate of 16 kHz this sounds normal:
$ play -r 16k vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav

nicriverhoo commented 2 years ago

Could you please fix this issue at your earliest convenience? Many thanks.

nicriverhoo commented 2 years ago

The VocalSet release on Zenodo would be helpful.

niemiaszek commented 2 years ago

@nicriverhoo You can fix this with SoX by overriding the declared input sampling rate. For example, you could run the following in the VocalSet directory:

mkdir -p ../singing_voice_16k  # sox does not create the output directory
find ./ -name "*.wav" -exec sox -r 16k {} ../singing_voice_16k/{} \;
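
If SoX is not available, the same relabeling can be sketched in Python with the standard-library `wave` module. This is only a sketch under stated assumptions: the files are plain PCM WAV (as the soxi output in this issue suggests), the true rate is 16 kHz, and the function names `relabel_wav` and `relabel_directory` are hypothetical helpers, not part of this repo. It copies the samples untouched and changes only the rate stored in the header, so no resampling takes place.

```python
import wave
from pathlib import Path


def relabel_wav(src: Path, dst: Path, true_rate_hz: int = 16_000) -> None:
    """Copy src to dst, changing only the sample rate stored in the header."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(params.nframes)

    dst.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(dst), "wb") as writer:
        # Keep channels, sample width and frame count; override the rate only.
        writer.setparams(params._replace(framerate=true_rate_hz))
        writer.writeframes(frames)


def relabel_directory(src_dir: Path, dst_dir: Path, true_rate_hz: int = 16_000) -> int:
    """Relabel every .wav under src_dir into dst_dir; return the file count."""
    count = 0
    for src in sorted(src_dir.rglob("*.wav")):
        relabel_wav(src, dst_dir / src.relative_to(src_dir), true_rate_hz)
        count += 1
    return count
```

For example, `relabel_directory(Path("."), Path("../singing_voice_16k"))` run from the VocalSet directory would mirror the SoX command above. Writing to a separate directory leaves the original files untouched.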