VocalSet samples have a wrong sampling rate

Rikorose commented 2 years ago

The samples provided in datasets_fullband/clean_fullband/VocalSet_48kHz_mono have a reported sampling rate of 48kHz.

The real sampling rate, however, is 16 kHz, so playing the files at the reported rate produces a Mickey Mouse-like voice (i.e. three times too fast, with a pitch three times that of the original).

To reproduce with sox:

$ soxi vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav

Input File     : 'vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:00:27.07 = 1299432 samples ~ 2030.36 CDDA sectors
File Size      : 2.60M
Bit Rate       : 768k
Sample Encoding: 16-bit Signed Integer PCM

# This sounds awful:
$ play vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav
# With a sampling rate of 16 kHz it sounds normal:
$ play -r 16k vocalset_female2_scales_vocal_fry_scales_vocal_fry_48kHz.wav
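
As a rough sanity check on the numbers above, the sample count reported by soxi can be turned into a duration at each candidate rate. This is only a sketch: `duration_at_rate` is a hypothetical helper name, and it assumes `awk` is available.

```shell
# Hypothetical helper: duration in seconds of SAMPLES samples played at RATE Hz.
duration_at_rate() {
    LC_ALL=C awk -v n="$1" -v r="$2" 'BEGIN { printf "%.2f\n", n / r }'
}

samples=1299432                      # sample count from the soxi output above
duration_at_rate "$samples" 48000    # prints 27.07 (rate stored in the header)
duration_at_rate "$samples" 16000    # prints 81.21 (rate at which it sounds normal)
```

The two durations differ by the same factor of 3 (48000 / 16000) as the pitch.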

nicriverhoo commented 2 years ago

Would you please fix this issue at your earliest convenience? Many thanks.

nicriverhoo commented 2 years ago

The Zenodo VocalSet would be helpful.

niemiaszek commented 2 years ago

@nicriverhoo you can fix this with SoX by overriding the sampling rate it assumes for the input files. For example, you could run the following in the VocalSet directory:

# The output directory must exist before sox can write into it.
mkdir -p ../singing_voice_16k
find ./ -name "*.wav" -exec sox -r 16k {} ../singing_voice_16k/{} \;
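
If the WAV files sit in nested subdirectories, the one-liner above would fail wherever the matching output folder does not exist yet. A possible variant is sketched below. `fix_rate_tree` is a hypothetical helper name, not part of SoX or of this repository, and the sketch assumes file names contain no newlines.

```shell
# Hypothetical helper: reinterpret every WAV under SRC as 16 kHz and write it
# to the same relative path under DST. Nothing is resampled; `-r 16k` placed
# before the input file only overrides the rate read from the header.
fix_rate_tree() {
    src=${1%/}    # drop a trailing slash so the prefix removal below works
    dst=${2%/}
    find "$src" -type f -name '*.wav' | while IFS= read -r in_file; do
        rel=${in_file#"$src"/}               # path relative to SRC
        out_file=$dst/$rel
        mkdir -p "$(dirname "$out_file")"    # create nested output folders
        sox -r 16k "$in_file" "$out_file" < /dev/null \
            || echo "failed: $in_file" >&2
    done
}
```

It could then be called from the VocalSet directory as `fix_rate_tree . ../singing_voice_16k`. Writing to a separate tree leaves the original files untouched, so the result can be checked with soxi before anything is replaced.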

microsoft / DNS-Challenge

VocalSet samples have a wrong sampling rate #106