How to get DNS Challenge 2020 dataset?

Yaho-b commented 2 years ago

Hi, can you tell me how to get the DNS Challenge 2020 dataset? There are only URLs for the DNS 2022 and 2021 datasets. Can you share the URL for DNS 2020? Thanks a lot!

motus commented 2 years ago

@yuyislam you can download it here: datasets-interspeech2020.tar.bz2 (single file, 5.4MB)

I'll document it soon and close the issue afterwards.

Yaho-b commented 2 years ago

Hi @motus, I downloaded this file to my PC and extracted it with tar -jxvf *.tar.bz2, but each .wav file is only 1 KB. Am I using it the wrong way? Please tell me how to use it correctly. Thanks a lot!

Rikorose commented 2 years ago

I did it like this:

git clone git@github.com:microsoft/DNS-Challenge.git -b interspeech2020/master --single-branch dns1-repo
cd dns1-repo
git lfs install && git lfs track "*.wav"
git lfs pull -I datasets/test_set/synthetic/

Now you should be able to process the audio files:

soxi datasets/test_set/synthetic/no_reverb/clean/clean_fileid_0.wav

Input File     : 'datasets/test_set/synthetic/no_reverb/clean/clean_fileid_0.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.00 = 160000 samples ~ 750 CDDA sectors
File Size      : 320k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
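
A possible explanation for the 1 KB .wav files reported earlier in the thread: they may be Git LFS pointer files, small text stubs that stand in for the real audio until `git lfs pull` fetches it. The sketch below is not from the original posters, and `is_lfs_pointer` and `list_pointer_wavs` are hypothetical helper names. It relies only on the fact that a pointer file starts with a `version https://git-lfs.github.com/spec/v1` line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit 0) if the file looks like a
# Git LFS pointer rather than real audio data.
is_lfs_pointer() {
    # Pointer files are tiny text files whose first line names the LFS spec.
    head -c 64 "$1" 2>/dev/null |
        grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: prints every .wav under a directory that is
# still an LFS pointer, i.e. has not been downloaded yet.
list_pointer_wavs() {
    local dir="${1:-.}"
    find "$dir" -type f -name '*.wav' -print0 |
        while IFS= read -r -d '' f; do
            if is_lfs_pointer "$f"; then
                printf '%s\n' "$f"
            fi
        done
}
```

Running `list_pointer_wavs datasets/test_set/synthetic` after the pull should print nothing if every file in that folder was fetched.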

immersky commented 2 years ago

Hey, it takes about 70 hours to download the data with these commands. Do you have any suggestions to make it quicker? Thanks a lot.

immersky commented 2 years ago

Thank you! I've solved the problem by using a VPN to download the data.
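
For slow downloads, one option that might help (untested against this repository) is to skip the automatic LFS download during the clone and then pull only the folder you need. `GIT_LFS_SKIP_SMUDGE=1`, `--depth 1`, `git lfs install --local` and `git lfs pull -I` are standard Git and Git LFS options. The `run` and `fetch_dns2020_subset` helpers, the `DRY_RUN` switch and the `REPO_URL` override are assumptions made for this sketch. It will not fix a slow network route, which is what the VPN addressed.

```shell
#!/usr/bin/env bash

# Repository URL and branch as given earlier in this thread.
REPO_URL="${REPO_URL:-git@github.com:microsoft/DNS-Challenge.git}"
BRANCH="interspeech2020/master"

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: clone without LFS objects, then fetch one folder.
fetch_dns2020_subset() {
    local dest="${1:-dns1-repo}"
    local include="${2:-datasets/test_set/synthetic/}"
    # GIT_LFS_SKIP_SMUDGE=1 leaves pointer files in place during checkout.
    run env GIT_LFS_SKIP_SMUDGE=1 git clone --depth 1 \
        -b "$BRANCH" --single-branch "$REPO_URL" "$dest" || return 1
    # Set up the LFS filters for this clone only.
    run git -C "$dest" lfs install --local || return 1
    # Download only the LFS objects under the requested path.
    run git -C "$dest" lfs pull -I "$include"
}
```

Calling `DRY_RUN=1 fetch_dns2020_subset` prints the commands without touching the network, so they can be reviewed first.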

guggugg commented 1 year ago

Hello, could you share the test_set/synthetic/ folder? I have spent two days trying and still cannot download the LFS files... I only need the data in this folder.

Tufahel commented 1 year ago

@Rikorose, Thanks a lot. :+1: This solution works fine.

wslbeck commented 1 year ago

@Tufahel Hi, can you share your dataset? I still can't seem to download it.

cxwang822 commented 10 months ago

Hello, did you manage to get the test_set/synthetic/ folder? I also only need the data in this folder. Could you share it? Thanks!

microsoft / DNS-Challenge

How to get DNS Challenge 2020 dataset? #108