microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Creative Commons Attribution 4.0 International

How to get DNS Challenge 2020 dataset? #108

Open Yaho-b opened 2 years ago

Yaho-b commented 2 years ago

Hi, can you tell me how to get the DNS Challenge 2020 dataset? Only the DNS 2022 and 2021 dataset URLs are listed. Could you share the URL for DNS 2020? Thanks a lot!

motus commented 2 years ago

@yuyislam you can download it here: datasets-interspeech2020.tar.bz2 (single file, 5.4MB)

I'll document it soon and close the issue afterwards.

Yaho-b commented 2 years ago

Hi @motus, I downloaded this file to my PC and extracted it with tar -jxvf *.tar.bz2, but each .wav file is only 1 KB. Am I using it the wrong way? Please tell me how to use it correctly. Thanks a lot!
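
A .wav of about 1 KB is what a Git LFS pointer stub looks like: a small text file that stands in for the real audio until it is fetched. That the extracted files are such stubs is an assumption here, not something the thread confirms, although a 5.4 MB archive and the Git LFS commands used later in the thread are consistent with it. A minimal shell sketch for checking; `is_lfs_pointer` and `list_pointer_wavs` are hypothetical helper names, not part of the repository:

```shell
# Succeeds if FILE looks like a Git LFS pointer stub rather than real audio.
is_lfs_pointer() {
    # A pointer is a tiny text file whose first line names the LFS spec URL.
    head -c 64 "$1" 2>/dev/null |
        LC_ALL=C grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Prints every small .wav under DIR (default: current directory) that is
# still a pointer stub. Real pointers are well under 1 KiB, so larger
# files are skipped without being read.
list_pointer_wavs() {
    find "${1:-.}" -type f -name '*.wav' -size -2k \
        -exec grep -l '^version https://git-lfs\.github\.com/spec/v1' {} +
}
```

Running `list_pointer_wavs datasets/` after extraction would print the files that still need their content fetched through Git LFS; empty output means none of the small .wav files are stubs.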

Rikorose commented 2 years ago

I did it like this:

git clone git@github.com:microsoft/DNS-Challenge.git -b interspeech2020/master --single-branch dns1-repo
cd dns1-repo
git lfs install && git lfs track "*.wav"
git lfs pull -I datasets/test_set/synthetic/

Now you should be able to process the audio files:

soxi datasets/test_set/synthetic/no_reverb/clean/clean_fileid_0.wav

Input File     : 'datasets/test_set/synthetic/no_reverb/clean/clean_fileid_0.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.00 = 160000 samples ~ 750 CDDA sectors
File Size      : 320k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
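
The soxi figures above are internally consistent, which is a quick way to tell real audio from a 1 KB stub. A small arithmetic sketch using only the values shown in that output; the function names `pcm_bytes` and `pcm_bitrate` are made up for illustration:

```shell
# pcm_bytes RATE_HZ SECONDS CHANNELS BITS_PER_SAMPLE
# Prints the size in bytes of the raw PCM payload (WAV header not included).
pcm_bytes() {
    echo $(( $1 * $2 * $3 * $4 / 8 ))
}

# pcm_bitrate RATE_HZ CHANNELS BITS_PER_SAMPLE
# Prints the bit rate in bits per second.
pcm_bitrate() {
    echo $(( $1 * $2 * $3 ))
}

pcm_bytes 16000 10 1 16    # 320000, in line with "File Size: 320k"
pcm_bitrate 16000 1 16     # 256000, in line with "Bit Rate: 256k"
```

A 10-second, 16 kHz, mono, 16-bit file should therefore be roughly 320 KB on disk; a file of about 1 KB cannot hold that much audio.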

immersky commented 1 year ago

Hey, it takes about 70 hours to download the data with this command. Do you have any suggestions to make it quicker? Thanks a lot.

immersky commented 1 year ago

Thank you! I've solved the problem by using a VPN to download the data.
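
For readers who hit the same slow download, one option is to keep the transfer as small as possible: clone without downloading any LFS content, then pull only the folder that is needed. The sketch below is not from the thread and may not help if the network route itself is the bottleneck (which the VPN fix suggests). The helper names `run` and `fetch_lfs_subset`, the `DRY_RUN` variable, and the use of an HTTPS clone URL instead of the SSH form are all assumptions made for illustration:

```shell
# Print each command when DRY_RUN=1 (the default in this sketch); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# fetch_lfs_subset REPO_URL BRANCH DEST INCLUDE_PATH
fetch_lfs_subset() (
    # Clone only the latest commit of one branch, and skip the LFS smudge
    # step so that only pointer stubs are checked out at this point.
    run env GIT_LFS_SKIP_SMUDGE=1 \
        git clone --depth 1 --single-branch -b "$2" "$1" "$3" || exit 1
    # Enable the LFS hooks for this repository only.
    run git -C "$3" lfs install --local || exit 1
    # Download real content for just the folder that is needed.
    run git -C "$3" lfs pull -I "$4"
)
```

With the default dry run, `fetch_lfs_subset https://github.com/microsoft/DNS-Challenge.git interspeech2020/master dns1-repo datasets/test_set/synthetic/` only prints the three commands; prefixing the call with `DRY_RUN=0` executes them.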

guggugg commented 1 year ago

Hello, could you share the test_set/synthetic/ folder? I have spent two days on this and still cannot download the LFS files. I only need the data in this folder.

Tufahel commented 1 year ago

@Rikorose, Thanks a lot. :+1: This solution works fine.

wslbeck commented 1 year ago

@Tufahel Hi, can you share your dataset? I still can't seem to download it.

cxwang822 commented 9 months ago

Hello, did you manage to get the test_set/synthetic/ folder? I also only need the data in this folder. Could you share it? Thanks!