microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Creative Commons Attribution 4.0 International

Dataset Corrupted #166

Open Abalam-29895 opened 1 year ago

Abalam-29895 commented 1 year ago

The audio files are corrupted after downloading them with the provided shell script. Below are the link to the script I have been using and the error message from the shell. https://github.com/microsoft/DNS-Challenge/blob/2db96d5f75257df764a6ef66513b4b97bc707f30/download-dns-challenge-2.sh

Error message:

bzip2: Data integrity error when decompressing. Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted. You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover data from undamaged sections of corrupted files.

Can you suggest a fix for this? Thank you!

thebarnable commented 1 year ago

I'm experiencing a very similar issue. In the download-dns-challenge-4.sh script, I used the curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j line. The download fails for some archives, though not all (e.g. clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2), with this output:

curl: (56) OpenSSL SSL_read: Connection timed out, errno 110

bzip2: Compressed file ends unexpectedly;
    perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
    Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

seowwj commented 1 year ago

I faced the same issue (but with different files). Retrying the download solved it for me.
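Since retrying helps, one way to avoid re-running the whole streaming pipeline is to save each archive to disk, check it, and only then extract it. The following is a minimal sketch, not the repository's own script: the function names fetch_archive and verify_and_extract, the attempt count, and the sleep interval are hypothetical choices, and resuming with -C - only works if the server honours range requests.

```shell
# Hypothetical helpers; names and retry settings are illustrative assumptions.

# Download one archive to a local file, resuming a partial file on each retry.
fetch_archive() {
    local url="$1" dest="$2" attempt
    for attempt in 1 2 3 4 5; do
        # --fail: report HTTP errors as a non-zero exit status
        # -C -:   continue from the current size of "$dest" if it exists
        curl --fail --location -C - -o "$dest" "$url" && return 0
        echo "attempt $attempt failed for $url" >&2
        sleep 10
    done
    return 1
}

# Test the archive first; extract only if bzip2 reports it as intact.
verify_and_extract() {
    local archive="$1" out_dir="$2"
    if ! bzip2 -t "$archive" 2>/dev/null; then
        echo "corrupt or incomplete archive: $archive" >&2
        return 1
    fi
    mkdir -p "$out_dir"
    tar -C "$out_dir" -x -j -f "$archive"
}
```

In a download script this could be used as fetch_archive "$URL" "$OUTPUT_PATH/$BLOB" && verify_and_extract "$OUTPUT_PATH/$BLOB" "$OUTPUT_PATH", at the cost of temporarily needing disk space for the compressed file.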

JINSCOTT commented 8 months ago

I tried using AzCopy to download the files, and it is much faster and more reliable than wget and curl. No more timeouts, and no more re-downloading an entire file from the start. Get AzCopy working and try something like this in the download scripts: azcopy copy "$URL" "$OUTPUT_PATH/$BLOB"

valentin710 commented 1 month ago

I had the same issue as @thebarnable with the download-dns-challenge-5-headset-training.sh script. I have attempted multiple downloads so far, but without success.