modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

Could you provide the md5 values of train.tar.gz-part-{a-f}? #49

Closed (llearner closed this issue 8 months ago)

llearner commented 8 months ago

We downloaded train.tar.gz-part-{a-f}, but the md5 value of the merged file is wrong, and we are not sure which part is the bad one.

llearner commented 8 months ago

Uncompress error message:

tar: Skipping to next header
tar: Archive contains `\0\b\0\004\0\006\0\005\0\004\0' where numeric off_t value expected
tar: Archive contains `\0\005\0\r\0\004\0\004\0\a\0' where numeric time_t value expected
tar: Archive value -1095216594940 is out of uid_t range 0..4294967295
tar: Archive contains `\0\016\0\v\0\t\0' where numeric gid_t value expected
\377\374\377\375\377\002
tar: ▒▒▒▒▒: implausibly old time stamp 1970-01-01 07:59:59
tar: Skipping to next header

gzip: stdin: invalid compressed data--format violated
tar: Child returned status 1
tar: Error is not recoverable: exiting now

GeekOrangeLuYao commented 8 months ago

Can you provide more details about your downloaded files train.tar.gz-part-{a-f} and the command you used to uncompress them? Based on the information provided so far, I cannot determine which file is causing the issue.

llearner commented 8 months ago

Yes, these are the commands we ran:

  1. cat train.tar.gz-part-* > train.tar.gz
  2. tar -zxvf train.tar.gz

Output of the second command:

train/
train/3D_SPK_00001/
train/3D_SPK_00001/3D_SPK_00001_001_Device01_Distance04_Dialect00.wav
...
train/3D_SPK_01197/3D_SPK_01197_003_Device05_Distance13_Dialect00.wav
train/3D_SPK_01197/3D_SPK_01197_003_Device06_Distance08_Dialect00.wav
train/3D_SPK_01197/3D_SPK_01197_003_Device08_Distance12_Dialect00.wav
tar: Skipping to next header
tar: Archive contains `\0\b\0\004\0\006\0\005\0\004\0' where numeric off_t value expected
tar: Archive contains `\0\005\0\r\0\004\0\004\0\a\0' where numeric time_t value expected
tar: Archive value -1095216594940 is out of uid_t range 0..4294967295
tar: Archive contains `\0\016\0\v\0\t\0' where numeric gid_t value expected
\377\374\377\375\377\002
tar: ▒▒▒▒▒: implausibly old time stamp 1970-01-01 07:59:59
tar: Skipping to next header

gzip: stdin: invalid compressed data--format violated
tar: Child returned status 1
tar: Error is not recoverable: exiting now

llearner commented 8 months ago

File size (bytes) / File name
203207197134 / train.tar.gz
34359738368 / train.tar.gz-part-a
34359738368 / train.tar.gz-part-b
34359738368 / train.tar.gz-part-c
34359738368 / train.tar.gz-part-d
34359738368 / train.tar.gz-part-e
31408505294 / train.tar.gz-part-f

The md5 value of train.tar.gz is 6e774697a07ae332d51049c418eded85.
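As a quick sanity check on these numbers: the six part sizes add up to exactly the merged size (5 × 34359738368 + 31408505294 = 203207197134), so the cat step did not drop or add bytes, and the problem must be in the content of one of the parts. A minimal sketch of that check follows; total_size is a hypothetical helper name, not part of the 3D-Speaker scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the 3D-Speaker repo):
# print the summed size in bytes of all files given as arguments.
total_size() {
    local total=0 f size
    for f in "$@"; do
        # wc -c with redirected input prints only the byte count
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}
```

With the parts and the merged file in the current directory, `[ "$(total_size train.tar.gz-part-*)" -eq "$(wc -c < train.tar.gz)" ] && echo "sizes match"` would confirm that the concatenation is complete. It says nothing about whether the bytes inside each part are correct; only a checksum can show that.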

GeekOrangeLuYao commented 8 months ago

We checked the dataset by running cat train.tar.gz-part-* > train.tar.gz and then md5sum train.tar.gz. The resulting md5 value is c2cea55fd22a2b867d295fb35a2d3340, which matches the value on our website but differs from yours.

Something probably went wrong during the download. Since the md5 values differ, we suggest downloading the files again.

llearner commented 8 months ago

Yes, but to avoid re-downloading all the parts, we need to know which part is bad. Here are the md5 values of our downloaded files:

train.tar.gz-part-a: 4109addde41d88760947263f18117ac3
train.tar.gz-part-b: ea569fc26d894f5e0c5e38be2820490f
train.tar.gz-part-c: bd2ce08f5b51005b66afe484b01a4a59
train.tar.gz-part-d: 5cd31d961d2d5211aea38b8b95f7239a
train.tar.gz-part-e: 58f3fb7d28ae7f4b65ee35a1ed7ab106
train.tar.gz-part-f: be64551c030e8087562a10df2c74ccb1

GeekOrangeLuYao commented 8 months ago

Yes, here are the md5 values of the part files:

4109addde41d88760947263f18117ac3  train.tar.gz-part-a
5a17ef2fa28b1b9e340277edffb8b51c  train.tar.gz-part-b
bd2ce08f5b51005b66afe484b01a4a59  train.tar.gz-part-c
5cd31d961d2d5211aea38b8b95f7239a  train.tar.gz-part-d
58f3fb7d28ae7f4b65ee35a1ed7ab106  train.tar.gz-part-e
be64551c030e8087562a10df2c74ccb1  train.tar.gz-part-f

The md5 value of your train.tar.gz-part-b does not match ours, so that is the file you need to download again.

We will add this information to our shell scripts.
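A minimal sketch of how such a per-part check could look, assuming GNU coreutils md5sum and that the list above has been saved in md5sum format ("<hash>  <file name>" per line). This is not the project's actual script; find_bad_parts and the file name train.md5 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the 3D-Speaker repo):
# print the name of every file whose md5 differs from the reference list.
# $1: checksum file in md5sum format, with paths relative to the current directory.
find_bad_parts() {
    local checksum_file="$1"
    # --quiet hides the "OK" lines, so only failures are printed as "<name>: FAILED".
    # sed keeps those lines and strips the suffix; missing files are reported
    # by md5sum as "FAILED open or read" and are intentionally not matched here.
    md5sum -c --quiet "$checksum_file" 2>/dev/null | sed -n 's/: FAILED$//p'
}
```

Run from the download directory as `find_bad_parts train.md5`. Given the values reported in this thread, it would be expected to print only train.tar.gz-part-b, so only that part has to be fetched again before re-running cat and tar.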