Closed: dd2252 closed this issue 8 months ago
Could you try using cat to merge the splits into one file, and then extract that? For example, cat train.tar.gz.partaa train.tar.gz.partab train.tar.gz.partac train.tar.gz.partad > train.tar.gz, and then you can extract train.tar.gz.
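In case it helps anyone else hitting this, here is a rough sketch of why only the first part extracts on its own. It assumes the parts are plain byte-level chunks of a single train.tar.gz (for example, made with split), which this thread does not confirm. The demo builds its own small stand-in archive in a temp directory; the names demo.tar.gz, merged.tar.gz, and data/sample.bin are made up for illustration and are not part of the dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory so nothing here touches real downloads.
workdir="$(mktemp -d)"
cd "$workdir"

# Build a small stand-in archive (hypothetical demo data, not the real train set).
mkdir -p data
head -c 200000 /dev/urandom > data/sample.bin
tar -czf demo.tar.gz data

# Split it into raw byte chunks: demo.tar.gz.partaa, demo.tar.gz.partab, ...
split -b 64k demo.tar.gz demo.tar.gz.part

# A later chunk has no gzip or tar header of its own, so extracting it alone fails.
mkdir -p scratch
if tar -xf demo.tar.gz.partab -C scratch 2>/dev/null; then
  echo "unexpected: partab extracted on its own"
else
  echo "partab alone is not a valid archive"
fi

# Concatenate the chunks in order (the glob expands in sorted order), then extract.
cat demo.tar.gz.part* > merged.tar.gz
mkdir -p out
tar -xzf merged.tar.gz -C out

# The merged file should be byte-identical to the original archive.
cmp demo.tar.gz merged.tar.gz && echo "merged archive matches the original"
```

For the real files, the equivalent would be the cat command above followed by tar -xzf train.tar.gz. The order of the parts matters, so list them explicitly or rely on a glob that sorts aa, ab, ac, ad.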
That worked, thank you!!
Hello, I've tried several times to download the train data, which is split into 4 tar files. When I extract partaa via tar -xf, things seem to go as expected, but all the other parts (partab through partad) result in some variant of the following error messages:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Archive contains ‘\nFh\314A[V\224\373F@>’ where numeric off_t value expected
tar: Archive contains ‘yB\202\215\036\254\270g\365P\251\260’ where numeric off_t value expected
tar: Archive contains ‘ObF\024`\216\203=ja\024\252’ where numeric off_t value expected
tar: Archive contains ‘5\323Cr\035q\331<sާ\250’ where numeric off_t value expected
tar: Archive contains ‘\336O*z\226\274\035\313HQ\272\253’ where numeric off_t value expected
tar: Archive contains ‘\0\221ۈ\203r?t\260[\225\022’ where numeric off_t value expected
tar: Archive contains ‘h\306go\030\262(ܢ\213o\304’ where numeric off_t value expected
tar: Archive contains ‘\250\3541W\r\021\242\370֨\356b’ where numeric off_t value expected
tar: Archive contains ‘\243q\346\017S\316M\a=\234N,’ where numeric off_t value expected
tar: Exiting with failure status due to previous errors
Sorry if I'm missing something silly, but I'd appreciate it if someone could look into this! I tried downloading via both Google Drive and Hugging Face and hit the same issue in each case. Note that I also successfully downloaded the val and test sets.