steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

afdb50clusearch.tar.gz is corrupted? #176

Closed YoshitakaMo closed 10 months ago

YoshitakaMo commented 10 months ago

Current Behavior

I'm installing Foldseek with the afdb50 database on local PCs and supercomputers. I installed the latest version of Foldseek (Aug 29, 2023) and tried to set up the database with foldseek databases Alphafold/UniProt50 afdb50 tmp, but I got an error message:

$ foldseek databases Alphafold/UniProt50 afdb50 tmp
databases Alphafold/UniProt50 afdb50 tmp

MMseqs Version:                 96be67cfedf1491b3280c169714eabf207dbf796
Tsv                             false
Force restart with latest tmp   false
Remove temporary files          false
Compressed                      0
Threads                         32
Verbosity                       3

afdb50_ca
afdb50_ca.dbtype
afdb50_ca.index
afdb50_h
afdb50_h.dbtype
afdb50_h.index
afdb50
afdb50.dbtype
afdb50.index
afdb50.lookup
afdb50_mapping
afdb50_ss
afdb50_ss.dbtype
afdb50_ss.index
afdb50_taxonomy
afdb50_clu
afdb50_clu.dbtype
afdb50_clu.index
afdb50_seq.0
afdb50_seq.1
afdb50_seq_ca.0
afdb50_seq_ca.1

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

As a result, I couldn't install the afdb50 database. To investigate further, I downloaded the file directly with aria2c, and the download finished without reporting an error.

$ aria2c https://foldseek.steineggerlab.workers.dev/afdb50clusearch.tar.gz

The file sizes:

-rw-r--r-- 1 root root  94325688210 Aug 31 18:45 afdb50.tar.gz
-rw-r--r-- 1 root root 332518133806 Sep  1 01:57 afdb50clusearch.tar.gz

$ sha256sum afdb50.tar.gz
a6ac9762afbe2c5aba6ee03bc4b1177ece06e3000ea72345f2a616975520e87a  afdb50.tar.gz
$ sha256sum afdb50clusearch.tar.gz
fe942f0a3473f17426401a88aee90e137d25d5f02184831a02a8bcbba0fc25fc  afdb50clusearch.tar.gz
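
A download that finishes without an error can still be an incomplete or damaged gzip stream, which is what the "Unexpected EOF in archive" message above suggests. The following is a minimal sketch, not part of Foldseek, of how the archive could be tested before extraction. The function name gzip_stream_is_intact is hypothetical; it only uses Python's standard gzip module and reads the whole file, so on an archive of this size it will take a long time.

```python
import gzip
import zlib


def gzip_stream_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    Hypothetical helper for illustration; it is not a Foldseek command.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the decompressed data in fixed-size chunks so
            # memory use stays flat regardless of the archive size.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early (for example a truncated download).
        # OSError (including gzip.BadGzipFile): bad header or CRC/length mismatch.
        # zlib.error: corrupted compressed data.
        return False
    return True
```

Calling gzip_stream_is_intact("afdb50clusearch.tar.gz") would return False for a truncated or damaged file. This only checks the gzip layer, not the tar contents.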

Expected Behavior

foldseek databases Alphafold/UniProt50 afdb50 tmp should set up the afdb50 database for Foldseek searches.

Your Environment

YoshitakaMo commented 10 months ago

Sorry, the problem appears to have been caused by the download on my computer. The download appeared to switch to wget automatically when aria2c failed. However, when I downloaded the file manually with aria2c, it ended up corrupted. The sha256 checksum of the file that decompresses correctly is

$ sha256sum afdb50clusearch.tar.gz
- fe942f0a3473f17426401a88aee90e137d25d5f02184831a02a8bcbba0fc25fc  afdb50clusearch.tar.gz
+ 4fc2b1586583d2462eefc171ef7ae91e6f4de35fbbbb00fc04013b981eb4dec2  afdb50clusearch.tar.gz
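
The comparison above can be automated. This is a minimal sketch under stated assumptions: the function names sha256_of_file and matches_expected are hypothetical, and the expected value is the checksum reported in this thread for the working archive, not an officially published one. The file is hashed in chunks so that an archive of several hundred gigabytes does not need to fit in memory.

```python
import hashlib

# Checksum reported in this thread for the archive that extracted correctly
# (user-reported value, an assumption rather than an official reference).
EXPECTED_CLUSEARCH_SHA256 = (
    "4fc2b1586583d2462eefc171ef7ae91e6f4de35fbbbb00fc04013b981eb4dec2"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA-256 digest equals `expected_hex`."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_expected("afdb50clusearch.tar.gz", EXPECTED_CLUSEARCH_SHA256) would be False for the corrupted download whose digest starts with fe942f0a.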

I placed it in the tmp directory together with afdb50.tar.gz and the version files, then re-ran the command. This time, the database was created successfully.

# foldseek databases Alphafold/UniProt50 afdb50 tmp
databases Alphafold/UniProt50 afdb50 tmp

MMseqs Version:                 96be67cfedf1491b3280c169714eabf207dbf796
Tsv                             false
Force restart with latest tmp   false
Remove temporary files          false
Compressed                      0
Threads                         32
Verbosity                       3

afdb50_ca
afdb50_ca.dbtype
afdb50_ca.index
afdb50_h
afdb50_h.dbtype
afdb50_h.index
afdb50
afdb50.dbtype
afdb50.index
afdb50.lookup
afdb50_mapping
afdb50_ss
afdb50_ss.dbtype
afdb50_ss.index
afdb50_taxonomy
afdb50_clu
afdb50_clu.dbtype
afdb50_clu.index
afdb50_seq.0
afdb50_seq.1
afdb50_seq_ca.0
afdb50_seq_ca.1
afdb50_seq_ca.dbtype
afdb50_seq_ca.index
afdb50_seq.dbtype
afdb50_seq_h.0
afdb50_seq_h.1
afdb50_seq_h.dbtype
afdb50_seq_h.index
afdb50_seq.index
afdb50_seq.lookup
afdb50_seq_mapping
afdb50_seq_ss.0
afdb50_seq_ss.1
afdb50_seq_ss.dbtype
afdb50_seq_ss.index
afdb50_seq_taxonomy
mvdb tmp/560470045461889808/afdb50_seq afdb50_seq

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50 afdb50

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_seq_ss afdb50_seq_ss

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_ss afdb50_ss

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_seq_h afdb50_seq_h

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_h afdb50_h

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_seq_ca afdb50_seq_ca

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_ca afdb50_ca

Time for processing: 0h 0m 0s 0ms
mvdb tmp/560470045461889808/afdb50_clu afdb50_clu
-rw-r--r-- 1 root root  14187874580 Aug 18 14:47 afdb50
-rw-r--r-- 1 root root   1315395424 Aug 18 14:48 afdb50.index
-rw-r--r-- 1 root root            4 Aug 18 14:48 afdb50.dbtype
-rw-r--r-- 1 root root  55242286017 Aug 18 14:49 afdb50_seq.1
-rw-r--r-- 1 root root            4 Aug 18 14:51 afdb50_seq.dbtype
-rw-r--r-- 1 root root   5410026102 Aug 18 14:51 afdb50_seq.index
lrwxrwxrwx 1 root root            6 Aug 18 14:51 afdb50_seq.0 -> afdb50
-rw-r--r-- 1 root root   2956810604 Aug 18 14:51 afdb50_h
-rw-r--r-- 1 root root   1240528477 Aug 18 14:52 afdb50_h.index
-rw-r--r-- 1 root root            4 Aug 18 14:52 afdb50_h.dbtype
-rw-r--r-- 1 root root   9292316169 Aug 18 14:52 afdb50_seq_h.1
-rw-r--r-- 1 root root            4 Aug 18 14:53 afdb50_seq_h.dbtype
-rw-r--r-- 1 root root   5061775645 Aug 18 14:54 afdb50_seq_h.index
lrwxrwxrwx 1 root root            8 Aug 18 14:54 afdb50_seq_h.0 -> afdb50_h
-rw-r--r-- 1 root root  14187874580 Aug 18 14:56 afdb50_ss
-rw-r--r-- 1 root root   1315386104 Aug 18 14:57 afdb50_ss.index
-rw-r--r-- 1 root root            4 Aug 18 14:57 afdb50_ss.dbtype
-rw-r--r-- 1 root root  55242286017 Aug 18 14:58 afdb50_seq_ss.1
-rw-r--r-- 1 root root            4 Aug 18 14:59 afdb50_seq_ss.dbtype
-rw-r--r-- 1 root root   5410016782 Aug 18 15:00 afdb50_seq_ss.index
lrwxrwxrwx 1 root root            9 Aug 18 15:00 afdb50_seq_ss.0 -> afdb50_ss
-rw-r--r-- 1 root root  84912584040 Aug 18 15:14 afdb50_ca
-rw-r--r-- 1 root root   1391679690 Aug 18 15:14 afdb50_ca.index
-rw-r--r-- 1 root root            4 Aug 18 15:14 afdb50_ca.dbtype
-rw-r--r-- 1 root root 330809644226 Aug 18 15:22 afdb50_seq_ca.1
-rw-r--r-- 1 root root            4 Aug 18 15:24 afdb50_seq_ca.dbtype
-rw-r--r-- 1 root root   5775608431 Aug 18 15:24 afdb50_seq_ca.index
lrwxrwxrwx 1 root root            9 Aug 18 15:24 afdb50_seq_ca.0 -> afdb50_ca
-rw-r--r-- 1 root root   2089393040 Aug 18 15:24 afdb50_clu
-rw-r--r-- 1 root root   1234143489 Aug 18 15:24 afdb50_clu.index
-rw-r--r-- 1 root root            4 Aug 18 15:24 afdb50_clu.dbtype
-rw-r--r-- 1 root root   7943602272 Aug 18 15:25 afdb50.lookup
-rw-r--r-- 1 root root   1717470637 Aug 18 15:25 afdb50_mapping
-rw-r--r-- 1 root root    683101917 Aug 18 15:25 afdb50_taxonomy
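
In the failed run the extraction stopped at afdb50_seq_ca.1, so files such as afdb50_seq_ca.index were never written. As a rough sanity check after setup, one could confirm that each database prefix has its .index and .dbtype companions. The sketch below is an assumption based only on the file names in the listings above; the helper missing_companions is hypothetical and is not a Foldseek tool, and passing this check does not prove the data files themselves are complete.

```python
from pathlib import Path

# Database prefixes as they appear in the listings in this thread.
PREFIXES = [
    "afdb50", "afdb50_h", "afdb50_ss", "afdb50_ca", "afdb50_clu",
    "afdb50_seq", "afdb50_seq_h", "afdb50_seq_ss", "afdb50_seq_ca",
]


def missing_companions(db_dir, prefixes=PREFIXES, suffixes=(".index", ".dbtype")):
    """List expected companion files that are absent from `db_dir`."""
    db_dir = Path(db_dir)
    missing = []
    for prefix in prefixes:
        for suffix in suffixes:
            name = prefix + suffix
            if not (db_dir / name).exists():
                missing.append(name)
    return missing
```

An empty list from missing_companions(".") would mean every listed prefix has both companion files in the current directory.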