steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Cannot create databases #9

Closed jabard89 closed 1 year ago

jabard89 commented 3 years ago

Expected Behavior

`foldseek databases PDB pdb tmp` should set up the PDB database.

Current Behavior

Returns:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

The downloaded pdb.tar.gz is empty. It looks like the target URL (http://wwwuser.gwdg.de/~compbiol/foldseek/) no longer hosts the databases.
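
The failure above can be caught before `tar` runs by checking the archive first. This is a minimal sketch, assuming a POSIX shell with `gzip` available; `check_archive` is a hypothetical helper name and not part of Foldseek.

```shell
# Hypothetical helper: returns 0 only if the file is non-empty
# and passes gzip's built-in integrity test.
check_archive() {
    archive="$1"
    # -s is true only for files that exist and have a size greater than zero.
    if [ ! -s "$archive" ]; then
        echo "empty or missing: $archive" >&2
        return 1
    fi
    # gzip -t tests the compressed data without extracting it.
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "not a valid gzip file: $archive" >&2
        return 1
    fi
    return 0
}
```

An empty download, as reported here, would fail the first check, so the helper would report it as empty or missing instead of leaving `tar` to fail with a gzip format error.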

martin-steinegger commented 3 years ago

@jabard89 we updated the alphabet size of Foldseek from 16 to 21, so the old database is no longer compatible and I took it down. We are currently recreating the database, and I will let you know once it is online. To use it, you will need to update Foldseek.

martin-steinegger commented 2 years ago

We reuploaded all databases. Does this work now?

Geraldene commented 2 years ago

@martin-steinegger would you be able to provide more information on how I can create the targetdb? I have a directory containing a set of protein structures that I predicted using AlphaFold2, and I would like to query them against the PDB database.

martin-steinegger commented 2 years ago

@Geraldene the following command should work.

foldseek easy-search queryFolder pdb aln tmp
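
Combined with the `databases` command from the top of this thread, the full workflow might look like the sketch below. The names `queryFolder`, `pdb`, `aln`, and `tmp` are the placeholders used above. The `run` wrapper and the `DRY_RUN` variable are hypothetical additions for illustration, so the commands can be previewed without being executed.

```shell
# Hypothetical wrapper: print each command; execute it only when DRY_RUN is not 1.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# 1. Download the prebuilt PDB database as "pdb" (tmp holds intermediate files).
run foldseek databases PDB pdb tmp

# 2. Search the structures in queryFolder against it; results are written to aln.
run foldseek easy-search queryFolder pdb aln tmp
```

With the default `DRY_RUN=1` the script only prints the two commands; setting `DRY_RUN=0` runs them, which requires `foldseek` to be installed and on the `PATH`.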

kthurimella commented 2 years ago

I ran into certificate issues while trying to download. Is there any way to bypass them?

$ foldseek databases PDB pdb tmp 
databases PDB pdb tmp 

MMseqs Version:                 1.3c64211
Force restart with latest tmp   false
Remove temporary files          false
Compressed                      0
Threads                         12
Verbosity                       3

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
--2022-07-01 15:13:54--  https://wwwuser.gwdg.de/~compbiol/foldseek/pdb.tar.gz
Resolving wwwuser.gwdg.de (wwwuser.gwdg.de)... 134.76.10.111
Connecting to wwwuser.gwdg.de (wwwuser.gwdg.de)|134.76.10.111|:443... connected.
ERROR: cannot verify wwwuser.gwdg.de's certificate, issued by ‘CN=Sectigo RSA Organization Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB’:
  Unable to locally verify the issuer's authority.
To connect to wwwuser.gwdg.de insecurely, use `--no-check-certificate'.
Error: Could not download https://wwwuser.gwdg.de/~compbiol/foldseek/pdb.tar.gz to tmp/14286354622525620261/pdb.tar.gz

martin-steinegger commented 2 years ago

We switched the host to Cloudflare. If you update Foldseek, it should download the database from the new source. I hope that resolves it.

kthurimella commented 2 years ago

Thanks for the fast response. I checked out the repo and compiled from source this time. Now I'm running into this error:

foldseek databases PDB pdb new_tmp
databases PDB pdb new_tmp

MMseqs Version:                 5285cd11c335e1a0133ffd3e32f55ad6ff82f3cb
Force restart with latest tmp   false
Remove temporary files          false
Compressed                      0
Threads                         12
Verbosity                       3

mv: cannot stat 'new_tmp/5610811273439075906/version': No such file or directory

I initially started with the same tmp folder and then made a new one. Is there a cache that I can clear, or a way to force the download?

However, the AF databases seem to be downloading!

martin-steinegger commented 1 year ago

Does this still persist?

alexmyczko commented 1 year ago

I've created a Debian source package that can be built using "simple sid backport" for most Ubuntu or Debian versions: http://sid.ethz.ch/debian/foldseek/

Unfortunately, discussions are not enabled here, but I was wondering whether it would make sense to have this as an official package?

Here's my output of the OP's command:

$ foldseek databases PDB pdb tmp
Create directory tmp
databases PDB pdb tmp 

MMseqs Version:                 GITDIR-NOTFOUND
Tsv                             false
Force restart with latest tmp   false
Remove temporary files          false
Compressed                      0
Threads                         16
Verbosity                       3

04/04 10:50:55 [NOTICE] Downloading 1 item(s)
[#63483e 806MiB/872MiB(92%) CN:5 DL:74MiB]                                                          
04/04 10:51:14 [NOTICE] Download complete: tmp/1124933551536758242/pdb.tar.gz

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
63483e|OK  |    64MiB/s|tmp/1124933551536758242/pdb.tar.gz

Status Legend:
(OK):download completed.

04/04 10:51:14 [NOTICE] Downloading 1 item(s)

04/04 10:51:15 [NOTICE] Download complete: tmp/1124933551536758242/version

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
72ca56|OK  |   1.8KiB/s|tmp/1124933551536758242/version

Status Legend:
(OK):download completed.
pdb
pdb_ca
pdb_ca.dbtype
pdb_ca.index
pdb_h
pdb_h.dbtype
pdb_h.index
pdb_mapping
pdb_ss
pdb_ss.dbtype
pdb_ss.index
pdb_taxonomy
pdb.dbtype
pdb.index
pdb.lookup
pdb.md5sum
mvdb tmp/1124933551536758242/pdb pdb 

Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_ss pdb_ss 

Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_h pdb_h 

Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_ca pdb_ca 

Time for processing: 0h 0m 0s 0ms
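
The file listing in the log above suggests a quick way to confirm that a download finished completely. This is a minimal sketch, assuming the database consists of the base file plus `.dbtype` and `.index` for each of the four entries moved by `mvdb` (`pdb`, `pdb_ss`, `pdb_h`, `pdb_ca`); `missing_db_files` is a hypothetical helper name, not a Foldseek command.

```shell
# Hypothetical helper: print every expected database file that is missing
# for the given prefix (for example "pdb"). Prints nothing if all are present.
missing_db_files() {
    prefix="$1"
    for suffix in "" _ss _h _ca; do
        for ext in "" .dbtype .index; do
            f="${prefix}${suffix}${ext}"
            [ -e "$f" ] || echo "$f"
        done
    done
}
```

Running `missing_db_files pdb` in the directory where the database was created prints nothing when all twelve files exist.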

milot-mirdita commented 1 year ago

Please open a new issue for this. If you want to make a Debian package, I would recommend referring to the MMseqs2 Debian package: https://salsa.debian.org/med-team/mmseqs2

The maintainers have done a lot of work to make MMseqs2 play well with Debian, and a Debian package for Foldseek should be very similar to the MMseqs2 one (just please don't try to separate Foldseek from its internal MMseqs2 dependency).