steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Error using PDB and CATH50 downloaded databases #294

Open Fede112 opened 5 days ago

Fede112 commented 5 days ago

Expected Behavior

Hi, I am trying to use the CATH50 and PDB databases downloaded using the foldseek databases module to perform a straightforward search+convertalis with a self-made database containing around 30,000 queries.

Current Behavior

The foldseek convertalis command fails with the following error: free(): invalid next size (normal)

I managed to run the search by using cath50_seq instead of cath50. I did this because I noticed a problem in the indexing, but I am not sure if this is the correct approach. Is this okay?

Your Environment

Foldseek version: d326ca0212e48b2dfced1c1d9cd0d05b80b159b6. I also encountered the problem with version foldseek/8-ef4e960.

Thanks in advance

milot-mirdita commented 4 days ago

Could you please post the full terminal output of foldseek?

Proceeding with cath50_seq is fine; this database contains all structures. The cath50 database contains only the representative structures, which can be expanded to the cluster members with --cluster-search 1.

But it shouldn't crash in either case.
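
As a rough sketch of the representative-plus-members search mentioned above: the names queryDB, res and tmp are placeholders rather than anything from this thread, and the query database is assumed to have been built already with foldseek createdb. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: queryDB, res and tmp are hypothetical placeholder names.
# By default the commands are printed, not executed (set DRY_RUN=0 to run them).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Search a query database against CATH50 and expand hits to cluster members.
search_cath50_with_members() {
    query_db="$1"
    # Download the clustered CATH50 database into the prefix "cath50".
    run foldseek databases CATH50 cath50 tmp
    # --cluster-search 1 also aligns and reports the members of each cluster.
    run foldseek search "$query_db" cath50 res tmp --cluster-search 1
}

search_cath50_with_members queryDB
```

Without --cluster-search 1, the same search would be expected to report hits to the representative structures only.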

Fede112 commented 4 days ago

Hi,

I conducted further tests, and here are the outputs for the PDB database. They are essentially the same as the CATH50 errors. For comparison, I tested this on a node with Intel processors and a node with ARM processors, and I noticed that the errors are different between the two. I built Foldseek for Intel, so I am not too worried about ARM not working. Strangely enough, both work with pdb_seq.

log_arm_stderr.txt log_arm_stdout.txt log_intel_stderr.txt log_intel_stdout.txt

milot-mirdita commented 4 days ago

Does it work correctly when you swap the roles of query and target (i.e. make pdb the target database)?

Fede112 commented 3 days ago

Yes, if I swap the roles, everything works fine on both Intel and ARM.

milot-mirdita commented 3 days ago

We intended all of the databases downloaded with foldseek databases to be used as target databases. I guess we have a bug when they are used on the query side.
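
A minimal sketch of the arrangement that worked in this thread, with the downloaded database on the target side. The directory my_structures and the names queryDB, res, res.m8 and tmp are placeholders, not taken from the reporter's setup. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: my_structures, queryDB, res, res.m8 and tmp are hypothetical names.
# By default the commands are printed, not executed (set DRY_RUN=0 to run them).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Use the self-made structures as the query and the downloaded PDB as the target.
search_downloaded_as_target() {
    query_dir="$1"
    # Download the PDB database into the prefix "pdb".
    run foldseek databases PDB pdb tmp
    # Build the query database from a directory of structure files.
    run foldseek createdb "$query_dir" queryDB
    # Query first, downloaded database second (target side).
    run foldseek search queryDB pdb res tmp
    # Convert the alignment results to a tabular text file.
    run foldseek convertalis queryDB pdb res res.m8
}

search_downloaded_as_target my_structures
```

The search and convertalis steps take the query and target databases in the same order, so the downloaded database stays in the second position in both.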