steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Foldseek easy-search stops on different errors when using --format-output values of "alntmscore,lddt,prob" #305

Closed vmkhot closed 1 month ago

vmkhot commented 1 month ago

Hi there!

Thanks for this great tool. It is my first time using it (I am also a beginner with protein structures).

I created my own ProstT5 database from 345 amino acid sequences by doing this:

foldseek databases ProstT5 weights tmp
foldseek createdb mixed_cluster_0.fa db --prostt5-model weights

When I run:

foldseek easy-search ../aa_files_mixedclusters/mixed_cluster_0.fa db result.m8 tmp --prostt5-model weights

I get the expected result: a tabular, BLAST-like file.

However, I wanted more informative fields for my alignments, but each --format-output variant I tried leads to a different error.

foldseek easy-search ../aa_files_mixedclusters/mixed_cluster_0.fa db mixed_cluster_0_result_6.m8 tmp --prostt5-model weights --format-output "query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qcov,alntmscore,lddt,prob"

[image: screenshot of the error output, not preserved]

and if I run:

foldseek easy-search ../aa_files_mixedclusters/mixed_cluster_0.fa db mixed_cluster_0_result_6.m8 tmp --prostt5-model weights --format-output "query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qcov,alntmscore,lddt"

#OR

foldseek easy-search ../aa_files_mixedclusters/mixed_cluster_0.fa db mixed_cluster_0_result_6.m8 tmp --prostt5-model weights --format-output "query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qcov,alntmscore"

[image: screenshot of the error output, not preserved]

I'm wondering whether it's possible to get TM-scores and other structural alignment values for 3Di sequences produced by ProstT5 from AA sequences, rather than from protein structures. Or is there a different cause for these errors?

Thanks in advance for your help!

Varada

Additionally, I get the following warning in my log (below). How does it affect my results, and how can I resolve it?

Cannot find query C-alpha or db C-alpha database
Disabling --sort-by-structure-bits
This impacts the final score and ranking of hits, but not E-values themselves. Ranking alterations primarily occur for E-values < 10^-1.

milot-mirdita commented 1 month ago

ProstT5 can’t be combined with scores that require structure information, since there are only 3Di tokens and no backbone coordinates.
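
One way to act on this answer is to drop the structure-dependent fields from the --format-output list before searching a ProstT5-derived database. The following is a minimal sketch, not part of Foldseek: the helper name strip_structure_fields and the variable names are hypothetical, and the list of fields to remove (alntmscore, lddt, prob) is an assumption taken from the fields named in this issue's title, not an official list.

```shell
#!/bin/sh
# Assumption: these are the fields reported as failing in this issue
# when the database was built from ProstT5 3Di predictions.
# This is not an official Foldseek list; adjust it as needed.
STRUCTURE_FIELDS="alntmscore lddt prob"

# Hypothetical helper: print the comma-separated field list given in $1
# with every field named in STRUCTURE_FIELDS removed.
strip_structure_fields() {
    kept=""
    old_ifs=$IFS
    IFS=','                       # split the argument on commas
    for field in $1; do
        case " $STRUCTURE_FIELDS " in
            *" $field "*) ;;      # structure-dependent: skip it
            *) kept="${kept:+$kept,}$field" ;;
        esac
    done
    IFS=$old_ifs                  # restore the original separator
    printf '%s\n' "$kept"
}

# The field list from the first failing command in the question.
FIELDS="query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qcov,alntmscore,lddt,prob"
SAFE_FIELDS="$(strip_structure_fields "$FIELDS")"
echo "$SAFE_FIELDS"
```

The resulting string could then be passed as --format-output "$SAFE_FIELDS" to the foldseek easy-search command from the question. Whether each remaining field works with a ProstT5 database should still be checked against the Foldseek documentation.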