steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0
695 stars, 92 forks

How to get the functional information of queried proteins? #136

Open daisykuma22 opened 1 year ago

daisykuma22 commented 1 year ago

Expected Behavior

Hi, thanks for developing this great tool for structural alignment. I got results by searching my PDB files against the PDB100 database with the following command: foldseek easy-search test/ /hwfssz3/PS_TIO/foldseek_db/pdb100/pdb pdb100_result.m8 . However, I still have no idea about the function of the hits, such as their EC numbers. How can I get this information for the proteins in my results? I noticed that the web server shows detailed descriptions of the hits.

Also, I have downloaded all the AlphaFold and PDB databases from https://foldseek.steineggerlab.workers.dev/: afdb50.tar.gz (87G), afdb_swissprot.tar.gz (1.3G), afdb_proteome.tar.gz (1.6G), pdb100.tar.gz (1.7G), and afdb.tar.gz (423G). Does afdb.tar.gz (423G) contain all the others?


martin-steinegger commented 1 year ago

We currently do not store any annotations per database. To gather this information, you would need to download the metadata for each database.

Having the annotations might be a great addition to Foldseek, though. I will label this issue as an enhancement.
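A minimal Python sketch of linking hits to metadata you have downloaded yourself. It assumes (hypothetically) that the metadata is a two-column TSV of key and description, and that AFDB target names follow the AF-<UniProt accession>-F<n>-model_v<k> pattern so the accession can serve as the lookup key. Neither format is defined by Foldseek, and load_metadata, annotation_key and annotate_hits are illustrative names, not part of the tool.

```python
import csv
import io
import re

# Assumption: AFDB target names look like "AF-<UniProt accession>-F<fragment>-model_v<version>".
AFDB_NAME = re.compile(r"^AF-([A-Z0-9]+)-F\d+")


def annotation_key(target: str) -> str:
    """Return the lookup key for a hit (hypothetical scheme, adapt to your metadata)."""
    match = AFDB_NAME.match(target)
    # Anything that is not an AFDB model is looked up by its full target name.
    return match.group(1) if match else target


def load_metadata(handle) -> dict:
    """Read a hypothetical two-column TSV: key <TAB> description."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate_hits(results, metadata: dict):
    """Yield each Foldseek result row with a description column appended."""
    for row in csv.reader(results, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        # In the default output format the second column is the target name.
        yield row + [metadata.get(annotation_key(row[1]), "NA")]


if __name__ == "__main__":
    # Made-up toy input, only to show the shape of the data.
    hits = io.StringIO("q1\tAF-EXAMPLE1-F1-model_v4\t0.9\t100\t10\t0\t1\t100\t1\t100\t1e-20\t200\n")
    meta = io.StringIO("EXAMPLE1\tdemo description\n")
    for annotated in annotate_hits(hits, load_metadata(meta)):
        print("\t".join(annotated))
```

Hits without a matching metadata entry get "NA", so the output keeps one row per alignment and stays tab-separated like the original result file.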

tnbioinfo commented 4 months ago

Hi Dr. Steinegger,

First, thank you and everyone else working on this project for creating such a useful and powerful tool for predicting protein function. I am also interested in getting functional and taxonomic annotations for each PDB/AFDB hit in local Foldseek results. Could you explain how to append the metadata to the databases so that it shows up in the output?

Thanks in advance! TN

milot-mirdita commented 4 months ago

To add taxonomy information to the results, you can pass additional columns to the --format-output parameter.

The default columns are query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits. You can add taxid,taxname,taxlineage for additional taxonomy information: --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,taxid,taxname,taxlineage

This parameter works with easy-search, and with convertalis if you are calling the individual modules. Additionally, it only works for the prebuilt databases that include taxonomy information (e.g. PDB and AFDB).

Please open a new issue if you have any follow-up questions instead of responding to this one.
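The --format-output request above can also be scripted. A minimal Python sketch, assuming foldseek is on your PATH, that the target is a prebuilt database with taxonomy (e.g. PDB or AFDB), and that the result file is tab-separated without a header line; build_command and read_results are hypothetical helper names.

```python
import csv

# Default columns plus the taxonomy columns taxid, taxname and taxlineage.
COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
    "taxid", "taxname", "taxlineage",
]


def build_command(query, target_db, out_file, tmp_dir):
    """Return the easy-search argument list with the extended --format-output."""
    return [
        "foldseek", "easy-search", query, target_db, out_file, tmp_dir,
        "--format-output", ",".join(COLUMNS),
    ]


def read_results(lines):
    """Turn each tab-separated result line into a dict keyed by column name."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row:
            yield dict(zip(COLUMNS, row))
```

For example, subprocess.run(build_command("test/", "pdb100/pdb", "result.m8", "tmp"), check=True) runs the search, and read_results(open("result.m8")) then gives one dict per hit with taxname and taxlineage alongside the alignment statistics. The paths are placeholders.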

TigerWindWood commented 3 months ago

Thank you for developing such an excellent tool. However, I'm still unsure how to add functional descriptions from the metadata to the output, similar to what the web server shows. Does the latest version of Foldseek support writing this information to the tabular output?

alexcorm commented 1 week ago

Is there a protocol or documentation for retrieving the metadata and linking the results to functional annotations? Being able to retrieve annotations would be a great help!