Open daisykuma22 opened 1 year ago
We currently do not store any annotations per database. To gather this information, you would need to download the meta-data for each database.
Having the annotations might be a great addition to Foldseek, though. I will label this issue as enhancement.
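Until annotations ship with the databases, one way to link hits to separately downloaded metadata is a plain join on the target identifier. The sketch below is only an illustration: the file names `hits.m8`, `metadata.tsv`, and `hits_annotated.tsv`, the toy rows, and the two-column metadata layout (target ID, annotation) are all assumptions, not something Foldseek provides. It assumes the IDs in your metadata table match the target column of the result file exactly.

```shell
#!/bin/sh
# Toy stand-ins (hypothetical): a truncated tab-separated result file and a
# user-prepared metadata table mapping target ID -> annotation.
printf 'query1\ttargetA\t0.90\nquery1\ttargetB\t0.40\nquery2\ttargetC\t0.75\n' > hits.m8
printf 'targetA\texample annotation A\ntargetC\texample annotation C\n' > metadata.tsv

# First pass (metadata.tsv): remember the annotation for each target ID.
# Second pass (hits.m8): append the annotation for the target in column 2,
# or "NA" when the target has no metadata entry.
awk -F'\t' -v OFS='\t' '
    NR == FNR { ann[$1] = $2; next }
    { print $0, (($2 in ann) ? ann[$2] : "NA") }
' metadata.tsv hits.m8 > hits_annotated.tsv

cat hits_annotated.tsv
```

With a real result file the target is still the second column in the default output, so only the metadata table needs to be prepared per database.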
Hi Dr. Steinegger
First, thank you and everyone else working on this project for creating such a useful and powerful tool to predict protein function. I am also interested in having the functional and taxa annotation for each PDB/AFDB hit in the local foldseek results. Could you explain how to append the meta-data to the databases so that they show up in the output?
Thanks in advance! TN
To add taxonomy information to the result, you can pass additional columns to the --format-output parameter.
The default columns are query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits.
You can add taxid,taxname,taxlineage for additional taxonomy information:
--format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,taxid,taxname,taxlineage
This parameter works for easy-search or, if you are calling individual modules, for convertalis. Additionally, it only works for the prebuilt databases where we include taxonomy information (e.g. PDB and AFDB).
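As a rough sketch, a complete easy-search call with the taxonomy columns could look like the following. The paths `queries/`, `pdb100/pdb`, `result_tax.m8`, and `tmp` are placeholders, not paths from this thread. The script only runs Foldseek when the binary and the query directory are present; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Default columns plus the three taxonomy columns named above.
DEFAULT_COLS="query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits"
TAX_COLS="taxid,taxname,taxlineage"
FORMAT="${DEFAULT_COLS},${TAX_COLS}"

# Placeholder paths (assumptions): adjust to your own setup.
QUERY_DIR="queries/"
TARGET_DB="pdb100/pdb"   # a prebuilt database that includes taxonomy
RESULT="result_tax.m8"
TMP_DIR="tmp"

if command -v foldseek >/dev/null 2>&1 && [ -d "$QUERY_DIR" ]; then
    foldseek easy-search "$QUERY_DIR" "$TARGET_DB" "$RESULT" "$TMP_DIR" \
        --format-output "$FORMAT"
else
    # Dry run: show the command instead of executing it.
    echo "foldseek easy-search $QUERY_DIR $TARGET_DB $RESULT $TMP_DIR --format-output $FORMAT"
fi
```

The taxonomy fields then appear as the last three tab-separated columns of each result line.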
Please open a new issue if you have any follow-up questions, and do not respond to this issue.
Thank you for developing such an excellent tool, but I'm still unsure how to incorporate functional descriptions into the output results via meta-data, similar to what's seen on the web server. Does the latest version of Foldseek support this information output in tabular format?
Is there a protocol or information for retrieving the metadata and linking the results to the functional annotation? Being able to retrieve annotations would be a great help!
Expected Behavior
Hi, thanks for developing this great tool for structural alignment. I got the results by searching my PDB files against the PDB100 database with the following command:
foldseek easy-search test/ /hwfssz3/PS_TIO/foldseek_db/pdb100/pdb pdb100_result.m8
But I still have no idea about the function of the hits, such as their EC accessions. How can I get this information for the queried proteins? I noticed that the web server shows detailed descriptions of the hits. Also, I have downloaded all AlphaFold and PDB databases from https://foldseek.steineggerlab.workers.dev/: afdb50.tar.gz (87G), afdb_swissprot.tar.gz (1.3G), afdb_proteome.tar.gz (1.6G), pdb100.tar.gz (1.7G), afdb.tar.gz (423G). Does afdb.tar.gz (423G) contain all the others?
Current Behavior
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
Foldseek Output (for bugs)
Please make sure to also post the complete output of Foldseek. You can use gist.github.com for large output.
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
Include as many relevant details about the environment you experienced the bug in.