Open cvigilv opened 2 months ago
The easiest workaround for this is probably to slightly abuse addtaxonomy:
mmseqs databases UniProtKB/Swiss-Prot sprot tmp
MMSEQS_FORCE_MERGE=1 mmseqs addtaxonomy sprot sprot_h out
tr -d '\000' < out > sprot_headers_with_taxonomy.tsv
Adding a module that exports the nodes/names taxonomy dmp files would also be possible, but that would need to come from an external contribution, as I don't have time to implement this currently.
That also works the same way for foldseek, just use a Foldseek database and the foldseek binary instead of mmseqs.
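To make the tr step concrete, here is a minimal sketch on a synthetic file. It assumes the merged output is plain text in which each entry ends with a NUL byte; the file names demo_out and demo.tsv, the headers, and the taxon IDs are made up for illustration and are not real addtaxonomy output.

```shell
# Synthetic stand-in for the merged 'out' file (not real addtaxonomy output):
# two entries, each terminated by a NUL byte; the second column is a made-up taxon ID.
printf 'header_A\t9606\n\000header_B\t10090\n\000' > demo_out

# Same step as in the workaround: delete every NUL byte.
# tr reads from standard input, so the file is passed with a redirect.
tr -d '\000' < demo_out > demo.tsv

# Each remaining line should now split cleanly on tabs.
awk -F '\t' '{ print NF " fields: " $1 }' demo.tsv
```

If the awk step reports a consistent field count for every line, the file can be loaded as a regular TSV.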
Thanks! Will give it a test and come back if I encounter any problems.
Hi @cvigilv, could I check if this worked for you? I am trying to follow the process but am getting an unparseable output file. My foldseek version is the binary from ~2 weeks ago.
foldseek databases UniProtKB/Swiss-Prot sprot tmp
FOLDSEEK_FORCE_MERGE=1 ../foldseek/bin/foldseek addtaxonomy sprot sprot_h out
Output:
addtaxonomy sprot sprot_h out
MMseqs Version: 62a2558bcad0d78976f6275b896afcd7a38136a9
Column with taxonomic lineage 0
LCA ranks
Extract mode 2
Compressed 0
Threads 128
Verbosity 3
[=================================================================] 100.00% 542.38K 1s 513ms
Taxonomy for 542378 entries not found and 0 are deleted
Time for merging to out: 0h 0m 1s 330ms
Time for processing: 0h 0m 4s 344ms
Trying to parse the 'out' file with the tr method doesn't work: less warns that the file is binary, and the contents look uniformly malformed, with no delimiters or readable text. I've tried setting the force-merge variable as both MMSEQS_FORCE_MERGE and FOLDSEEK_FORCE_MERGE, and also tried exporting it.
I am able to get the mmseqs version working; I just can't seem to get the Foldseek one working.
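One way to narrow this down is to look at the raw bytes instead of relying on less. The sketch below runs on a synthetic file (demo_out is a made-up name, not real Foldseek output) and assumes a text database in which each entry is terminated by a NUL byte. A file that shows no readable text at all in the od dump is probably not in that format. Separately, the log line "Taxonomy for 542378 entries not found" above suggests that no entry received a taxon in this run.

```shell
# Synthetic stand-in for a merged database file (not real Foldseek output):
# one text entry followed by the NUL byte that terminates it.
printf 'entry_1\tsome text\n\000' > demo_out

# Dump the first bytes as characters; NUL shows up as \0 and tab as \t.
od -c demo_out | head -n 5

# Count NUL bytes; under the assumption above there is about one per entry.
nul_count=$(tr -cd '\000' < demo_out | wc -c)
echo "NUL bytes: $nul_count"
```

Running the same od and tr commands on the real out file would show whether it is readable text with NUL separators or something else entirely.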
I'm currently trying to use foldseek to prepare some datasets and I would like to check if the taxonomic information of Alphafold/Proteome matches the one I obtained from the FTP server of Alphafold.
Is there any way to convert the binary _taxonomy file into a tab-separated values (TSV) file?