Open feixiang1209 opened 4 months ago
Probably you've solved it by now, but still: have you tried running mmseqs nrtotaxmapping after createtaxdb? I saw your comment in the other issue and I think nrtotaxmapping is the solution. Not 100% sure though.
Thanks for your reply. I ran mmseqs nrtotaxmapping after "mmseqs createtaxdb nrDB tmp --ncbi-tax-dump ./taxdump/ --tax-mapping-file ./prot.accession2taxid", using the command "mmseqs nrtotaxmapping accession2taxid/prot.accession2taxid nrDB output.tsv". The resulting output.tsv looks like this. Should I replace nrDB_mapping with this file?
0	1047168
1	185202
2	412384
3	3072323
4	150340
5	1573704
6	2517205
7	286
8	1307
9	2635419
10	34041
11	2212474
12	1487711
13	1871050
I think so. I'm looking at the code the mmseqs devs linked in the previous issue and it seems that's what their script does. You'll basically create the ${OUTDB}_mapping file by renaming your tsv.
"${MMSEQS}" nrtotaxmapping "${TMP_PATH}/pdb.accession2taxid" "${TMP_PATH}/prot.accession2taxid" "${OUTDB}" "${OUTDB}_mapping" ${THREADS_PAR}
Thanks a lot @AndrazMarinc!
That looks correct! You still have to call createtaxdb after you replace the _mapping file, to create the _taxonomy file that contains all the taxdump information.
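For anyone following along, the two steps described above could be sketched roughly as below. This is only a sketch under assumptions: the function name rebuild_nr_taxonomy is made up for illustration, the mapping is written straight to the _mapping file instead of being renamed afterwards, and I am assuming that a second createtaxdb call without --tax-mapping-file picks up the existing _mapping file. Please check that against your MMseqs2 version.

```shell
# Path to the mmseqs binary; can be overridden from the environment.
MMSEQS="${MMSEQS:-mmseqs}"

# Hypothetical helper (not part of MMseqs2).
#   $1: sequence database created with "mmseqs createdb" (e.g. nrDB)
#   $2: prot.accession2taxid from NCBI
#   $3: directory with the unpacked NCBI taxdump
#   $4: temporary directory
rebuild_nr_taxonomy() {
    # Step 1: write the mapping directly to <db>_mapping,
    # which has the same effect as renaming output.tsv.
    "${MMSEQS}" nrtotaxmapping "$2" "$1" "$1_mapping" || return 1

    # Step 2: rerun createtaxdb so the _taxonomy file gets built.
    # ASSUMPTION: the existing <db>_mapping file is reused here.
    "${MMSEQS}" createtaxdb "$1" "$4" --ncbi-tax-dump "$3" || return 1
}
```

With the file names from this thread, the call would be: rebuild_nr_taxonomy nrDB ./prot.accession2taxid ./taxdump/ tmp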
Thanks a lot @AndrazMarinc and @milot-mirdita. It worked. I also found another solution: the file "prot.accession2taxid" downloaded from NCBI needs to be modified, because only two columns (accession.version and taxid) are needed to run createtaxdb.
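In case it helps others, the two-column file mentioned above could be produced along these lines. This is a minimal sketch, assuming the NCBI file is tab-separated with one header line and the columns accession, accession.version, taxid, gi. The function name two_column_mapping and the output file name are made up for illustration.

```shell
# Hypothetical helper: keep only accession.version and taxid.
#   $1: prot.accession2taxid as downloaded from NCBI
#       (ASSUMPTION: tab-separated, first line is a header)
#   $2: output file with two tab-separated columns
two_column_mapping() {
    # NR > 1 skips the header line; $2 is accession.version, $3 is taxid.
    awk -F '\t' 'NR > 1 { print $2 "\t" $3 }' "$1" > "$2"
}
```

For example: two_column_mapping ./prot.accession2taxid ./prot.accession2taxid.2col, and then pass the new file to --tax-mapping-file.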
Dear MMseqs2 team,
I am trying to create a taxonomy database for the NCBI nr database. First I downloaded nr.fa, taxdump and prot.accession2taxid. Then I ran the commands below:
mmseqs createdb nr.fa nrDB
mmseqs createtaxdb nrDB tmp --ncbi-tax-dump ./taxdump/ --tax-mapping-file ./prot.accession2taxid
After a few hours, the run completed without error. However, the file nrDB_mapping is empty. Could you please advise what I did wrong?
Thanks a lot
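Since createtaxdb finished without an error here even though the mapping was empty, a quick check after the run may save some time. A minimal sketch, where the function name mapping_is_populated is made up for illustration:

```shell
# Hypothetical helper: succeeds only if <db>_mapping exists and is non-empty.
#   $1: database name, e.g. nrDB
mapping_is_populated() {
    # "-s" is true when the file exists and has a size greater than zero.
    [ -s "$1_mapping" ]
}
```

For example: mapping_is_populated nrDB || echo "nrDB_mapping is missing or empty"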