Open ahof1704 opened 7 months ago
Hi,
I have tried following the steps described on the wiki to create the mapping for NR.
I have downloaded the taxonomy dump and confirmed that the taxonomy folder is in place:
ls -lh /root/mmseqs2_db/taxonomy/
Permissions Size User Date Modified Name
drwxr-sr-x - root 25 Mar 13:35 .ipynb_checkpoints/
.rw-rw-r-- 20M 9019 12 Mar 21:27 citations.dmp
.rw-rw-r-- 4.7M 9019 12 Mar 21:25 delnodes.dmp
.rw-rw-r-- 452 9019 12 Mar 21:20 division.dmp
.rw-rw-r-- 16k 9019 12 Mar 21:27 gc.prt
.rw-rw-r-- 4.9k 9019 12 Mar 21:20 gencode.dmp
.rw-rw-r-- 3.9M 9019 12 Mar 21:25 images.dmp
.rw-rw-r-- 1.4M 9019 12 Mar 21:25 merged.dmp
.rw-rw-r-- 244M 9019 12 Mar 21:27 names.dmp
.rw-rw-r-- 194M 9019 12 Mar 21:27 nodes.dmp
.rw-rw---- 3.1k 4544 27 Apr 2023 readme.txt
.rw-rw-r-- 65M root 12 Mar 21:28 taxdump.tar.gz
But when attempting to extract the fasta and the tax id mapping, I get the following error:
cd /root/mmseqs2_db
blastdbcmd -db nr -entry all > nr.fna
BLAST Database error: No alias or index file found for nucleotide database [nr] in search path [/root/mmseqs2_db::]
I have ensured that the files for nr are available in that path
ls -lh /root/mmseqs2_db/nr*
Permissions Size User Date Modified Name
.rw-rw-r-- 13G root 15 Mar 16:54 /root/mmseqs2_db/nr
.rw-rw-r-- 4 root 15 Mar 16:54 /root/mmseqs2_db/nr.dbtype
.rw-r--r-- 0 root 25 Mar 13:46 /root/mmseqs2_db/nr.fna
.rw-rw-r-- 779M root 15 Mar 16:54 /root/mmseqs2_db/nr.index
.rw-rw-r-- 790M root 15 Mar 16:55 /root/mmseqs2_db/nr.lookup
.rw-rw-r-- 8 root 15 Mar 16:52 /root/mmseqs2_db/nr.source
.rw-rw-r-- 11 root 15 Mar 17:03 /root/mmseqs2_db/nr.version
.rw-rw-r-- 4.0G root 15 Mar 16:52 /root/mmseqs2_db/nr_h
.rw-rw-r-- 4 root 15 Mar 16:52 /root/mmseqs2_db/nr_h.dbtype
.rw-rw-r-- 748M root 15 Mar 16:55 /root/mmseqs2_db/nr_h.index
.rw-rw-r-- 0 root 15 Mar 16:55 /root/mmseqs2_db/nr_mapping
.rw-rw-r-- 708M root 15 Mar 16:55 /root/mmseqs2_db/nr_taxonomy
nr.fna is still empty. Not sure if this is a required step in order to create the nr_mapping. I would appreciate any help in getting the tax info for the NR dataset.
Thanks!
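One thing that may be worth checking here: the files in the listing above (nr, nr.index, nr.dbtype, nr.lookup, nr_h, ...) look like an MMseqs2 database, whereas blastdbcmd expects a BLAST-formatted database, which normally comes with an alias file (.pal or .nal) or index files (.pin or .nin). The sketch below only tests whether such files exist. The function name has_blast_db and the DB_DIR variable are made up for illustration, and the list of extensions is an assumption that may not cover every BLAST database version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed if <dir>/<name> looks like a BLAST database, i.e. an alias file
# (.pal/.nal) or an index file (.pin/.nin, possibly per volume) is present.
# Hypothetical helper; the extension list is an assumption.
has_blast_db() {
    local dir="$1" name="$2" f
    for f in "$dir/$name".pal "$dir/$name".nal \
             "$dir/$name".pin "$dir/$name".nin \
             "$dir/$name".[0-9]*.pin "$dir/$name".[0-9]*.nin; do
        # An unmatched glob stays literal, so -e is simply false for it.
        [ -e "$f" ] && return 0
    done
    return 1
}

DB_DIR="${DB_DIR:-/root/mmseqs2_db}"   # path taken from the post above
if has_blast_db "$DB_DIR" nr; then
    echo "BLAST alias/index files found for nr in $DB_DIR"
else
    echo "no BLAST alias/index files for nr in $DB_DIR"
fi
```

If the check reports no BLAST files, the "No alias or index file found" message would be expected regardless of the taxonomy folder.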
Hi, I would really appreciate some help with this. Thanks!
I remember blastdbcmd having issues; however, I don't remember what was wrong.
We use a different workflow to assign taxids for the NR: https://github.com/soedinglab/MMseqs2/blob/804bb2af6d1be4086252c46bf15f3c75a5d9e931/data/workflow/databases.sh#L419
The download part for the accession2taxid files: https://github.com/soedinglab/MMseqs2/blob/804bb2af6d1be4086252c46bf15f3c75a5d9e931/data/workflow/databases.sh#L120
Sorry, I'm not sure I follow. Am I supposed to do anything differently from what I described above to filter NR by taxonomy info?
I had the same issue and posted here as well. I also got no reply.
I think what they are saying is that one should download the accession2taxid files from NCBI. Then you can somehow run mmseqs nrtotaxmapping to create the mapping.
Maybe related to this issue: I'm having a similar problem creating an NT taxonomy db, which is how I found this thread. I think that createtaxdb does not complete in my case, despite not printing any errors or warnings. It just eats a lot of RAM and then seemingly finishes; however, the nt_mapping is empty.
EDIT: I solved my problem with nt_mapping being empty. The problem was that I was using the accession column instead of the accession.version column from the nucl_gb.accession2taxid file I had downloaded from NCBI as input for mmseqs createtaxdb. mmseqs then produced an empty mapping since it tried matching the accession numbers I provided with the accession.version identifiers obtained from the nt database and of course couldn't find any matches to make the taxid mapping.
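To make the column choice concrete, here is a minimal sketch of pulling accession.version and taxid out of an accession2taxid file. It assumes the file is tab-separated with one header line and the columns accession, accession.version, taxid, gi, and that a two-column, tab-separated mapping is what createtaxdb accepts. The function name make_taxid_mapping and the sample rows are made up for illustration; they are not real NCBI records.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write "accession.version<TAB>taxid" lines from an accession2taxid file.
# Assumes four tab-separated columns with one header line:
#   accession  accession.version  taxid  gi
make_taxid_mapping() {
    local input="$1" output="$2"
    # Column 2 is accession.version; column 1 (accession) is the one that
    # led to the empty mapping described above.
    awk -F'\t' 'NR > 1 { print $2 "\t" $3 }' "$input" > "$output"
}

# Demo on a tiny made-up sample (placeholder rows, not real NCBI records).
workdir="$(mktemp -d)"
printf 'accession\taccession.version\ttaxid\tgi\n' > "$workdir/sample.accession2taxid"
printf 'XX000001\tXX000001.1\t9606\t1\n' >> "$workdir/sample.accession2taxid"
printf 'XX000002\tXX000002.2\t10090\t2\n' >> "$workdir/sample.accession2taxid"

make_taxid_mapping "$workdir/sample.accession2taxid" "$workdir/sample.taxidmapping"
cat "$workdir/sample.taxidmapping"
```

The resulting file would then be the one handed to mmseqs createtaxdb through --tax-mapping-file.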
Hello everyone, I have a question related to the MMseqs2 database. I used this command: $ mmseqs databases NR NRdb tmpDir to create a custom NRdb. I left the computer running overnight. In the morning, the terminal had closed. I checked the result folder and found several files (mmseqDB, mmseqDB.dbtype, etc.). When I tried to open mmseqDB to check it, I saw only Chinese characters instead of sequences. Has anyone faced the same issue? I guess this is an error, because I can't run further analysis with that DB. Many thanks!
I would like to run a sequence similarity search against the Homo sapiens entries in the NR dataset. For that I downloaded the dataset as follows:
mmseqs databases NR nr tmp
Then I attempted to filter for the taxon I want:
mmseqs filtertaxseqdb nr nr_human --taxon-list 9606
However, I am getting the following error:
nr_mapping is empty. Rerun createtaxdb to recreate taxonomy mapping.
It is unclear what this suggestion to recreate the taxonomy means or what this command should look like. I would appreciate any help with that.
Thanks, Antonio
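For reference, a rough sketch of what rerunning createtaxdb could look like, based on the createtaxdb invocation shown on the MMseqs2 wiki (--ncbi-tax-dump and --tax-mapping-file). The helper mapping_ready, the variables DB, TAXDUMP and MAPPING, and the file name nr.taxidmapping are assumptions for illustration only. The mapping file is assumed to hold accession.version and taxid in two tab-separated columns. The script does nothing destructive if mmseqs or the mapping file is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return success only if <db>_mapping exists and is non-empty.
# Hypothetical helper, not part of MMseqs2.
mapping_ready() {
    [ -s "${1}_mapping" ]
}

DB="${DB:-nr}"                          # MMseqs2 sequence database prefix
TAXDUMP="${TAXDUMP:-taxonomy}"          # folder holding names.dmp, nodes.dmp, ...
MAPPING="${MAPPING:-nr.taxidmapping}"   # hypothetical two-column mapping file

if mapping_ready "$DB"; then
    echo "${DB}_mapping looks populated"
elif command -v mmseqs >/dev/null 2>&1 && [ -s "$MAPPING" ]; then
    # Rebuild the taxonomy mapping, then retry the filter from the post above.
    mmseqs createtaxdb "$DB" tmp --ncbi-tax-dump "$TAXDUMP" --tax-mapping-file "$MAPPING"
    mmseqs filtertaxseqdb "$DB" "${DB}_human" --taxon-list 9606
else
    echo "${DB}_mapping is empty; mmseqs and a mapping file at $MAPPING are needed" >&2
fi
```

Checking the file size first mirrors the listing earlier in the thread, where nr_mapping shows a size of 0.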