Closed poursalavati closed 1 year ago
Hi Naser, DIAMOND seems not to recognize the rank of realm
, so you can simply remove the column realm
from the id2lin
file with csvtk:
csvtk cut -t -f -realm id2lin -o id2lin.no-realm
If DIAMOND does not support kingdom
either, you may need to rename it superkingdom
.
Thanks a lot for the suggestion. I've checked the diamond source code and instead of removing the realm, renamed it to superkingdom based on this:
const char* Rank::names[] = {
"no rank", "superkingdom", "kingdom", "subkingdom", "superphylum", "phylum", "subphylum", "superclass", "class", "subclass", "infraclass", "cohort", "subcohort", "superorder",
"order", "suborder", "infraorder", "parvorder", "superfamily", "family", "subfamily", "tribe", "subtribe", "genus", "subgenus", "section", "subsection", "series", "species group",
"species subgroup", "species", "subspecies", "varietas", "forma", "strain", "biotype", "clade", "forma specialis", "genotype", "isolate", "morph", "pathogroup", "serogroup", "serotype", "subvariety"
};
But now there is another issue after making db, almost all sequences have no rank! seems an issue in node file.
Its end of diamond stdout:
Loading taxonomy names... [0.021s]
**Loaded taxonomy names for 0 taxon ids.**
Loading taxonomy mapping file... [355.924s]
Joining accession mapping... [82.928s]
Writing taxon id list... [2.292s]
Building taxonomy nodes... [11.996s]
2147258865 taxonomy nodes processed.
Number of nodes assigned to rank:
**no rank 2147251888**
superkingdom 6
kingdom 10
subkingdom 0
superphylum 0
phylum 17
subphylum 0
superclass 39
class 0
subclass 0
infraclass 0
cohort 0
subcohort 0
superorder 0
order 64
suborder 0
infraorder 0
parvorder 0
superfamily 0
family 206
subfamily 0
tribe 0
subtribe 0
genus 2116
subgenus 0
section 0
subsection 0
series 0
species group 0
species subgroup 0
species 4519
subspecies 0
varietas 0
forma 0
strain 0
biotype 0
clade 0
forma specialis 0
genotype 0
isolate 0
morph 0
pathogroup 0
serogroup 0
serotype 0
subvariety 0
Closing the input file... [0s]
Closing the database file... [0.015s]
Database sequences 220799163
Database letters 49459660621
Accessions in database 220799163
Entries in accession to taxid file 216984561
Database accessions mapped to taxid 0
Database sequences mapped to taxid 0
Database hash 58748cfe915c91e69a43a88c27aa3e8b
Total time 2155s
I already sent the node and name file. I would greatly appreciate your assistance if you have any suggestions or solutions.
I think the point is that I have a large number of sequences for each taxonomy id. And it seems that only one of each has been identified (or indexed) by Diamond. for example:
IMGVR_UViG_638276111_000001|638276111|638297712 541518477
IMGVR_UViG_638276111_000001|638276111|638297713 541518477
IMGVR_UViG_638276111_000001|638276111|638297714 541518477
IMGVR_UViG_638276111_000001|638276111|638297715 541518477
IMGVR_UViG_638276111_000001|638276111|638297716 541518477
IMGVR_UViG_638276111_000001|638276111|638297717 541518477
IMGVR_UViG_638276111_000001|638276111|638297718 541518477
IMGVR_UViG_638276111_000001|638276111|638297719 541518477
IMGVR_UViG_638276111_000001|638276111|638297720 541518477
IMGVR_UViG_638276111_000001|638276111|638297721 541518477
IMGVR_UViG_638276111_000001|638276111|638297722 541518477
IMGVR_UViG_638276111_000001|638276111|638297723 541518477
Hope there is some way to fix it!
Seems you're using an old taxonkit version (<0.14.0), please update to 0.14.1
taxonkit create-taxdump
:
int32
instead of uint32
, as BLAST and DIAMOND do. #70Seems you're using an old taxonkit version (<0.14.0), please update to 0.14.1
taxonkit create-taxdump
:
- save taxIds in
int32
instead ofuint32
, as BLAST and DIAMOND do. #70
Thanks dear Wei,
Im using Version: 0.14.1 and checked int32
.
I will write an update on this so maybe help other similar cases:
I was reading Diamond code and found this part in taxonomy.ccp
, seems Diamond edited the seq title in these conditions, and one matched my seq ids!
string get_accession(const string &title)
{
size_t i;
string t(title);
if (t.compare(0, 6, "UniRef") == 0)
t.erase(0, t.find('_', 0) + 1);
else if ((i = t.find_first_of('|', 0)) != string::npos) {
if (t.compare(0, 3, "gi|") == 0) {
t.erase(0, t.find_first_of('|', i + 1) + 1);
i = t.find_first_of('|', 0);
}
t.erase(0, i + 1);
i = t.find_first_of('|', 0);
if (i != string::npos)
t.erase(i);
}
i = t.find_last_of('.');
if (i != string::npos)
t.erase(i);
return t;
}
So what I did was replace pipes |
to _
in both taxid.map and also fasta headers!
Thanks again for your support and suggestions, Naser
Hello and thanks for developing this great tool,
I am currently in the process of preparing a Diamond database from IMG/VR data. I have already prepared the necessary data and successfully created the name and node files, as well as the taxid.map file.
However, I encountered an error during the diamond makedb process: "Error: Invalid taxonomic rank: realm".
I have attached the name and node files for your reference. The taxid.map file is quite large, but I have included the head of it in the links below:
nodes.dmp.txt names.dmp.txt head_id2tax.txt
For taxonkit step, I have attached the head of the input file in the link below: head_id2lin.txt
Diamond making db:
Thank you in advance for any help you can provide regarding this issue. If you have any suggestions or solutions, I would greatly appreciate your assistance.
Naser,