MajoroMask closed 6 months ago
This is a deliberate design choice. The idea is that the number of ranks and taxa in a lineage should match, so that you can create what phyloseq calls a taxtable, where the rank is the column header and each row corresponds to a given lineage.
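To make the reasoning above concrete, here is a minimal sketch of how a lineage collapses into one row of such a table. It is not taxpasta's actual implementation: the helper names `ranked_lineage` and `to_tax_row` are hypothetical, and the ranks assigned to the taxa are illustrative assumptions. It shows why entries with `no rank` have no column to land in: several of them can occur in a single lineage, so they would collide on the same header.

```python
# Illustrative sketch only; helper names and the rank assignments below are
# assumptions, not taxpasta's code or output.
NO_RANK = "no rank"


def ranked_lineage(lineage):
    """Keep only the lineage entries that carry a named rank."""
    return [(name, rank) for name, rank in lineage if rank != NO_RANK]


def to_tax_row(lineage):
    """Build one table row: rank is the column header, taxon name the value."""
    return {rank: name for name, rank in ranked_lineage(lineage)}


# A shortened lineage ordered from root to leaf (ranks assumed for illustration).
full_lineage = [
    ("root", "no rank"),
    ("Caudoviricetes", "class"),
    ("unclassified Caudoviricetes", "no rank"),
    ("Bifidobacterium phage BD811P2", "species"),
]

row = to_tax_row(full_lineage)
print(row)
```

In this sketch both `root` and `unclassified Caudoviricetes` would compete for a single `no rank` column, which is why they are dropped before the row is built.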
Do you truly need all taxa in the lineage for a specific purpose or were you simply perplexed by their absence?
This issue needs information or other forms of answers from you. Please feel free to re-open the issue when you are able to provide them.
Is there an existing issue for this?
Problem description
Hi guys. I'm running `taxpasta` with the `--add-lineage --add-id-lineage --add-rank-lineage` options enabled and found that the result may be bugged.
As far as I know, if any of `--add-lineage`, `--add-id-lineage` or `--add-rank-lineage` is given when calling `taxpasta`, the corresponding columns `lineage`, `id_lineage` and `rank_lineage` should contain the information from the columns `name`, `taxonomy_id` and `rank`, respectively, as shown below (part of my example data):
But if an entry's rank is `no rank` (including root with taxonomy ID of 0), it is missing from the records of its child taxa. See how `unclassified Caudoviricetes` (2788787) is missing from the entry of `Bifidobacterium phage BD811P2` (2968613, the last line) in the following example:
If you search NCBI Taxonomy, you can see that `unclassified Caudoviricetes` is part of the full lineage of `Bifidobacterium phage BD811P2`:
So I'm pretty sure this is a bug.
Code sample
Code run:
Traceback:
Environment
Anything else?
Input files I'm using:
x.kraken2.report.txt y.kraken2.report.txt