Open alvanuffelen opened 2 months ago
Thanks for reporting this. It's fixed.
Thank you!
Would it also be possible to implement the -n feature in combination with -H?
echo 2605619 | taxonkit filter -H genus
Above line prints the taxid because it has no rank.
I would like to do
echo 2605619 | taxonkit filter -H genus -n
such that the taxid gets filtered out (not printed) because the closest higher node is 'species' which is still lower than genus.
Additionally, the help page could be clearer:
-n, --save-predictable-norank do not discard some special ranks without order when using -L, where rank of the closest higher node is still lower than rank cutoff
The taxid is not only discarded 'when the rank of the closest higher node is lower than rank cutoff' but also when the rank is equal.
E.g.: echo 2605619 | taxonkit filter -L species
This gets printed because the closest higher rank is 'species' which is equal to the cutoff.
echo 2605619 | taxonkit filter -H genus -N
does filter out the taxid.
You're right. I'll update the doc.
-n, --save-predictable-norank do not discard some special ranks without order when using -L, where rank of the closest higher node is equal to or lower than the rank cutoff
Indeed, -N will discard all ranks without order.
But let's say I have the taxids 93506 (higher rank than genus) and 2605619 (lower rank than genus), both no rank.
There is no way to only retain the taxid with a higher rank than genus.
echo -e "93506\n2605619" | taxonkit filter -H genus -N
will remove both.
echo -e "93506\n2605619" | taxonkit filter -H genus
will retain both.
It would be useful to have something like:
echo -e "93506\n2605619" | taxonkit filter -H genus -n
which will remove 2605619 but keep 93506.
Oh, I remember now. I've considered this before but did not implement it because they are different for -L and -H.
I understand what you mean. But I think we should add another flag, --discard-predictable-norank, which only discards those no-ranks (such as 2605619) that cannot be higher than the threshold.
--discard-predictable-norank should be incompatible with -N and -n.
-N, --discard-noranks discard all ranks without order, type "taxonkit filter --help" for details
-n, --save-predictable-norank do not discard some special ranks without order when using -L, where
rank of the closest higher node is still lower than rank cutoff
-Z, --discard-predictable-norank
echo -e "93506\n2605619" | taxonkit filter -H genus -Z
93506
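The difference between keeping all no-rank taxids (the default), discarding all of them (-N), and discarding only the predictable ones (the proposed -Z) can be sketched with a toy model. This is not taxonkit's implementation: the rank list, the function name keep_higher_than, and the two example nodes are illustrative assumptions. In particular, the closest ranked ancestor "family" for the second node is a placeholder, and the rule "a no-rank node cannot be higher than the cutoff when its closest ranked ancestor is at or below the cutoff" is one reading of the discussion above.

```python
# Toy model of the -H filtering rule discussed in this thread.
# NOT taxonkit's code; rank order and example nodes are assumptions.
RANK_ORDER = ["species", "genus", "family", "order"]  # low -> high


def rank_index(rank):
    """Position of an ordered rank; a larger index means a higher rank."""
    return RANK_ORDER.index(rank)


def keep_higher_than(node, cutoff, discard_noranks=False,
                     discard_predictable=False):
    """Return True if `node` would pass a hypothetical `-H cutoff` filter.

    node["rank"] is None for a rank without order ("no rank", "clade").
    node["closest_ranked_ancestor"] is the rank of the closest higher
    node that has an ordered rank.
    """
    if discard_noranks and discard_predictable:
        # Mirrors the proposal that the two flags be incompatible.
        raise ValueError("-N and -Z are mutually exclusive in this sketch")

    rank = node["rank"]
    if rank is not None:
        # Ordered rank: keep only if strictly higher than the cutoff.
        return rank_index(rank) > rank_index(cutoff)

    if discard_noranks:
        # -N: drop every rank without order.
        return False

    if discard_predictable:
        ancestor = node["closest_ranked_ancestor"]
        # The node sits below its ancestor, so if the ancestor is at or
        # below the cutoff, the node cannot be higher than the cutoff.
        if rank_index(ancestor) <= rank_index(cutoff):
            return False

    # Default: ranks without order are kept.
    return True


# Two no-rank nodes mirroring the roles of 2605619 and 93506 above.
below_species = {"rank": None, "closest_ranked_ancestor": "species"}
above_genus = {"rank": None, "closest_ranked_ancestor": "family"}  # assumed

for label, kwargs in [("default", {}),
                      ("-N", {"discard_noranks": True}),
                      ("-Z", {"discard_predictable": True})]:
    kept = [name for name, node in [("below_species", below_species),
                                    ("above_genus", above_genus)]
            if keep_higher_than(node, "genus", **kwargs)]
    print(label, kept)
```

In this sketch only the third mode separates the two nodes, which is the behavior requested earlier in the thread.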
Prerequisites
taxonkit version
Describe your issue
In the documentation, it mentions:
This means !no rank and !clade are defined as rank without order.
The documentation also states:
The following TaxID has rank clade:
1783270 cellular organisms;Bacteria;FCB group FCB group clade
As expected, the TaxID is not filtered out with the following command:
echo 1783270 | ./taxonkit filter -L species
However, why is it filtered out when using -H?
echo 1783270 | ./taxonkit filter -H species
Based on point 5 in the documentation, TaxIDs with no rank are kept by default, so I would expect them to be kept with both -L and -H