shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Possible bug with taxonkit filter --black-list #37

Closed standage closed 3 years ago

standage commented 3 years ago

When a user specifies a rank or a (comma-separated?) list of ranks for --black-list, these should be excluded from the output, correct? I have tried the following example several times with different ranks, and I get the same error message every time.

$ echo 349741 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' > taxids2.txt
$ cat taxids2.txt | taxonkit filter -B Family 
23:40:47.905 [ERRO] rank order not defined in rank file: no rank

Is this a bug, or am I misunderstanding this flag?


shenwei356 commented 3 years ago

You have to keep the default values in the black list. I should have made that clear.

cat taxids2.txt | taxonkit filter -B Family -B "no rank,clade"
2
74201
203494
48461
1647988
239934
239935
349741

standage commented 3 years ago

For now, the Python bindings add "no rank" and "clade" to the blacklist automatically, and it works ok.

But if it is required to leave the default values in the blacklist, maybe the --black-list flag should append to the default list instead of replace it?
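
As a rough illustration of that workaround on the command line, here is a minimal sketch of a wrapper that always passes the default values alongside the requested rank. The function name `filter_keep_defaults` and the `TAXONKIT` override variable are hypothetical; the only taxonkit usage assumed is the repeated `-B` form shown earlier in this thread.

```shell
# Hypothetical wrapper (not part of taxonkit): discard the given rank and
# always keep the default black-list values "no rank" and "clade".
# TAXONKIT is an assumed override for the executable name (default: taxonkit).
filter_keep_defaults() {
    "${TAXONKIT:-taxonkit}" filter -B "$1" -B "no rank,clade"
}

# Usage (reads TaxIDs from stdin, like the commands above):
#   cat taxids2.txt | filter_keep_defaults Family
```

This mirrors what the Python bindings are described as doing, and it should only matter for versions where --black-list replaces the default list.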

shenwei356 commented 3 years ago

I read the code again and fixed the logic.

"no rank" and "clade" are already defined as ranks without order in ranks.txt, and they can be optionally removed via -N/--discard-noranks. -B/--black-list can be used to add more ranks to discard; it can also include "no rank".

  -B, --black-list strings   black list of ranks to discard, e.g., '"no rank", "clade"'
  -N, --discard-noranks      discard ranks without order, type "taxonkit filter --help" for details

The above command should be:

cat taxids2.txt | taxonkit filter -N -B Family

  1. Flag -L/--lower-than and -H/--higher-than are exclusive, and can be
     used along with -E/--equal-to which values can be different.
  2. A list of pre-ordered ranks is in ~/.taxonkit/ranks.txt, you can use
     your list by -r/--rank-file, the format specification is below.
  3. All ranks in taxonomy database should be defined in rank file.
  4. TaxIDs with no rank can be optionally discarded by -N/--discard-noranks.
  5. Futher ranks can be removed with black list via -B/--black-list.
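
How the two flags combine, as described above, can be illustrated without taxonkit itself. The sketch below is not taxonkit's implementation. It mimics the described behavior on a hypothetical two-column table (TaxID, rank) with invented TaxIDs, it treats "no rank" and "clade" as the only ranks without order, and it ignores the rank-order options (-L, -H, -E) entirely. The function name `emulate_filter` and its arguments are assumptions made for illustration.

```shell
# Toy emulation of the -N / -B behavior described above (not taxonkit code).
# stdin: TSV lines "taxid<TAB>rank"; the TaxIDs used below are invented.
# $1: "yes" to discard ranks without order (like -N), anything else keeps them.
# $2: comma-separated extra ranks to discard (like -B); compared exactly as given.
emulate_filter() {
    awk -F '\t' -v discard_noranks="$1" -v blacklist="$2" '
        BEGIN {
            # Build the set of ranks to drop from the black list.
            n = split(blacklist, items, ",")
            for (i = 1; i <= n; i++) drop[items[i]] = 1
            # Optionally add the unordered ranks to the same set.
            if (discard_noranks == "yes") {
                drop["no rank"] = 1
                drop["clade"] = 1
            }
        }
        # Print the TaxID only if its rank is not in the drop set.
        !($2 in drop) { print $1 }
    '
}

# With both options, only the genus and species rows remain (prints 4 and 5).
printf '1\tno rank\n2\tclade\n3\tfamily\n4\tgenus\n5\tspecies\n' |
    emulate_filter yes family
```

Passing `no` as the first argument keeps rows 1 and 2 and drops only the family row, which corresponds to using the black list alone.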
shenwei356 commented 3 years ago

standage commented 3 years ago

Oh, ok. So if I want to specify -B family then the -N flag is required?

shenwei356 commented 3 years ago

Not required; it's optional. -N only removes "no rank" and "clade".

standage commented 3 years ago

If it's optional, why did my original command fail?

shenwei356 commented 3 years ago

There was a bug; it's fixed now.

https://github.com/shenwei356/taxonkit/commit/72d438b0c814eb68b8aa3d01f428219f6d1cd501#diff-8e2def025044548b1e3afb01de909c09078d4e6e4b84b9efc8c56a73b6434b34L210

standage commented 3 years ago

Oh ok. 🤓

I will test with the latest binary you posted.

standage commented 3 years ago

Is it easy for you to create a Darwin AMD binary? Don't worry if it's inconvenient.

shenwei356 commented 3 years ago

Oh, I wrongly uploaded arm64 binaries...

standage commented 3 years ago

Ok, I understand now. And I confirmed that my original command works. Thank you!