shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Unsure about usage of --pseudo-strain #40

Closed: standage closed this issue 3 years ago

standage commented 3 years ago

Consider the following two commands.

$ echo 36827 | taxonkit lineage | taxonkit reformat --add-prefix --format '{k};{p};{c};{o};{f};{g};{s};{T}'
36827   cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium botulinum;Clostridium botulinum B        k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium botulinum;T__
$ echo 36827 | taxonkit lineage | taxonkit reformat --add-prefix --format '{k};{p};{c};{o};{f};{g};{s};{T}' --pseudo-strain
36827   cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium botulinum;Clostridium botulinum B        k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium botulinum;T__

The lowest taxon in this lineage is below species and has no rank, so I expected --pseudo-strain to report its name as the strain. It doesn't work with either {T} or {t}. Am I doing something wrong?


shenwei356 commented 3 years ago

You're missing -F (--fill-miss-rank) :)

shenwei356 commented 3 years ago

I've improved the flag checking:

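if any of `{t}`, `{S}`, `{T}` is found in `--format` {
    if `-S/--pseudo-strain` is on, but `-F/--fill-miss-rank` is not given {
        log.info()      // note that -F is being switched on
        turn on `-F`
    }
} else if `-S/--pseudo-strain` is on {
    log.warning()       // -S has no effect without {t}, {S} or {T}
}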
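For anyone who wants to check which flag combinations lead to which message, here is a minimal POSIX shell sketch of the check above. It is not taxonkit's implementation: the function name `resolve_fill_miss_rank` and its 0/1 argument convention are hypothetical, and the message texts are copied from the example output in this thread.

```shell
# Hypothetical sketch of the -S/-F flag check; not taxonkit's actual code.
# Usage: resolve_fill_miss_rank FORMAT PSEUDO_STRAIN FILL_MISS_RANK
#   PSEUDO_STRAIN and FILL_MISS_RANK are 1 (flag given) or 0 (not given).
# Prints the effective -F/--fill-miss-rank value (1 or 0) on stdout and
# writes the INFO/WARN notes to stderr.
resolve_fill_miss_rank() {
    format=$1
    pseudo_strain=$2
    fill_miss_rank=$3

    case $format in
        *'{t}'* | *'{S}'* | *'{T}'*)
            # A strain-level placeholder is requested: -S needs -F, so switch it on.
            if [ "$pseudo_strain" = 1 ] && [ "$fill_miss_rank" != 1 ]; then
                echo '[INFO] -F/--fill-miss-rank is switched on when giving flag -S/--pseudo-strain' >&2
                fill_miss_rank=1
            fi
            ;;
        *)
            # No strain-level placeholder: -S has nothing to act on.
            if [ "$pseudo_strain" = 1 ]; then
                echo '[WARN] flag -S/--pseudo-strain will not work because none of "{t}", "{S}", "{T}" is found in -f/--format' >&2
            fi
            ;;
    esac

    echo "$fill_miss_rank"
}
```

For example, `resolve_fill_miss_rank '{g};{s};{t}' 1 0` prints the INFO note to stderr and `1` to stdout, while `resolve_fill_miss_rank '{g};{s}' 1 0` prints the WARN note and `0`.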

Examples:

$ echo 36827 \
    | taxonkit lineage \
    | taxonkit reformat --add-prefix --format '{g};{s};{t}' --pseudo-strain --fill-miss-rank \
    | cut -f 1,3
36827   g__Clostridium;s__Clostridium botulinum;t__Clostridium botulinum B

$ echo 36827 \
    | taxonkit lineage \
    | taxonkit reformat --add-prefix --format '{g};{s};{t}'  --pseudo-strain \
    | cut -f 1,3
08:36:01.750 [INFO] -F/--fill-miss-rank is switched on when giving flag -S/--pseudo-strain
36827   g__Clostridium;s__Clostridium botulinum;t__Clostridium botulinum B

$ echo 36827 \
    | taxonkit lineage \
    | taxonkit reformat --add-prefix --format '{g};{s}'  --pseudo-strain \
    | cut -f 1,3
08:35:01.515 [WARN] flag -S/--pseudo-strain will not work because none of "{t}", "{S}", "{T}" is found in -f/--format
36827   g__Clostridium;s__Clostridium botulinum

standage commented 3 years ago

Thanks.