shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

kraken2 report to mpa with counts #83

Closed motroy closed 11 months ago

motroy commented 11 months ago

Hi @shenwei356 ,

Thank you for this great tool! In your tutorial (https://bioinf.shenwei.me/taxonkit/tutorial/#parsing-krakenbracken-result) you show how to parse kraken2 report files into mpa-style output with relative abundances. Is it possible to generate the same mpa-style output with the actual read counts instead of relative abundances? If so, I would really appreciate an example.

Thanks again. Regards, Yair

shenwei356 commented 11 months ago
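# Column 2 of the report holds the read counts, so the first "csvtk cut" selects "5,2" here instead of the tutorial's "5,1" (relative abundance).
$ cat SRS014459-Stool.fasta.gz_bracken_species.kreport \
    | csvtk cut -Ht -f 5,2 \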
    | taxonkit lineage \
    | taxonkit reformat -i 3 -P -f "{k}|{p}|{c}|{o}|{f}|{g}|{s}" \
    | csvtk cut -Ht -f 4,2 \
    | csvtk replace -Ht -p "(\|[kpcofgs]__)+$" \
    | csvtk replace -Ht -p "\|[kpcofgs]__\|" -r "|" \
    | csvtk uniq -Ht \
    | csvtk grep -Ht -p k__ -v \
    > SRS014459-Stool.fasta.gz_bracken_species.kreport.format
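
To see what the column switch does without running the whole pipeline, here is a minimal sketch on a made-up three-row report. It assumes the standard tab-separated kraken2 report layout (percentage, clade read count, directly assigned read count, rank code, taxid, name). The helper name `kreport_taxid_counts` is hypothetical and plain `awk` stands in for `csvtk cut`; the read counts are invented for illustration.

```shell
# Hypothetical helper (not part of taxonkit or csvtk):
# print taxid (column 5) and clade read count (column 2) from a kraken2-style report on stdin.
kreport_taxid_counts() {
    awk -F'\t' -v OFS='\t' '{ print $5, $2 }'
}

# Made-up report rows: percent, clade reads, direct reads, rank code, taxid, name.
printf '100.00\t1000\t0\tR\t1\troot\n 99.00\t990\t0\tD\t2\t  Bacteria\n 60.00\t600\t600\tS\t562\t        Escherichia coli\n' \
    | kreport_taxid_counts
```

Each output row is a taxid followed by its read count, which is the shape `taxonkit lineage` receives in the pipeline above. Printing `$5, $1` instead would give the relative-abundance variant from the tutorial.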
motroy commented 11 months ago

Great! Thank you for your prompt response, @shenwei356. Regards, Yair