shenwei356 / kmcp

Accurate metagenomic profiling && Fast large-scale sequence/genome searching
https://bioinf.shenwei.me/kmcp
MIT License

Read-level output format #3

Closed: sjaenick closed this issue 2 years ago

sjaenick commented 2 years ago

Hi,

this looks pretty interesting. Would it be possible to implement an additional output format that provides read-level assignments instead of the summary taxonomic profiles? For example, something similar to the default Kraken/Kraken2 output format described at

https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format

It could report either taxids (for NCBI) or complete lineages for other taxonomies (GTDB etc.).

shenwei356 commented 2 years ago

It already does; check the usage of kmcp profile:

-B, --binning-result string         ► Save extra binning result in CAMI report.

And an example output:

# This is the bioboxes.org binning output format at
# https://github.com/bioboxes/rfc/tree/master/data-format
@Version:0.10.0
@SampleID:
@@SEQUENCEID    TAXID
NC_000913.3_sliding:1244941-1245090     511145
NC_013654.1_sliding:344871-345020       562
NC_000913.3_sliding:3801041-3801190     511145
NC_013654.1_sliding:752751-752900       562
NC_000913.3_sliding:4080871-4081020     562
NC_000913.3_sliding:3588091-3588240     511145
NC_000913.3_sliding:2249621-2249770     562
NC_013654.1_sliding:2080171-2080320     431946
NC_000913.3_sliding:2354841-2354990     511145
NC_013654.1_sliding:437671-437820       431946
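
For downstream use, a binning file like the one above can be read with a few lines of Python. The following is only a minimal sketch and not part of KMCP: the function name `parse_binning` and the embedded `SAMPLE` text are hypothetical, and it assumes two whitespace-separated columns (read ID, then taxid), `#` comment lines, `@Key:Value` header lines, and a single `@@` column line, as in the example output.

```python
import io
from collections import Counter

# A few lines copied from the example output above (hypothetical test input).
SAMPLE = """\
# This is the bioboxes.org binning output format at
# https://github.com/bioboxes/rfc/tree/master/data-format
@Version:0.10.0
@SampleID:
@@SEQUENCEID    TAXID
NC_000913.3_sliding:1244941-1245090     511145
NC_013654.1_sliding:344871-345020       562
NC_000913.3_sliding:3801041-3801190     511145
NC_013654.1_sliding:2080171-2080320     431946
"""


def parse_binning(handle):
    """Parse a binning file into (header dict, list of (read_id, taxid)).

    Assumption: columns are separated by whitespace (tabs or spaces) and
    the first two columns are the read ID and the taxid.
    """
    header = {}
    assignments = []
    for raw in handle:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@@"):
            continue  # column-name line, e.g. "@@SEQUENCEID TAXID"
        if line.startswith("@"):
            # header line such as "@Version:0.10.0"
            key, _, value = line[1:].partition(":")
            header[key] = value
            continue
        fields = line.split()
        assignments.append((fields[0], fields[1]))
    return header, assignments


header, assignments = parse_binning(io.StringIO(SAMPLE))

# Number of reads assigned to each taxid.
counts = Counter(taxid for _, taxid in assignments)
```

To read a real file, the same function can be passed an open file handle instead of the `io.StringIO` wrapper. Taxids are kept as strings so that non-numeric identifiers from other taxonomies would not break the parser.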