Closed ilnamkang closed 2 years ago
Hi, thanks for using KMCP. I've added a flag --metaphlan-report-version
to choose the MetaPhlAn format version, and the default is 3
, so you don't need change the commands.
The format should be OK now:
#SampleID
#clade_name NCBI_tax_id relative_abundance additional_species
k__Bacteria 2 100.000000
k__Bacteria|p__Proteobacteria 1224 99.530189
k__Bacteria|p__Verrucomicrobia 74201 0.469811
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria 1236 99.530189
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae 203494 0.469811
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales 91347 99.530189
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales 48461 0.469811
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae 543 99.530189
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae 1647988 0.469811
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia 561 99.530189
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia 239934 0.469811
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli 562 99.530189
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|s__Akkermansia muciniphila 239935 0.469811
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli|t__Escherichia coli SE15 431946 48.321535
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli|t__Escherichia coli K-12 83333 46.194629
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli|t__Escherichia coli O157:H7 str. Sakai 386585 5.014025
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|s__Akkermansia muciniphila|t__Akkermansia muciniphila ATCC BAA-835 349741 0.469811
Thank you for a quick update!!
I've rerun "kmcp profile" using the updated binary. Now, the result file contains "NCBI_tax_id".
But, this updated output file was still not recognized by Pavian. Therefore, I looked the Pavian repository to see if there are any hints for the solution.
In my opinion, Pavian seems to require specific strings in the first line of input files of MetaPhlAn format.
I found these lines in one of R script files of the Pavian repository. (https://github.com/fbreitwieser/pavian/blob/master/R/datainput-read_report.R) ---- is_metaphlan3_fmt <- grepl("^#mpa_v3", first.line) is_metaphlan_fmt <- grepl("Metaphlan2_Analysis$", first.line) ----
So, I guessed that input files of MetaPhlAn3 format should start with "#mpa_v3" and those of MetaPhlAn2 format should have the first line ending with with "Metaphlan2_Analysis".
Therefore, I tried inserting "#mpa_v3" in the first line of the kmcp output file as below. This edited file is recognized by Pavian without any errors.
----
kBacteria 2 100.000000
kBacteria|pProteobacteria 1224 57.368930
kBacteria|pActinobacteria 201174 32.664842
kBacteria|p__Bacteroidetes 976 4.010597
----
Thanks for your help.
I see, the MetaPhlAn3 outputs the database info as the first line in the profile:
#mpa_v30_CHOCOPhlAn_201901
Thank you for a great tool!
I'd like to use Metaphlan format profile report file obtained from "kmcp profile" as an input of Pavian, which draws Sankey plots. ( https://github.com/fbreitwieser/pavian ) ( https://fbreitwieser.shinyapps.io/pavian/ )
But, when I tried, the Metaphlan format output file of "kmcp profile" is not recognized by Pavian shiny app, which claims that it can recognize Kraken, Centrifuge and MetaPhlAn report files.
When I compared my output file to the original MetaPhlAn3 output format (https://github.com/biobakery/biobakery/wiki/metaphlan3#output-files), it seems that kmcp output file does not have "NCBI_tax_id" column, which seems to be required by Pavian.
Are there any ways to solve this problem?
Thanks in advance.