motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

Consistency for `taxonomy` and `full-taxonomy` output #103

Open valentynbez opened 2 years ago

valentynbez commented 2 years ago

Problem:

Proposed solution: both commands produce a table with `#mOTU` and `consensus_taxonomy` columns.

AlessioMilanese commented 2 years ago

Example result using -p:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 -p | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test4 -p
#consensus_taxonomy NCBI_tax_id test4
Kandleria vitulina [ref_mOTU_v2_0001]   1630    0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002]    378211  0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003]    699192  0.0234955832

If the two-column result is not easy to import into QIIME, then we will not implement `-p` in QIIME.

valentynbez commented 2 years ago

Example of output with -q:

#mOTU   consensus_taxonomy  sampleA sampleB
ref_mOTU_v3_00095   k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli   1   0
ref_mOTU_v3_00096   k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Citrobacter|s__ Citrobacter sp. [Citrobacter amalonaticus/Citrobacter braakii/Citrobacter freundii/Citrobacter pasteurii/Citrobacter portucalensis/Citrobacter sp. A1/Citrobacter sp. A316/Citrobacter sp. AATXQ/Citrobacter sp. AATXR/Citrobacter sp. BIDMC107/Citrobacter sp. BIDMC108/Citrobacter sp. CFSAN044567/Citrobacter sp. FDAARGOS_156/Citrobacter sp. KTE151/Citrobacter sp. KTE30/Citrobacter sp. KTE32/Citrobacter sp. L17/Citrobacter sp. MGH100/Citrobacter sp. MGH103/Citrobacter sp. MGH104/Citrobacter sp. MGH105/Citrobacter sp. MGH109/Citrobacter sp. MGH110/Citrobacter sp. MGH99/Citrobacter werkmanii/Citrobacter youngae/Enterobacter cloacae/Enterobacter sp. GN02600/Escherichia coli/Salmonella enterica]    1   0
ref_mOTU_v3_00855   k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides uniformis  0   1
ref_mOTU_v3_02367   k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides dorei/vulgatus [Bacteroidia bacterium UC5.1-2G11/Candidatus Gastranaerophilales bacterium HUM_8/Bacteroides dorei/Bacteroides sp. 3_1_33FAA/Bacteroides sp. 4_3_47FAA/Bacteroides sp. 9_1_42FAA/Bacteroides sp. 3_1_40A/Bacteroides vulgatus]  0   1

I think it's better to have a consistent number of columns. In your case, that means splitting the `#consensus_taxonomy` column so that the mOTU ID and the taxonomy each get their own column.
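
To illustrate the split, here is a minimal sketch in Python. It is not code from mOTUs: the function name `split_consensus_taxonomy` is hypothetical, and it assumes that rows are tab-separated and that the mOTU ID is the last square-bracketed token of the first field, as in the `-p` example above.

```python
import re

# Matches "<taxonomy> [<mOTU id>]" where the ID is the trailing bracketed token.
# Assumption: the ID itself contains no square brackets.
_TRAILING_ID = re.compile(r"^(?P<taxonomy>.*\S)\s*\[(?P<motu>[^\[\]]+)\]\s*$")


def split_consensus_taxonomy(line):
    """Split one -p style row into [mOTU, consensus_taxonomy, *remaining fields].

    Hypothetical helper for illustration; assumes tab-separated input.
    """
    fields = line.rstrip("\n").split("\t")
    match = _TRAILING_ID.match(fields[0])
    if match is None:
        # No trailing bracketed ID: keep the text and leave the mOTU column empty.
        return ["", fields[0]] + fields[1:]
    return [match.group("motu"), match.group("taxonomy")] + fields[1:]


if __name__ == "__main__":
    row = "Kandleria vitulina [ref_mOTU_v2_0001]\t1630\t0.0688211617"
    print("\t".join(split_consensus_taxonomy(row)))
```

With this layout, the first two columns would be `#mOTU` and `consensus_taxonomy` for both outputs, and any remaining columns (NCBI taxonomy ID, sample abundances) pass through unchanged.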