motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

Strange inconsistent row in mOTUs output #112

xapple closed this issue 1 year ago

xapple commented 1 year ago

I ran the following command, with some artificially simulated HiSeq paired-end reads:

$ ~/mambaforge/envs/tax/bin/motus profile -c -p -u -n count -t 10 -o ~/runs/iss/motus.tsv -f ~/runs/iss/reads_fwd.fastq -r ~/runs/iss/reads_rev.fastq

And when I look at the output TSV file, the last line seems like a bug. Here is the tail of the file motus.tsv:

meta_mOTU_v25_14519     Proteobacteria species incertae sedis   NA      0
meta_mOTU_v25_14520     Gammaproteobacteria species incertae sedis      NA      0
meta_mOTU_v25_14521     Bacteria species incertae sedis NA      0
meta_mOTU_v25_14522     Clostridiales species incertae sedis    NA      0
meta_mOTU_v25_14523     Bacteria species incertae sedis NA      0
-1      -1      NA      56

Should we just delete the last line in this case? Is it safe to ignore?

xapple commented 1 year ago

Running Version: 2.5.1

hjruscheweyh commented 1 year ago

Good Morning @xapple

We would encourage you to switch to a more recent version, as 2.5 is based on data from 2019. Version 3 includes data from MAGs from 23 environments: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-022-01410-z

The -1 row (called unassigned in version 3) represents the abundance of unknown species. This fraction is important for normalising, e.g., the relative abundances within a sample, and also for comparing multiple samples with one another (see Figure 2 in https://currentprotocols.onlinelibrary.wiley.com/doi/full/10.1002/cpz1.218 and Figure 1 in https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-022-01410-z). So no, don't remove it.
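
To make the normalisation point concrete, here is a minimal sketch of what happens to relative abundances when the -1 row is dropped. The mOTU names and their counts are made up for illustration; only the 56 for -1 comes from the output above, and `relative_abundances` is a hypothetical helper, not part of mOTUs.

```python
def relative_abundances(counts, keep_unassigned=True, unassigned_key="-1"):
    """Return each entry's share of the total count.

    If keep_unassigned is False, the unassigned row is dropped before
    the total is computed, which inflates every remaining share.
    """
    kept = {
        key: value
        for key, value in counts.items()
        if keep_unassigned or key != unassigned_key
    }
    total = sum(kept.values())
    if total == 0:
        return {key: 0.0 for key in kept}
    return {key: value / total for key, value in kept.items()}


# Hypothetical counts: mOTU_A and mOTU_B are invented, 56 is the -1 row above.
counts = {"mOTU_A": 30, "mOTU_B": 14, "-1": 56}

with_unassigned = relative_abundances(counts)
without_unassigned = relative_abundances(counts, keep_unassigned=False)

print(with_unassigned["mOTU_A"])     # share of all counts
print(without_unassigned["mOTU_A"])  # share of assigned counts only
```

With these numbers, mOTU_A is 30 out of 100 counts when the -1 row is kept, but 30 out of 44 once it is removed, so the same species looks more than twice as abundant.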

Best, Hans

xapple commented 1 year ago

Thanks for your quick answer, and for the explanation of the last line.

I will update to version 3.1.0. For some reason, when I typed conda install motus, version 2.5.1 is what I got, probably because of a convoluted dependency exclusion graph in the current environment.

May I suggest labelling the unassigned reads as unassigned or unknown instead of -1 in the output files?
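
In the meantime, anyone stuck on 2.5.x could relabel the row when reading the file. This is only a sketch: it assumes the tab-separated layout shown in the tail above (identifier first, count last) and that any header lines start with `#`. `read_profile` is a hypothetical helper, not part of mOTUs.

```python
import csv
import io


def read_profile(lines, unassigned_ids=("-1",), label="unassigned"):
    """Parse tab-separated profile lines into (identifier, count) pairs.

    Rows whose identifier is in unassigned_ids are kept but renamed to
    label, so the unknown fraction is never silently dropped.
    Assumes the identifier is the first column and the count the last.
    """
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not fields or fields[0].startswith("#"):
            continue
        identifier, count = fields[0], float(fields[-1])
        if identifier in unassigned_ids:
            identifier = label
        rows.append((identifier, count))
    return rows


# Two rows copied from the tail of motus.tsv shown above.
sample = io.StringIO(
    "meta_mOTU_v25_14523\tBacteria species incertae sedis\tNA\t0\n"
    "-1\t-1\tNA\t56\n"
)
profile = read_profile(sample)
print(profile)
```

For a real file, passing an open file handle instead of the `io.StringIO` sample should work the same way, e.g. `read_profile(open("motus.tsv", newline=""))`.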