transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Assigning NAs as 1 might bias alpha diversity estimates? #50

Closed by paulsalachan 4 years ago

paulsalachan commented 4 years ago

Hi Sam,

I am looking at your script for estimating alpha diversity indices (e.g. 'diversity_stats.R'), specifically the merge step where 'complete_table' is generated. After the merge, the NAs are set to 1. Given how sensitive alpha diversity indices are to singletons, I believe this could bias the estimates the script produces. Is there a reason the NAs were not set to 0? Or am I wrong in my assumptions?

Thanks, Paul

transcript commented 4 years ago

Hi Paul,

You're right in your observation that NAs are assigned as 1 - this is a carryover from the DESeq script, which requires nonzero values to calculate differential expression.

I did a quick test, and it does look like setting the NAs to 1 influences the diversity metrics. I'll push an update so that the NAs are set to 0 in the diversity measurement script.
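The effect can be illustrated with a minimal sketch. It is written in Python, not the R used by 'diversity_stats.R', and it does not reproduce that script. The sample names, the counts, and the helper functions (`fill`, `richness`, `shannon`) are all hypothetical. `None` stands in for the NAs that appear after a merge when an organism is missing from a sample.

```python
import math

def fill(counts, value):
    """Replace missing entries (None) with the given fill value."""
    return [value if c is None else c for c in counts]

def richness(counts):
    """Observed richness: number of entries with a nonzero count."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H = -sum(p * ln p), skipping zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical merged table: None marks an organism absent from a sample.
merged = {
    "sample_A": [500, 300, None, None, None],
    "sample_B": [400, None, 350, 50, 10],
}

# Compute both metrics with NAs filled as 0 and as 1.
results = {}
for name, counts in merged.items():
    results[name] = {}
    for fill_value in (0, 1):
        filled = fill(counts, fill_value)
        results[name][fill_value] = (richness(filled), shannon(filled))
        print(f"{name} NA->{fill_value}: "
              f"richness={results[name][fill_value][0]}, "
              f"shannon={results[name][fill_value][1]:.4f}")
```

In this toy table, filling with 1 turns every absent organism into a singleton, so sample_A goes from 2 observed organisms to 5 and its Shannon index rises as well. Filling with 0 leaves both metrics reflecting only the organisms actually detected.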

Sam

transcript commented 4 years ago

See commit 11d43bb - https://github.com/transcript/samsa2/commit/11d43bbbb61ff202602fa100b6a83598342b928b