vmikk / metagMisc

Miscellaneous functions for metagenomic analysis.
MIT License

Average rarefied OTU/ASV table after multiple rarefaction #20

Closed shreyaskumbhare closed 2 years ago

shreyaskumbhare commented 2 years ago

Hi @vmikk. I am finding this package very useful. I am currently trying to perform multiple rarefaction using the phyloseq_mult_raref function. What I need is an average rarefied table across all iterations. I have tried the other function, phyloseq_mult_raref_avg, but its output is a relative abundance matrix. I would really appreciate your help in obtaining an averaged rarefied table with absolute abundances instead of relative abundances.

Thanks!

vmikk commented 2 years ago

Hello Shreyas!

To get a table with absolute abundances averaged across rarefactions, you can simply multiply the relative abundances by the rarefaction depth. Since every rarefied sample has the same total (the rarefaction depth), this should be equivalent to averaging the absolute abundances, shouldn't it?

E.g.,

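library(phyloseq)
library(metagMisc)

raref_depth <- 100   # rarefaction depth (reads per sample)

data(esophagus)
# Relative abundances averaged over 10 rarefaction iterations
rr <- phyloseq_mult_raref_avg(esophagus, SampSize = raref_depth, iter = 10)
# Rescale to absolute abundances by multiplying by the rarefaction depth
rr <- transform_sample_counts(rr, function(x){ x * raref_depth })

head(otu_table(rr))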

Please note that the averaged abundances may be fractional.
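As a rough check of the equivalence claimed above, here is a minimal base-R sketch. It does not use metagMisc or phyloseq; the count vector and the helper `rarefy_once` are made up for illustration, and it assumes each rarefaction draws exactly `raref_depth` reads without replacement.

```r
# Base-R sketch; `counts` and `rarefy_once` are hypothetical, not metagMisc functions
set.seed(42)
counts <- c(otu1 = 50, otu2 = 30, otu3 = 15, otu4 = 5, otu5 = 100)
raref_depth <- 100
n_iter <- 10

# One rarefaction: draw `depth` reads without replacement and tabulate per OTU
rarefy_once <- function(x, depth) {
  reads <- rep(seq_along(x), times = x)
  picked <- sample(reads, size = depth, replace = FALSE)
  tabulate(picked, nbins = length(x))
}

# Matrix of rarefied counts: OTUs in rows, iterations in columns
rarefied <- replicate(n_iter, rarefy_once(counts, raref_depth))

# Route A: average the absolute counts directly
avg_abs <- rowMeans(rarefied)

# Route B: average the relative abundances, then multiply by the depth
avg_rel <- rowMeans(rarefied / raref_depth)
avg_from_rel <- avg_rel * raref_depth

# Both routes should agree up to floating-point error
all.equal(avg_abs, avg_from_rel)
```

The two routes agree because every rarefied sample has the same total, so dividing by the depth before averaging and multiplying afterwards cancels out.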

With kind regards, Vladimir

shreyaskumbhare commented 2 years ago

Yeah, this makes sense! Thanks @vmikk! I have another question: could you elaborate on what exactly the "replace" parameter does, and whether it is recommended to use it?

vmikk commented 2 years ago

The replace parameter defines the type of random subsampling - with replacement (replace = TRUE) or without (replace = FALSE). As an example:

set.seed(1)
sort(sample(1:10, replace = TRUE))
sort(sample(1:10, replace = FALSE))

The first command returns 1 1 2 2 3 4 5 7 7 9, while the second returns 1 2 3 4 5 6 7 8 9 10.

In the first case, some numbers occur multiple times even though each was present only once in the original sample (and, consequently, some numbers are missing).

Random subsampling without replacement should preserve the shape of the abundance distribution and this is the default option in vegan::rarefy.

Sampling with replacement is the same as bootstrapping, and is the default option in phyloseq::rarefy_even_depth.
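To relate this to OTU counts, here is a small base-R sketch with made-up counts (the names `counts`, `reads` and `subsample` are hypothetical and not part of phyloseq or vegan). It treats a sample as a pool of individual reads and subsamples it both ways.

```r
# Base-R sketch with made-up counts; not the phyloseq or vegan implementation
set.seed(1)
counts <- c(2, 8, 40)                          # reads per OTU in one sample
reads  <- rep(seq_along(counts), times = counts)
depth  <- 30

# Subsample `depth` reads and count them per OTU
subsample <- function(replace) {
  tabulate(sample(reads, size = depth, replace = replace), nbins = length(counts))
}

sub_norepl <- subsample(replace = FALSE)
sub_repl   <- subsample(replace = TRUE)

# Without replacement, no OTU can end up with more reads than it started with
all(sub_norepl <= counts)

# With replacement this is not guaranteed: a rare OTU may be drawn more often
# than it occurs in the original sample
rbind(original = counts, no_replacement = sub_norepl, replacement = sub_repl)
```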

In principle, you can construct rarefaction curves without iterative subsampling by using an analytical expression (based on the hypergeometric distribution); its result should match the average over many random subsamplings without replacement.
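For illustration, a minimal base-R sketch of such an analytical expression for expected richness, compared against simulated subsampling without replacement. The helper name `expected_richness` and the counts are assumptions made for this example; as far as I know, vegan::rarefy computes an expectation of this kind, but the code below is not taken from it.

```r
# Expected number of OTUs in a subsample of n reads drawn without replacement.
# `expected_richness` is a hypothetical helper written for this example.
expected_richness <- function(x, n) {
  x <- x[x > 0]
  N <- sum(x)
  # P(OTU i is absent) = choose(N - x_i, n) / choose(N, n), on the log scale
  p_absent <- exp(lchoose(N - x, n) - lchoose(N, n))
  sum(1 - p_absent)
}

set.seed(1)
counts <- c(50, 30, 15, 5, 1, 1)               # made-up reads per OTU
n <- 20                                        # subsample size
reads <- rep(seq_along(counts), times = counts)

# Simulated mean richness over many subsamples without replacement
simulated <- mean(replicate(2000, length(unique(sample(reads, n, replace = FALSE)))))
analytical <- expected_richness(counts, n)

c(analytical = analytical, simulated = simulated)
```

The two values should be close, and the gap should shrink as the number of simulated subsamples grows.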

HTH, Vladimir