joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

rarefy_even depth question #1642

Open LaFra7 opened 1 year ago

LaFra7 commented 1 year ago

Hi all,

I have a question that is probably a bit silly, so sorry in advance.

When I apply the rarefy_even_depth function, many samples are removed from my dataset. I'm working with relative abundances or presence/absence data for plants detected in faecal samples.

If I remove these samples, the sampling design no longer makes sense, because different areas will have different numbers of samples and will not be comparable. Would it be an error to assign 0 to all plants in the samples that were removed by rarefaction? I mean, they were removed because they did not have enough reads; does this mean that the plant is not present, or not? If this is not correct, do you have any suggestions on how to deal with this problem?

Thank you!

gmteunisse commented 1 year ago

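Is the subsampling depth that you are using higher than the sequencing depth of the samples that are removed? If so, this is normal behaviour: a sample with fewer reads than the requested depth cannot be subsampled to that depth, so it is discarded. See https://github.com/joey711/phyloseq/issues/910.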
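To check whether this is what is happening, you can compare each sample's read total against the depth you passed. The following is a minimal sketch, not a definitive recipe: it uses the `GlobalPatterns` example dataset that ships with phyloseq as a stand-in for your own object, and the value of `depth` is a hypothetical placeholder.

```r
library(phyloseq)

# Example data bundled with phyloseq; substitute your own phyloseq object.
data(GlobalPatterns)
ps <- GlobalPatterns

# Hypothetical subsampling depth; use the value you gave as sample.size.
depth <- 100000

# Sequencing depth (total read count) of each sample.
reads <- sample_sums(ps)

# Samples with fewer reads than `depth` cannot be subsampled to that depth.
dropped <- names(reads)[reads < depth]
kept <- names(reads)[reads >= depth]

print(sort(reads))
cat(length(dropped), "of", nsamples(ps), "samples fall below depth", depth, "\n")
```

If `dropped` matches the samples that disappear after rarefaction, the chosen depth is the cause rather than anything specific to your data.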

One solution, if you want to keep using rarefaction, is to lower the subsampling depth to the minimum sequencing depth in your experiment, which is the default for this function. Alternatively, you can use a different library normalisation technique that is appropriate to the type of analysis that you're doing; read https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531 and https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-017-0237-y as starting points. However, even if you are not using rarefaction, you need to consider carefully whether you want to include samples with a low sequencing depth.
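As a rough illustration of the first option, the sketch below rarefies to the smallest library size so that no sample is dropped by the rarefaction step itself. It again uses the bundled `GlobalPatterns` data in place of your own object. The `min_reads` cutoff and the `rngseed` value are hypothetical choices, not recommendations; whether to filter low-depth samples at all, and at what threshold, depends on your study.

```r
library(phyloseq)

# Example data bundled with phyloseq; substitute your own phyloseq object.
data(GlobalPatterns)
ps <- GlobalPatterns

# Optional, explicit filter for very shallow samples.
# The cutoff is an assumption made for illustration only.
min_reads <- 1000
ps_filtered <- prune_samples(sample_sums(ps) >= min_reads, ps)

# Rarefy to the smallest remaining library size, so every remaining
# sample has enough reads and none is removed by rarefaction.
ps_rare <- rarefy_even_depth(
  ps_filtered,
  sample.size = min(sample_sums(ps_filtered)),
  rngseed = 1642,   # arbitrary seed so the subsampling is reproducible
  verbose = FALSE
)

# All samples are retained and now share the same depth.
nsamples(ps_rare) == nsamples(ps_filtered)
unique(sample_sums(ps_rare))
```

Doing the filtering as a separate, visible step means any sample that is removed is removed by a decision you made and can report, rather than as a side effect of the depth setting.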