Query about data subsetting

RachBioHaz commented 1 year ago

Hi, I just have a quick query about the best way to utilise benchdamic for my specific dataset. I have samples from 4 different gut regions, and sampling depth differs by region because each region has a different microbial load. I am looking to analyse the data in two ways: as whole gut and as regional subsets. To assess the validity of the different DA methods, should I run benchdamic on the whole dataset or on the subsets?

Thanks for your help

mcalgaro93 commented 1 year ago

Hi @RachBioHaz, thank you for your question.

To evaluate Type I Error Control effectively, it is best to use a homogeneous group of samples, one in which you don't expect to find any differentially abundant features. Any feature a method flags when samples from such a group are compared with each other can then be counted as a false discovery. So, if you have two experimental conditions (A and B) to compare across all the regions (region1, region2, region3, region4), these homogeneous groups could be formed from the samples in region1-conditionA, or region1-conditionB, or region2-conditionA, and so on. Alternatively, you can use alpha and beta diversity to identify homogeneous groups of samples.

To keep the analysis manageable in terms of both computational burden and the number of groups, you may want to test only some of the latest state-of-the-art DAA methods, such as ZicoSeq, ANCOMBC, LinDA, and MaAsLin2, on a subset of regions to assess how well they control the number of false discoveries.
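As a rough illustration of the grouping idea above, here is a minimal Python sketch that forms one homogeneous group per region and condition, then randomly splits one of those groups into two mock halves several times. It does not use benchdamic (which is an R package with its own functions for this); the sample IDs, the group sizes, and the helper names `homogeneous_groups` and `mock_splits` are all made up for the example.

```python
import random
from collections import defaultdict

# Hypothetical metadata: 6 samples for every region/condition combination.
REGIONS = ("region1", "region2", "region3", "region4")
CONDITIONS = ("conditionA", "conditionB")
samples = [
    (f"s{i:02d}", region, condition)
    for i, (region, condition) in enumerate(
        (r, c) for r in REGIONS for c in CONDITIONS for _ in range(6)
    )
]


def homogeneous_groups(samples):
    """Group sample IDs by (region, condition)."""
    groups = defaultdict(list)
    for sample_id, region, condition in samples:
        groups[(region, condition)].append(sample_id)
    return dict(groups)


def mock_splits(sample_ids, n_splits, seed=0):
    """Randomly split one homogeneous group into two equal mock halves.

    No real difference is expected between the halves, so features
    that a DA method flags between them count as false positives.
    """
    rng = random.Random(seed)
    half = len(sample_ids) // 2
    splits = []
    for _ in range(n_splits):
        shuffled = rng.sample(sample_ids, k=len(sample_ids))
        splits.append((sorted(shuffled[:half]), sorted(shuffled[half:2 * half])))
    return splits


groups = homogeneous_groups(samples)
region1_a = groups[("region1", "conditionA")]
splits = mock_splits(region1_a, n_splits=10, seed=42)

for mock_grp1, mock_grp2 in splits[:3]:
    print(mock_grp1, "vs", mock_grp2)
```

Each mock comparison would then be passed to the DA methods under test, and the fraction of features with a p-value below the chosen threshold gives an estimate of the false positive rate for that group.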

For the concordance analysis, a region-wise approach could be a suitable option. By comparing conditionA and conditionB samples in region1, then in region2, and so on, you can check whether the methods behave consistently across the different regions. This region-wise approach could also give you valuable insights into how they would perform when applied to the whole gut.
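To make the region-wise comparison concrete, here is a small Python sketch of one simple agreement measure: the share of features that two methods have in common among their top k ranked features, computed separately for each region. This is only an illustration and not benchdamic's implementation; the method names, feature names, and rankings are invented toy values, and `concordance_at_top` is a hypothetical helper.

```python
from itertools import combinations


def concordance_at_top(ranking_a, ranking_b, k):
    """Fraction of features shared by the top-k of two ranked lists."""
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, len(ranking_a), len(ranking_b))
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


# Toy rankings (most significant feature first), one per method and region.
# These values are invented purely for illustration.
rankings = {
    "region1": {
        "methodX": ["taxA", "taxB", "taxC", "taxD"],
        "methodY": ["taxB", "taxA", "taxD", "taxC"],
    },
    "region2": {
        "methodX": ["taxC", "taxA", "taxD", "taxB"],
        "methodY": ["taxC", "taxD", "taxB", "taxA"],
    },
}

# For every region, compare each pair of methods at k = 1..4.
curves = {}
for region, by_method in rankings.items():
    for m1, m2 in combinations(sorted(by_method), 2):
        curves[(region, m1, m2)] = [
            concordance_at_top(by_method[m1], by_method[m2], k)
            for k in range(1, 5)
        ]

for key, curve in curves.items():
    print(key, [round(value, 2) for value in curve])
```

If the curves for a pair of methods look similar from one region to the next, that suggests the pair agrees (or disagrees) consistently regardless of region.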

I don't know whether this applies to your data, but I would advise against a whole-gut analysis if region is a strong driver of variability.

I hope this helps! Let me know if you have any further questions or need additional information. Matteo

RachBioHaz commented 1 year ago

Thanks so much. Your explanation has made things a lot clearer.

mcalgaro93 / benchdamic

Query about data subsetting #5