wdl2459 / ConQuR

Batch effects removal for microbiome data via conditional quantile regression
GNU General Public License v3.0

New relative abundance? #13

Open xyzhang-101 opened 1 year ago

xyzhang-101 commented 1 year ago

Hi there, thanks for sharing this tool! Judging by the PCoA plot, it worked really well on my data. I have two questions regarding the downstream analysis though:

  1. Once the tool has generated new read counts, is it okay to calculate new relative abundances from the new total read counts?
  2. I used age and BMI as covariates when removing the batch effect, and my downstream analysis will use a mixed linear model. Would it be okay to add age and BMI as covariates in my linear model again, with the corrected species abundance as the outcome?

Also, what other downstream analyses would you suggest for the read counts this tool produces? I generally use the relative abundances so

Thanks!

tommyfuu commented 1 year ago

Hi, I am working with Dr. Ling on the side and can answer these questions. (1) It's certainly okay to convert the total read counts that ConQuR outputs into relative abundances for your analyses, but not the other way around, as that would require sequencing depth information. (2) Certainly. The point of including the covariates is that ConQuR tries to remove batch effects while making sure the effects of the covariates of interest (such as disease state, treatments, BMI) are not eliminated. So the corrected output can certainly be used for the analyses you describe.
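For point (1), the conversion is a per-sample normalization: divide each taxon's corrected count by that sample's corrected total. Here is a minimal sketch in plain Python, assuming the corrected table is laid out as samples (rows) by taxa (columns). The function name `to_relative_abundance` and the toy numbers are hypothetical and not part of ConQuR.

```python
def to_relative_abundance(counts):
    """Convert a samples-by-taxa table of read counts to relative abundances.

    Each row is divided by its own total, so every non-empty row sums to 1.
    Rows with zero total reads are returned as all zeros instead of
    dividing by zero.
    """
    relab = []
    for row in counts:
        total = sum(row)
        if total == 0:
            relab.append([0.0 for _ in row])
        else:
            relab.append([c / total for c in row])
    return relab


# Toy corrected counts (hypothetical values): 3 samples x 4 taxa.
corrected_counts = [
    [10, 30, 0, 60],
    [5, 5, 5, 5],
    [0, 0, 0, 0],
]
relab = to_relative_abundance(corrected_counts)
```

The row totals are discarded in this step, which is why going back from relative abundances to counts would need the sequencing depth from elsewhere.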

Other microbiome analyses routinely done include (1) differential abundance testing; (2) building predictive models for covariates of interest; (3) alpha/beta diversity analyses. These can be done with both relative abundance (relab) data and count data.
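As a rough illustration of item (3), here is a small stdlib-only sketch of one alpha diversity measure (the Shannon index) and one beta diversity measure (Bray-Curtis dissimilarity). These are common choices, not something ConQuR prescribes, and the function names and toy vectors are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity (natural log) of one sample.

    Accepts raw counts or relative abundances: the vector is normalized
    internally, so both give the same value. Zero entries are skipped.
    """
    total = sum(abundances)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples over the same taxa.

    0 means identical composition, 1 means no shared taxa. Unlike the
    Shannon index, the result on raw counts depends on sequencing depth,
    so counts and relative abundances can give different values.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator > 0 else 0.0


# Toy samples (hypothetical values) over the same four taxa.
sample_a = [10, 30, 0, 60]
sample_b = [5, 5, 5, 5]
alpha_a = shannon_index(sample_a)
beta_ab = bray_curtis(sample_a, sample_b)
```

Whether to feed these functions counts or relative abundances is an analysis choice. The Shannon index is unaffected, while Bray-Curtis on raw counts also reflects differences in total reads between samples.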

wdl2459 commented 1 year ago

Completely agree with Tom. Thanks a lot! One additional comment on Q2: I guess you (1) plan to analyze longitudinal/clustered microbiome data, since you mentioned a mixed model, and (2) worry about the double-dipping issue, since the covariates are used twice, in both the correction and the subsequent analyses, which in theory leads to over-optimism in the association analysis. For (1), we are developing a longitudinal version of ConQuR, which will be more appropriate for the task, though it is OK to try the current ConQuR and check its performance. For (2), in practice, this double-dipping bias is modest relative to the batch effects, and including metadata is often helpful for estimating conditional distributions when a taxon is uncommon or imbalanced among batches.