perishky / meffil

Efficient algorithms for analyzing DNA methylation data.
Artistic License 2.0

usage of cell type output #59

Open rkb965 opened 7 months ago

rkb965 commented 7 months ago

Hello! Thank you for taking the time to write this great package. I have a few questions that are related to best practices with {meffil} output, but I admit they are not strictly about the package (and I certainly do not expect a response!). Please feel free to point me to a more appropriate spot for these.

  1. Is it standard practice to include all predicted cell types as adjustment variables? This code from the README suggests yes:

    # Add cell count estimates to the set of covariates.
    counts <- t(meffil.cell.count.estimates(norm.objects))
    covariates <- cbind(covariates, counts)
    # Run the EWAS.
    ewas.ret <- meffil.ewas(norm.beta, variable=variable, covariates=covariates)

The cell count estimates are highly correlated, and I think they may be producing inflated or unstable effect estimates. Is this common? I am entirely new to this, but I don't see it discussed. For full disclosure, in case this is somehow dealt with in {meffil}: I am using the cell count estimates from {meffil}, but the modeling is done in a different pipeline.
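To make the concern concrete, here is a minimal sketch of one way to check for collinearity among cell type proportions. It uses simulated proportions rather than real {meffil} output; the matrix `counts`, the cell type names, and the helper `vif` are hypothetical. It also assumes the proportions sum exactly to 1 per sample, which real estimates may only do approximately.

```r
# Hypothetical stand-in for t(meffil.cell.count.estimates(norm.objects)):
# rows are samples, columns are cell types. Simulated so the sketch runs alone.
set.seed(1)
n <- 100
raw <- matrix(rgamma(n * 4, shape = rep(c(8, 4, 1, 0.5), each = n)), nrow = n)
counts <- raw / rowSums(raw)  # proportions summing to 1 per sample (assumption)
colnames(counts) <- c("typeA", "typeB", "typeC", "typeD")

# Pairwise correlations between the estimated proportions.
cor.mat <- cor(counts)

# Rank of the design matrix (intercept + covariates). If the proportions sum
# to 1, the intercept is a linear combination of all cell type columns.
rank.all <- qr(cbind(1, counts))$rank             # intercept + all 4 types
rank.reduced <- qr(cbind(1, counts[, -1]))$rank   # one type left out

# Variance inflation factor per column: 1 / (1 - R^2) from regressing that
# column on the remaining ones. Computed on the reduced set only, because the
# full set is rank deficient by construction.
vif <- function(x) {
  sapply(colnames(x), function(j) {
    others <- x[, setdiff(colnames(x), j), drop = FALSE]
    r2 <- summary(lm(x[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}
vif.reduced <- vif(counts[, -1])
```

Comparing `rank.all` with `ncol(counts) + 1` shows whether the full set of proportions is redundant with the intercept, and `vif.reduced` indicates how much correlation remains after leaving one cell type out.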

  2. If you include only a subset of predicted cell types, are there guidelines for the amount of information that is acceptable to drop? Most of my samples have near-zero (for some definition of "near") amounts of all but two cell types, but I absolutely have some samples with non-trivial contributions from each of the other five cell types.

  3. Are cell type predictions particularly sensitive to the choice of reference dataset? We have a pediatric population with saliva samples and are inclined to use {meffil}'s detailed (7 cell types) output using the gse35069/gse48472 reference datasets. However, there is also a saliva reference dataset, BeadSorted.Saliva.EPIC, which comes from an explicitly pediatric population but only has leukocytes and epithelial cells. I'm not sure whether differential methylation by cell type makes {meffil}'s use of saliva gse48472 more appropriate, or whether age-specific effects make BeadSorted.Saliva.EPIC more appropriate for this use case.
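For the question about keeping only a subset of cell types, here is a rough sketch of how one might quantify what would be dropped. The data are simulated, and the matrix `counts`, the cell type names, and the 0.05 cutoff for "near zero" are all hypothetical choices rather than anything recommended by {meffil}.

```r
# Hypothetical samples-by-cell-types matrix of estimated proportions.
set.seed(2)
n <- 50
raw <- matrix(rgamma(n * 4, shape = rep(c(8, 4, 0.3, 0.1), each = n)), nrow = n)
counts <- raw / rowSums(raw)
colnames(counts) <- c("typeA", "typeB", "typeC", "typeD")

threshold <- 0.05  # arbitrary cutoff for "near zero" (assumption)

# Share of samples in which each cell type exceeds the cutoff.
frac.above <- colMeans(counts > threshold)

# Keep the two cell types with the highest mean proportion, then compute the
# per-sample proportion that the model would no longer account for.
keep <- names(sort(colMeans(counts), decreasing = TRUE))[1:2]
dropped <- 1 - rowSums(counts[, keep, drop = FALSE])

# Distribution of the ignored proportion across samples.
dropped.summary <- summary(dropped)
```

Looking at the upper tail of `dropped` rather than its mean shows how many samples carry a non-trivial contribution from the omitted cell types.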

Many thanks for any wisdom you can share!