Hello! Thank you for taking the time to write this great package. I have a few questions that are related to best practices with {meffil} output, but I admit they are not strictly about the package (and I certainly do not expect a response!). Please feel free to point me to a more appropriate spot for these.
Is it standard practice to include all predicted cell types as adjustment variables? This code from the README suggests yes:
Add cell count estimates to the set of covariates.
counts <- t(meffil.cell.count.estimates(norm.objects))
covariates <- cbind(covariates, counts)
Run the EWAS.
ewas.ret <- meffil.ewas(norm.beta, variable=variable, covariates=covariates)
The cell count estimates are highly correlated, and I think they may be producing inflated or unstable effect estimates. Is this common? I am entirely new to this, but I don't see it discussed. For full disclosure, in case this is somehow dealt with inside {meffil}: I am using the cell count estimates from {meffil}, but the modeling is done in a different pipeline.
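To make the concern concrete, here is a minimal sketch on simulated data (not my actual samples). The names `props` and `vif` are made up for illustration and are not part of {meffil}. It assumes the estimates behave like proportions that sum to roughly one, in which case the full set is collinear with the model intercept, and it compares variance inflation factors with and without one cell type dropped.

```r
set.seed(1)
n <- 200

# Simulated proportions for 7 cell types (hypothetical data):
# two dominant types and five that are near zero in most samples.
shapes <- c(8, 6, 0.3, 0.3, 0.3, 0.3, 0.3)
raw <- matrix(rgamma(n * 7, shape = rep(shapes, each = n)), nrow = n)
props <- raw / rowSums(raw)  # each row sums to exactly 1
colnames(props) <- paste0("type", 1:7)

# Variance inflation factor for each column of a covariate matrix:
# regress the column on all other columns (plus an intercept)
# and compute 1 / (1 - R^2).
vif <- function(x) {
  sapply(seq_len(ncol(x)), function(j) {
    r2 <- summary(lm(x[, j] ~ x[, -j]))$r.squared
    1 / (1 - r2)
  })
}

# All 7 types: each column is determined by the other six, so the
# VIFs are infinite or astronomically large (R warns about a perfect fit).
vif_all <- suppressWarnings(vif(props))

# Dropping one type as a reference category removes the exact dependency.
vif_drop <- vif(props[, -7])

print(round(cor(props[, 1:2]), 2))  # correlation of the two dominant types
print(vif_all)
print(vif_drop)
```

This does not say which cell types are safe to drop; it only illustrates the pattern in question, where the VIFs can remain high for the dominant types even after the exact dependency is removed.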
If you include only a subset of the predicted cell types, are there guidelines for how much information it is acceptable to drop? Most of my samples have near-zero (for some definition of "near") amounts of all but two cell types, but I do have some samples with non-trivial contributions from each of the other five cell types.
Are cell type predictions particularly sensitive to the reference dataset? We have saliva samples from a pediatric population and are inclined to use {meffil}'s detailed (7 cell types) output based on the gse35069/gse48472 reference datasets. However, there is also a saliva reference dataset, BeadSorted.Saliva.EPIC, which comes from an explicitly pediatric population but only distinguishes leukocytes and epithelial cells. Given this, I'm not sure whether differential methylation by cell type makes {meffil}'s saliva gse48472 reference more appropriate, or whether age-specific effects make BeadSorted.Saliva.EPIC more appropriate for this use case.
Many thanks for any wisdom you can share!