immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

Is it okay to paste two covariates into one when correcting for two covariates? #246

Closed. YiweiNiu closed this issue 5 months ago.

YiweiNiu commented 6 months ago

Hi,

Sometimes people paste two covariates into one, as in this thread: "Seurat V5 FindVariableFeatures() and HarmonyIntegration() Question".

Given that theta (and perhaps lambda) is set differently for one covariate versus two (#100, #24), do you think this approach is okay? I am not sure how it would affect the correction results.

pati-ni commented 5 months ago

Hi @YiweiNiu,

It depends on your study design. If the design is hierarchical, pasting the covariates into a single one makes no difference. For example, take obj$tech_sample <- paste0(obj$tech, "_", obj$sample): unless you processed the same biological samples with different technologies, the combined covariate will have the same number of levels as obj$sample and will group the cells identically.
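To make the hierarchical case concrete, here is a minimal base-R sketch (no Seurat or Harmony needed) with made-up metadata. The data frames, their values and the helper same_partition() are hypothetical; the sketch only checks whether the pasted covariate groups cells exactly as the sample covariate alone would, and shows the counter-case where one sample is run with two technologies.

```r
# Hypothetical cell metadata: every sample was run on exactly one
# technology, so `sample` is nested within `tech` (hierarchical design).
meta <- data.frame(
  tech   = c("10x", "10x", "10x", "smartseq", "smartseq"),
  sample = c("S1",  "S1",  "S2",  "S3",       "S3")
)
meta$tech_sample <- paste0(meta$tech, "_", meta$sample)

# TRUE when two labelings group the cells identically: both have the same
# number of levels, and crossing them creates no extra combinations.
same_partition <- function(a, b) {
  n_a  <- length(unique(a))
  n_b  <- length(unique(b))
  n_ab <- nrow(unique(data.frame(a, b)))
  n_a == n_b && n_a == n_ab
}

same_partition(meta$tech_sample, meta$sample)  # TRUE: pasting changes nothing

# Counter-case: sample S1 is also run with a second technology.
meta_crossed <- data.frame(
  tech   = c("10x", "smartseq", "10x"),
  sample = c("S1",  "S1",       "S2")
)
meta_crossed$tech_sample <- paste0(meta_crossed$tech, "_", meta_crossed$sample)

same_partition(meta_crossed$tech_sample, meta_crossed$sample)  # FALSE: S1 is split in two
```

In the nested case the pasted covariate is just a relabeling of sample, so using it carries no extra information; in the crossed case it no longer matches sample.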

If you could give me more details about your situation, I can be more helpful.

YiweiNiu commented 5 months ago

Hi @pati-ni,

Thank you so much for your reply.

Our experimental design goes like this:

[image: experimental design diagram]

Several libraries contain cells pooled from multiple donors, and several donors are replicated across multiple libraries. I want to correct for variation between libraries and between donors, so the design is not fully hierarchical.

Do you think it's okay to paste the donor ID and library ID together in this case?

pati-ni commented 5 months ago

Hi @YiweiNiu

In your case, I would recommend using two independent covariates. Otherwise, you compromise the correction of both library batch effects (Harmony would not know which cells come from the same library) and donor-specific effects (Harmony would not know which cells come from the same donor).
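The following base-R sketch, with made-up library and donor labels and no Harmony call, illustrates why pasting is a problem in a crossed design like the one described above: cells from the same library (or the same donor) end up scattered over several pasted levels. The metadata and the helper n_levels() are hypothetical.

```r
# Hypothetical crossed design: libraries L1 and L2 each pool several donors,
# and donor D4 is replicated across libraries L3 and L4.
meta <- data.frame(
  library = c("L1", "L1", "L2", "L2", "L3", "L4"),
  donor   = c("D1", "D2", "D1", "D3", "D4", "D4")
)
meta$library_donor <- paste0(meta$library, "_", meta$donor)

# Number of distinct pasted levels that the cells of each library / donor
# end up in. Any value above 1 means the single pasted covariate no longer
# encodes that those cells share a library (or a donor).
n_levels <- function(x) length(unique(x))
levels_per_library <- tapply(meta$library_donor, meta$library, n_levels)
levels_per_donor   <- tapply(meta$library_donor, meta$donor, n_levels)

print(levels_per_library)  # L1 and L2 are each split into 2 levels
print(levels_per_donor)    # D1 and D4 are each split into 2 levels
```

Keeping the two covariates separate avoids this. Assuming the harmony package's Seurat method for RunHarmony(), that would look something like RunHarmony(obj, group.by.vars = c("library_id", "donor_id")), where the column names are hypothetical; theta can be given per covariate as a vector (e.g. theta = c(2, 2)) if the two need different strengths.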

If you can, let me know how well Harmony works in this use case.

YiweiNiu commented 5 months ago

Hi, thank you again for your quick reply! I tried to use both Donor ID and Library ID as covariates, and for now it works quite well. Thanks for this great tool!