immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

Best Practices for Sequential vs. Parallel Regression of Multiple Covariates in Harmony #263

Open vertesy opened 3 weeks ago

vertesy commented 3 weeks ago

Hello,

I'm using Seurat with Harmony for batch correction in my scRNA-seq analysis, and I have a question regarding the regression of multiple covariates.

Background:

I want to regress out three covariates from my data: SampleType, Library, and CellCyclePhase.

Initially, I attempted to regress out all three covariates in parallel by concatenating the corresponding metadata columns into a single grouping variable, splitting the merged object by that variable, and providing the result to Harmony. This fails at the splitting step because some of the resulting categories are too small or empty.

Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
Error in validObject(object = object) : 
  invalid class “Assay5” object: Layers must be two-dimensional objects

I understand that small categories will also be a problem for correction, even if I fix the failing data split.
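To illustrate the sparsity problem, here is a minimal base-R sketch of how the concatenated categories can be inspected before splitting. The toy metadata, its values, and the `min_cells` threshold are illustrative assumptions, not taken from the actual dataset; in practice the data frame would be the Seurat object's cell metadata.

```r
# Toy metadata standing in for the per-cell metadata; all values are made up.
meta <- data.frame(
  SampleType = c("A", "A", "A", "B", "B", "B"),
  Library    = c("L1", "L1", "L2", "L2", "L3", "L3"),
  Phase      = c("G1", "S", "G1", "G1", "G2M", "G2M")
)

# Concatenate the three covariates into one grouping label per cell.
# drop = FALSE keeps combinations that never occur, so they show up as 0.
meta$combined <- interaction(meta$SampleType, meta$Library, meta$Phase,
                             sep = "_", drop = FALSE)

# Number of cells in each combined category.
counts <- table(meta$combined)

# Flag categories below a (hypothetical) minimum size.
min_cells <- 2
sparse <- names(counts)[counts < min_cells]
n_empty <- sum(counts == 0)

print(counts[counts < min_cells])
```

Even with only 2 x 3 x 3 levels, most combinations in this toy example are empty or hold a single cell, which is the situation that breaks the split.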

Not sure how I can solve this:

  1. Ignore some covariates
  2. Subset to SampleType 1 and keep the remaining covariates (Library, CellCyclePhase), then repeat for SampleType 2. Suboptimal.
  3. Regress out cell cycle scores in ScaleData(), and provide covariates SampleType and Library to Harmony. (or variations thereof)
    1. One issue is that regression in ScaleData() removes these differences much less effectively than Harmony.
    2. (related to #262)
  4. Iterative / Sequential / Serial Harmony corrections.

I recall that the Harmony authors discussed a "serial Harmony" approach, where covariates are corrected sequentially rather than in parallel, but I haven't been able to find that discussion again.

My Questions:

  1. Is there a recommended practice for cases where concatenating covariates produces too many categories, many of them sparse?

    (other than don't do it)

  2. Can I legitimately overcome the small-category problem with sequential Harmony, and should it give results equivalent to parallel regression in Harmony (assuming both are possible)?

    • Could sequential regression help mitigate issues arising from sparse category combinations?
  3. How can I implement sequential regression of covariates in Harmony within Seurat?

    • Should I feed the "harmony" reduction into RunHarmony() instead of "pca" for the 2nd and 3rd variables?
    • Are there recommended workflows or code examples for applying Harmony multiple times, each time correcting for a single covariate?
    • Do I need to adjust Harmony parameters per covariate, e.g., given that Library has 25 categories and Phase has 3?
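For concreteness, the chaining idea in the first bullet could be sketched as below. This is an untested sketch, not a recommended workflow: it assumes harmony >= 1.0, where `RunHarmony()` on a Seurat object takes `reduction.use` and `reduction.save` (older releases name the input argument differently). The helper name `run_harmony_sequential`, the `run_fn` argument, and the `harmony<Covariate>` reduction names are hypothetical, introduced only for illustration. Whether the result is equivalent to correcting all covariates at once is exactly the open question.

```r
# Sketch only: chain Harmony runs, correcting one covariate per pass.
# Each pass reads the embedding written by the previous pass.
# `run_fn` defaults to harmony::RunHarmony and exists only so the chaining
# logic can be exercised without a real Seurat object.
run_harmony_sequential <- function(obj, covariates, start_reduction = "pca",
                                   run_fn = harmony::RunHarmony, ...) {
  current <- start_reduction
  for (covariate in covariates) {
    saved <- paste0("harmony", covariate)  # hypothetical naming scheme
    obj <- run_fn(obj,
                  group.by.vars  = covariate,
                  reduction.use  = current,  # input embedding for this pass
                  reduction.save = saved,    # output embedding of this pass
                  ...)
    current <- saved
  }
  obj
}

# Intended usage (not run here; `seu` is a placeholder Seurat object with a
# "pca" reduction and metadata columns named as below):
# seu <- run_harmony_sequential(seu, c("SampleType", "Library", "Phase"))
# The last pass would then be stored in the "harmonyPhase" reduction.
```

Extra arguments passed through `...` apply to every pass, so per-covariate parameter changes would require separate calls.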

Additional Context:

Thank you for your time.