immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

question regarding harmony elbow plot and PCs selection #222

Closed iichelhadi closed 9 months ago

iichelhadi commented 9 months ago

Hi guys,

I ran Harmony on my Seurat object and then made an elbow plot to select the number of PCs for downstream analysis. I found that several of the earlier PCs explain less variance than subsequent PCs (see Elbowplot_harmony.pdf). I am not sure how this is possible. Does it indicate that something is wrong with the batch effect correction? Also, is it acceptable to reorder the PCs by explained variance and use those for downstream analyses?

Regards

pati-ni commented 9 months ago

Hi @iichelhadi,

Are these the Harmony embeddings you are using for this plot? Please share the code that you ran.

iichelhadi commented 9 months ago

Yes, they are. This is my Harmony integration code; I am certain there are no issues with it. Something similar was also pointed out before in issue #175.

merged_seurat <- RunHarmony(
  merged_seurat,
  group.by.vars = c("orig.ident", "Platform"),
  reduction.use = "pca",
  max_iter = 50,
  dims.use = 1:50,
  lambda = NULL
)

pati-ni commented 9 months ago

What is ElbowPlot computing? Usually the standard deviation comes precomputed from the PCA decomposition, as the square roots of its eigenvalues.

Why do you want your embeddings sorted by variance?

Why would you expect batch correction to keep the variance intact?
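
To make the point behind these questions concrete, here is a minimal base-R sketch on toy data. It assumes a deliberately simple stand-in for batch correction (subtracting each batch's mean per PC), which is not Harmony's algorithm; the variable names and the simulated data are illustrative only. It shows that PCA standard deviations are sorted by construction, while the per-dimension standard deviations of a corrected embedding need not be.

```r
set.seed(1)
n <- 200
batch <- rep(c("a", "b"), each = n / 2)

# Toy data: feature 1 carries a large batch shift, feature 2 carries
# batch-independent signal, feature 3 is low-variance noise.
x <- cbind(
  rnorm(n, sd = 1) + ifelse(batch == "a", -5, 5),
  rnorm(n, sd = 3),
  rnorm(n, sd = 0.5)
)

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# PCA standard deviations are the square roots of the covariance
# eigenvalues and come out in decreasing order.
sdev_pca <- pca$sdev
eig <- eigen(cov(x), symmetric = TRUE)$values
stopifnot(isTRUE(all.equal(sdev_pca, sqrt(eig))))

# Stand-in "correction" (NOT Harmony): subtract each batch's mean per PC.
emb <- pca$x
corrected <- emb
for (b in unique(batch)) {
  idx <- batch == b
  batch_means <- colMeans(emb[idx, , drop = FALSE])
  corrected[idx, ] <- sweep(emb[idx, , drop = FALSE], 2, batch_means)
}

# Per-dimension standard deviation of the corrected embedding.
sdev_corrected <- apply(corrected, 2, sd)

print(round(sdev_pca, 2))        # decreasing
print(round(sdev_corrected, 2))  # PC1 now falls below PC2
```

In this toy case PC1 is mostly batch, so removing the batch shift shrinks its spread below that of PC2. A non-monotonic elbow plot after correction is therefore not, by itself, evidence of a failed integration.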

iichelhadi commented 9 months ago

Yes, the elbow plot here shows the standard deviations stored in the reduction: std <- merged_seurat[["harmony"]]@stdev. For downstream analysis I have to choose a number of PCs for UMAP dimensionality reduction and for finding neighbors for clustering. Typically we select the first PCs, which explain the most variance, for example PC1 to PC20. After Harmony this is no longer the case. My question is: can I reorder the PCs by explained variance and then select the first 20 reordered PCs, which could, for example, exclude PC15 but include PC28?

pati-ni commented 9 months ago

Yes, you can do that. That said, bear in mind that the residuals of the dimensions you are removing still have some resemblance to the count data.
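
The reordering discussed in this thread can be sketched in base R as follows. This is only an illustration: the matrix `emb` is simulated and stands in for the corrected embedding (with a Seurat object it would presumably come from something like `Embeddings(merged_seurat, reduction = "harmony")`), and `n_keep`, `dim_sd`, and `top_dims` are hypothetical names.

```r
set.seed(42)

# Simulated stand-in for a corrected embedding (cells x dimensions).
n_cells <- 300
n_dims <- 50
true_sd <- sample(seq(0.5, 5, length.out = n_dims))  # deliberately unsorted
emb <- sapply(true_sd, function(s) rnorm(n_cells, sd = s))
colnames(emb) <- paste0("harmony_", seq_len(n_dims))

# Per-dimension standard deviation, the quantity an elbow plot displays.
dim_sd <- apply(emb, 2, sd)

# Rank dimensions by standard deviation and keep the indices of the top 20.
n_keep <- 20
top_dims <- order(dim_sd, decreasing = TRUE)[seq_len(n_keep)]

# Subset the embedding to the selected dimensions.
emb_top <- emb[, top_dims, drop = FALSE]

print(top_dims)
```

Assuming the downstream functions accept an arbitrary index vector, `top_dims` could then be passed as the `dims` argument of `RunUMAP` and `FindNeighbors` with `reduction = "harmony"`, rather than a contiguous range such as `1:20`.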