Closed yesitsjess closed 4 months ago
Hi @yesitsjess, this is a really interesting philosophical question. Typically, PCs are ordered by variance and so we "take the top N PCs" for feature selection. However, you don't need to necessarily select consecutive PCs. Some users of Harmony with scATACseq carefully select which non-consecutive PCs to use for Harmony and downstream analysis, carefully investigating which PCs encode biology vs noise. For your purposes, I would suggest using whichever method you like to select the relevant PCs after Harmony and then use those potentially non-consecutive PCs in all your downstream tasks.
pcs_idx = my_selection_method(...)
obj <- RunTSNE(obj, reduction.name="integrated_tsne", reduction="harmony", assay="SCT", dims=pcs_idx)
Hi, this is not an issue but more a question relating to single cell data analysis in Seurat. I think it is just a more general question about integration so I've posted here. Please let me know if I should be asking somewhere else.
After integration with harmony some of my earlier PCs explain less variance than later PCs.
I get this by running:
To select the number of PCs to use for UMAP/TSNE/clustering I take the larger value of: 1 .The point where the principal components only contribute 5% of standard deviation and the principal components cumulatively contribute 90% of the standard deviation.
I then use this
pcs
(in this case 27) for the following:But I'm wondering if I should be reordering my PCs based on explained variance after harmony integration? And then selecting the specific PCs to use (rather than using all of 1 to 27)?