theislab / scib

Benchmarking analysis of data integration tools
MIT License

Comparing metrics from different embeddings #384

Open samfenske opened 1 year ago

samfenske commented 1 year ago

Hello,

I have a question about interpreting scib metrics computed on different embeddings. I have an integrated object in which I would expect the X_scVI embedding to have the best scib metrics and the normalized count matrix for 1000 HVGs to score worst. I also computed scib scores for X_umap (derived from X_scVI) and X_pca (derived from the normalized count matrix).

I'm getting rather inconsistent results, which makes me skeptical about how to compare scib scores within a given object. For context, my goal is to use scib metrics to identify the number of highly variable genes that best removes batch effect while preserving biological signal. I'm summarizing results as average bio and average batch scores: the bio score is the average of NMI, ARI, and ASW (by cell type label), and the batch score is the average of ASW (by batch) and graph connectivity.
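For concreteness, the aggregation described above can be sketched as below. This is only an illustration of the averaging step; the metric values are placeholders, not real scib output, and the helper name `summarize` is mine, not part of the scib API:

```python
from statistics import mean

def summarize(metrics: dict) -> dict:
    """Collapse per-metric values into average bio and batch scores,
    as described above: bio = mean(NMI, ARI, ASW by cell type label),
    batch = mean(ASW by batch, graph connectivity)."""
    bio = mean([metrics["nmi"], metrics["ari"], metrics["asw_label"]])
    batch = mean([metrics["asw_batch"], metrics["graph_conn"]])
    return {"bio": bio, "batch": batch}

# Placeholder values for one embedding (not actual results):
scores = summarize({
    "nmi": 0.60, "ari": 0.55, "asw_label": 0.55,   # bio metrics
    "asw_batch": 0.85, "graph_conn": 0.85,         # batch metrics
})
```

In practice the per-metric values would come from running scib's metric functions once per embedding (X_scVI, X_pca, etc.) and the summaries would then be compared across embeddings or HVG sets.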

Somehow the count matrix (which I evaluated by storing the normalized count matrix in adata.obsm['X_raw']) has a better batch score (0.86) than the scVI embedding (0.85); UMAP and PCA score 0.79 and 0.82, respectively. Additionally, UMAP has the best bio score (0.577), followed by scVI (0.566), PCA (0.55), and the normalized count matrix (0.546).

My question is whether these kinds of results are to be expected. I want to try a few HVG sets and compare metrics after HVG selection to pick the right number, but based on these comparisons I'm not sure how to interpret the scores I'm getting. Perhaps my intuition is off that scVI should have the best scores and the normalized matrix the worst.

Thank you!