Unimodal UMAP Projection of CosMx data onto snRNAseq data

Hi, I am trying to use my matched snRNAseq and CosMx data to predict cell types for the CosMx data from the snRNAseq. I generally have the code together (See #8660 for a related issue), but I want to make sure I am taking the correct approach. It briefly notes that this was done on the tutorial CosMx data, but the steps were not laid out. Edit: I updated the scripts to reflect further understanding of Seurat v5 integration. It gets worse....

The query data is from 5 slides, so it is stored in the seurat object as 5 layers:

An object of class Seurat
960 features across 140286 samples within 1 assay
Active assay: Nanostring (960 features, 0 variable features)
 5 layers present: counts.1, counts.2, counts.3, counts.4, counts.5
 2 dimensional reductions calculated: pca, umap
 5 spatial fields of view present: Run5873.S2..23 Run5873.S3..24 Run5873.S4..25 Run5990.S1..23.1B Run5990.S2..25.3B

So, that means I should probably integrate them and join layers, correct?

# loading reference and query sets ---------------------
# prep CosMx dataset 
cosmx_int = file.path(out_path, paste(prefix, "integrated.rds", sep = "."))
print(sprintf("Checking for existing integrated dataset at %s", cosmx_int))

# write integrated if it doesn't exist (time saver)
if (file.exists(cosmx_int)) {
  print("Found existing intergrated file...")
  sc <- readRDS(cosmx_int)
} else {
  print(sprintf("Opening and prepping %s...", so))
  sc <- readRDS(so)
  sc %>%
    NormalizeData(scale.factor = 6000) %>% # Giotto uses 6000 for CosMx and Log2fc, 
    FindVariableFeatures() %>%
    ScaleData() %>%
    RunPCA(npcs = pcs) -> sc
  print(sprintf("Integrating %s...", prefix))
  sc %>%
    IntegrateLayers(
      method = HarmonyIntegration, 
      new.reduction = "harmony", 
      verbose = TRUE
    ) %>%
    RunUMAP(
      reduction = "harmony", 
      dims = 1:pcs, 
      reduction.name = "umap.harmony"
    ) %>%
    JoinLayers() -> sc
  print(sprintf("Saving %s to %s...", prefix, cosmx_int))
  saveRDS(sc, file = cosmx_int)
} 

# setup the reference for CosMx features
print(sprintf("Opening and prepping %s...", ro))
annot <- readRDS(ro)
DefaultAssay(annot) <- "RNA"
annot %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData(
    features = union(rownames(sc), VariableFeatures(annot)), 
    vars.to.regress = c("batch", "orig.ident")
  ) %>%
  RunPCA(
    features = union(rownames(sc), VariableFeatures(annot)), 
    npcs = pcs
  ) %>%
  RunUMAP(dims = 1:pcs, return.model = TRUE) -> annot

# Transfer annotations from reference ------------------
print("Finding anchors and transfering...")
cols <- unlist(strsplit(annot_col, ",")) # in this case the string is "subclass,cluster"
names(cols) <- cols
cols <- lapply(cols, function(i) assign(i, annot@meta.data[[i]]))
sc.anchors <- FindTransferAnchors(
  reference = annot, 
  reference.assay = "RNA",
  query = sc, 
  query.assay = "Nanostring",
  dims = 1:pcs, 
  reference.reduction = "pca"
)

# correct for too few anchors
kwt = 50
if (nrow(sc.anchors@anchors) < 50) {
  print("the k.weight is set to 5 owing to fewer than 50 anchors")
  kwt = 5
}
sc <- MapQuery(
  anchorset = sc.anchors, 
  reference = annot, 
  query = sc, 
  refdata = cols, 
  reference.reduction = "pca", 
  reduction.model = "umap",
  transferdata.args = list(k.weight = kwt)
)

Is SCTransform mandatory for this assignment, or could you compare the standard normalized/scaled CosMx data? Would you modify the settings for RNA method to handle CosMx similar to the clip.range = c(-10,10) argument in the CosMx tutorial?

Presumably, I would want to feed the 950 genes available for CosMx data in to the reference dataset as features for scaling and pca calculation. However, it would be preferable to also have these appear in the reference's scale.data layer, as well as include them for the Reference's RunPCA. I included them by using the union of my CosMx features and the ones found with FindVariableFeatures. If anything the additional variable features not in the CosMx data seem like they would be useful for generating the reference UMAP correctly, and likely wouldn't impact the transfer anchors at all?

PCs I have a 30, as that was informative for the snRNAseq, though perhaps scale back the PCs considering how few genes are represented in the CosMx set.

When I did run the FindTransferAnchors without integrating the layers, specifying only the 950 features in the CosMx set for the ref ScaleData and RunPCA, I get remarkably few anchors (25) for over 140k cells in CosMx, ~100k in snRNAseq. So something seems odd. If I integrate with v5, I think it should be generating a single scale.data layer, but the assay will still be Nanostring? In v4 it seemed like it generated a integrated Assay with it's own scale.data slot. This way it looks like it is basically not using the harmony or harmony.umap reductions, only the combined scale.data layer and potentially the pca reduction has been recalculated on the integrated layers (though i didn't call RunPCA again)? Is there any reason to bother with the integration, or will it by default transfer label predictions to each of the original 5 layers (assuming I Normalize %>% FindVariableFeatures %>% ScaleData %>% RunPCA each of them)?

satijalab / seurat

Unimodal UMAP Projection of CosMx data onto snRNAseq data #8814