satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

join layers or not? (contradicting vignettes): standard workflow and sketch #9013

Open Flu09 opened 3 weeks ago

Flu09 commented 3 weeks ago

Hello, I am having a hard time deciding whether to join layers or not.

My current understanding (I want to make sure this is correct):
--> split layers before normalizing if I have different datasets, technologies, or donors;
--> rejoin layers before the FindNeighbors and FindClusters steps;
--> in all vignettes, scaling is done while the object is split.
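A base-R sketch (no Seurat needed) of why the order of normalizing and splitting may not matter for the default LogNormalize method, while the order of scaling and splitting can. It assumes LogNormalize divides each cell's counts by that cell's total, multiplies by scale.factor (10,000 by default) and applies log1p, as described in the NormalizeData documentation. The toy count matrix, batch labels and the helpers log_normalize() and z_score() are hypothetical, and z_score() omits the clipping that ScaleData applies. Whether ScaleData scales split layers jointly or one layer at a time depends on its arguments, so check ?ScaleData for your installed Seurat version.

```r
# Hypothetical toy counts: 4 genes x 6 cells; cells 1-3 come from batch "A"
# (lower depth) and cells 4-6 from batch "B" (higher depth).
counts <- matrix(c(10, 0,  5,  3,
                    8, 2,  4,  6,
                   12, 1,  7,  2,
                   30, 5, 20, 15,
                   25, 9, 18, 10,
                   40, 3, 22, 12),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
batch <- factor(c("A", "A", "A", "B", "B", "B"))
cells_by_batch <- split(colnames(counts), batch)

# Assumed LogNormalize-style transform: divide each cell's counts by that
# cell's total, multiply by scale.factor, then apply log1p. Each column is
# normalized using only its own values.
log_normalize <- function(m, scale.factor = 1e4) {
  log1p(sweep(m, 2, colSums(m), "/") * scale.factor)
}

# Simplified per-gene z-score (no clipping): centre and scale each row using
# the mean and standard deviation of whichever cells are passed in.
z_score <- function(m) {
  (m - rowMeans(m)) / apply(m, 1, sd)
}

# Normalize all cells, then split ...
data_all <- log_normalize(counts)
norm_then_split <- lapply(cells_by_batch, function(cells) data_all[, cells])
# ... versus split first, then normalize each batch separately.
split_then_norm <- lapply(cells_by_batch,
                          function(cells) log_normalize(counts[, cells]))
same_norm <- isTRUE(all.equal(norm_then_split, split_then_norm))

# Scale all cells together versus each batch on its own.
joint <- z_score(data_all)
per_batch <- do.call(cbind, lapply(cells_by_batch,
                                   function(cells) z_score(data_all[, cells])))
same_scale <- isTRUE(all.equal(joint, per_batch[, colnames(joint)]))

print(c(same_norm = same_norm, same_scale = same_scale))
```

Running it reports same_norm as TRUE and same_scale as FALSE: a per-cell normalization gives the same values whether it runs before or after the split, but per-gene z-scores depend on which cells are pooled when the means and standard deviations are computed.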

Flu09 commented 3 weeks ago
# In this vignette: https://satijalab.org/seurat/articles/integration_introduction
# split the RNA measurements into layers (in this example, control and stimulated samples)
ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb)
ifnb <- RunPCA(ifnb)
ifnb <- IntegrateLayers(object = ifnb, method = CCAIntegration, orig.reduction = "pca", new.reduction = "integrated.cca",
    verbose = FALSE)

# re-join layers after integration
ifnb[["RNA"]] <- JoinLayers(ifnb[["RNA"]])
ifnb <- FindNeighbors(ifnb, reduction = "integrated.cca", dims = 1:30)
ifnb <- FindClusters(ifnb, resolution = 1)

# whereas in this sketch vignette: https://satijalab.org/seurat/articles/parsebio_sketch_integration
# the object containing samples from 24 individuals is normalized before splitting. Why? If I had different technologies or datasets, should I have split it first?
object <- NormalizeData(object)
# split assay into 24 layers
object[["RNA"]] <- split(object[["RNA"]], f = object$sample)
object <- FindVariableFeatures(object, verbose = FALSE)

object <- SketchData(object = object, ncells = 5000, method = "LeverageScore", sketched.assay = "sketch")
object
DefaultAssay(object) <- "sketch"
object <- FindVariableFeatures(object, verbose = FALSE)
object <- ScaleData(object, verbose = FALSE)
object <- RunPCA(object, verbose = FALSE)
object <- IntegrateLayers(object, method = RPCAIntegration, orig.reduction = "pca", new.reduction = "integrated.rpca",
    dims = 1:30, k.anchor = 20, reference = which(Layers(object, search = "data") %in% c("data.H_3060")),
    verbose = FALSE)

# why is there no step to rejoin layers before clustering here, unlike in the other vignette?
object <- FindNeighbors(object, reduction = "integrated.rpca", dims = 1:30)
object <- FindClusters(object, resolution = 2)
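In the sketch code above, FindNeighbors is called with reduction = "integrated.rpca", so the neighbor graph used by FindClusters is built from that integrated embedding rather than from the split expression layers. The integration vignette linked above notes that rejoining collapses the per-dataset layers back into single counts and data layers, and that this is needed before differential expression analysis. The base-R sketch below is only an analogy for what rejoining does to a split matrix; it is not Seurat's implementation, which stores layers in an Assay5 object, often as sparse matrices. The toy matrix, sample labels and variable names are hypothetical.

```r
# Hypothetical toy "data" layer: 3 genes x 6 cells from 3 samples, with the
# samples interleaved so that restoring the original cell order matters.
expr <- matrix(seq_len(18), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
sample_id <- c("s1", "s2", "s3", "s1", "s2", "s3")

# "Split": one matrix per sample, analogous to layers data.s1, data.s2, data.s3.
layers <- lapply(split(colnames(expr), sample_id),
                 function(cells) expr[, cells, drop = FALSE])

# "Join": concatenate the per-sample matrices column-wise, then put the cells
# back in their original order.
joined <- do.call(cbind, unname(layers))[, colnames(expr)]

same <- isTRUE(all.equal(joined, expr))
print(same)
```

Because splitting only partitions the columns, joining them back loses nothing, and same is TRUE.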