Closed: semmrich closed this issue 3 years ago
Update:
When I leave out the PCA/Neighbors/Clustering/UMAP step and use the query without dimreds,
S.mmuBM.integrated
An object of class Seurat
58801 features across 19480 samples within 3 assays
Active assay: integrated (3421 features, 3421 variable features)
2 other assays present: RNA, ADT
it still fails to run through:
Transfer.anchors <- FindTransferAnchors(reference = S.mmuREF.integrated, query = S.mmuBM.integrated, dims = 1:50)
Performing PCA on the provided reference using 4656 features as input.
Projecting PCA
Error in Loadings(object = reference[[reduction]])[features, dims] :
subscript out of bounds
toc()
131.93 sec elapsed
Transfer.anchors <- FindTransferAnchors(reference = S.mmuREF.integrated, query = S.mmuBM.integrated, dims = 1:50,
+ features = intersect(S.mmuREF.integrated@assays[["integrated"]]@var.features,S.mmuBM.integrated@assays[["integrated"]]@var.features))
Performing PCA on the provided reference using 1764 features as input.
Projecting PCA
Error in Loadings(object = reference[[reduction]])[features, dims] :
subscript out of bounds
toc()
58.95 sec elapsed
Interestingly, the Vignette example "panc8" runs smoothly while showing the same setup of the two Seurat objects:
pancreas.query
An object of class Seurat
34363 features across 638 samples within 1 assay
Active assay: RNA (34363 features, 2000 variable features)
pancreas.integrated
An object of class Seurat
36363 features across 5683 samples within 2 assays
Active assay: integrated (2000 features, 2000 variable features)
1 other assay present: RNA
2 dimensional reductions calculated: pca, umap
The query has no dimensional reductions, while the reference does. I don't get it... What's more:
>S.mmuREF.integrated <- ScaleData(object = S.mmuREF.integrated, assay = "integrated", features = intersect(rownames(S.mmuREF.integrated),rownames(S.mmuBM.integrated)))
Centering and scaling data matrix
|=================================================================================================================================================| 100%
>S.mmuREF.integrated <- RunPCA(object = S.mmuREF.integrated, assay = "integrated", npcs = 50, verbose = TRUE, features = intersect(rownames(S.mmuREF.integrated),rownames(S.mmuBM.integrated)))
>S.mmuREF.integrated <- FindNeighbors(object = S.mmuREF.integrated, reduction = "pca", dims = 1:50)
Computing nearest neighbor graph
Computing SNN
>S.mmuREF.integrated <- FindClusters(object = S.mmuREF.integrated, resolution = 0.5)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 64650
Number of edges: 2539416
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9242
Number of communities: 23
Elapsed time: 22 seconds
> S.mmuREF.integrated <- RunUMAP(object = S.mmuREF.integrated, dims = 1:50)
16:47:21 UMAP embedding parameters a = 0.9922 b = 1.112
16:47:21 Read 64650 rows and found 50 numeric columns
16:47:21 Using Annoy for neighbor search, n_neighbors = 30
16:47:21 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:47:37 Writing NN index file to temp file C:\Users\FUCHSD~1\AppData\Local\Temp\RtmpY9vD9W\file2af42e0b14bb
16:47:37 Searching Annoy index using 1 thread, search_k = 3000
16:47:58 Annoy recall = 100%
16:47:59 Commencing smooth kNN distance calibration using 1 thread
16:48:05 Initializing from normalized Laplacian + noise
16:48:13 Commencing optimization for 200 epochs, with 2920214 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:49:32 Optimization finished
> S.mmuREF.integrated <- ScaleData(object = S.mmuREF.integrated, assay = "integrated", features = intersect(rownames(S.mmuREF.integrated),rownames(S.mmuBM.integrated)))
Centering and scaling data matrix
|=================================================================================================================================================| 100%
>dim(S.mmuREF.integrated@reductions[["pca"]]@feature.loadings)
[1] 1764 50
> Transfer.anchors <- FindTransferAnchors(reference = S.mmuREF.integrated, query = S.mmuBM.integrated, dims = 1:50, features = intersect(S.mmuREF.integrated@assays[["integrated"]]@var.features,S.mmuBM.integrated@assays[["integrated"]]@var.features))
Performing PCA on the provided reference using 1764 features as input.
Projecting PCA
Error in Loadings(object = reference[[reduction]])[features, dims] :
subscript out of bounds
This is very puzzling. All features and dims can be found in both the reference and the query, as in the vignette, yet it still fails to retrieve anchors. Any ideas that I could try are highly appreciated. Thanks, Stephan
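One way to test the claim that every requested feature is present: the failing expression `Loadings(object = reference[[reduction]])[features, dims]` is ordinary matrix indexing by row name and column position, so it can be imitated with a plain matrix. The sketch below is only an illustration under that assumption. The matrix `loadings`, the vector `features` and the gene names are made-up stand-ins, not the real objects. It shows that a single requested name that is missing from the row names raises the same error, and that `setdiff()` lists the offenders.

```r
# Stand-in for a PCA feature-loadings matrix: rows are features, columns are PCs.
# All names and values here are hypothetical and only serve the illustration.
loadings <- matrix(
  rnorm(4 * 3), nrow = 4,
  dimnames = list(c("Cd3e", "Cd19", "Ly6c2", "Kit"), paste0("PC_", 1:3))
)

# Features requested for anchor finding; "Gata1" has no row in the loadings.
features <- c("Cd3e", "Kit", "Gata1")

# Indexing by a row name that does not exist fails; capture the message.
result <- tryCatch(loadings[features, 1:3], error = function(e) conditionMessage(e))
print(result)

# List the requested features that are absent from the loadings matrix.
missing.features <- setdiff(features, rownames(loadings))
print(missing.features)
```

On the real objects, the same comparison could use `rownames(Loadings(S.mmuREF.integrated[["pca"]]))` in place of `rownames(loadings)`. Since the log shows that FindTransferAnchors performs its own PCA on the reference, the stored loadings may not be the ones it indexes, so an empty result here would not rule this cause out.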
UPDATE:
To work with reference and query datasets that share exactly the same features (rownames) in the Seurat object, I intersected them (although this is not ideal and probably not intended by the developers either, since FindIntegrationAnchors can handle different feature loadings in the Seurat objects to integrate):
>S.mmuBM.integrated
An object of class Seurat
60380 features across 19480 samples within 3 assays
Active assay: integrated (5000 features, 5000 variable features)
2 other assays present: RNA, ADT
> S.mmuREF.integrated
An object of class Seurat
24540 features across 64650 samples within 2 assays
Active assay: integrated (4656 features, 4656 variable features)
1 other assay present: RNA
> common.features <- unique(intersect(rownames(S.mmuREF.integrated),rownames(S.mmuBM.integrated)))
> length(common.features)
[1] 2533
> S.mmuREF.integrated.cf <- S.mmuREF.integrated[common.features,]
> S.mmuBM.integrated.cf <- S.mmuBM.integrated[common.features,]
> Transfer.anchors <- FindTransferAnchors(reference = S.mmuREF.integrated.cf, query = S.mmuBM.integrated.cf, dims = 1:50)
Performing PCA on the provided reference using 2533 features as input.
Projecting PCA
Error in Loadings(object = reference[[reduction]])[features, dims] :
subscript out of bounds
At this point I am out of ideas and have to abandon the use of the TransferAnchors Seurat feature. Although I am still a big fan of the Seurat package, it is a little disappointing that this particular function does not work on a "real life" example. While the error message seems informative, the problem cannot be solved, even when the datasets' feature sets are reduced and made to converge to the point of impairing a biological interpretation of the results.
Maybe someone finds a solution for this...
St
Perhaps try changing the dims parameter - do you have fewer than 50 PCs?
Also feel free to send a downsampled version of your object to seuratpackage@gmail.com so we can more quickly diagnose the issue.
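The question about the number of PCs can be illustrated without Seurat, because asking for more columns than a matrix has produces the same "subscript out of bounds" message. This is a minimal sketch with a made-up 30-column matrix standing in for the loadings. The names `loadings`, `dims` and `usable.dims` are hypothetical and the values are placeholders.

```r
# Stand-in for a loadings matrix that holds only 30 PCs (values are placeholders).
loadings <- matrix(
  0, nrow = 5, ncol = 30,
  dimnames = list(paste0("gene", 1:5), paste0("PC_", 1:30))
)
dims <- 1:50

# Requesting columns 1:50 from a 30-column matrix fails; capture the message.
result <- tryCatch(loadings[, dims], error = function(e) conditionMessage(e))
print(result)

# Guard: keep only the dims that are actually available.
usable.dims <- dims[dims <= ncol(loadings)]
print(length(usable.dims))
```

On a real object, `ncol(Loadings(object[["pca"]]))` would give the number of stored PCs to compare against `max(dims)`.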
Hi @jaisonj708, Thank you for the input! Actually all of my objects generated with Seurat have the 50 PCs, but I will take a closer look at it indeed.
Here is something interesting: the issue was vexing me so much that I set out to create a reprex. To run it, one needs to download and unpack three additional datasets from 10XGenomics support, in addition to the ones provided with SeuratData:
library(Seurat)
library(SeuratData)
InstallData("cbmc")
InstallData("pbmc3k")
data("cbmc")
data("pbmc3k")
# https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.1.0/manual_5k_pbmc_NGSC3_ch1
pbmc5k.data <- Read10X(data.dir = ".../pbmc5k/filtered_feature_bc_matrix/")
pbmc5k <- CreateSeuratObject(counts = pbmc5k.data, project = "pbmc5k")
# https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/frozen_bmmc_healthy_donor1
bmmc1.data <- Read10X(data.dir = ".../bmmc1/filtered_matrices_mex/hg19/")
bmmc1 <- CreateSeuratObject(counts = bmmc1.data, project = "bmmc1")
# https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/frozen_bmmc_healthy_donor2
bmmc2.data <- Read10X(data.dir = ".../bmmc2/filtered_matrices_mex/hg19/")
bmmc2 <- CreateSeuratObject(counts = bmmc2.data, project = "bmmc2")
blood.list <- list(CB=cbmc,PB3k=pbmc3k,PB5k=pbmc5k,BM1=bmmc1,BM2=bmmc2)
for (i in seq_along(blood.list)) {
  blood.list[[i]] <- NormalizeData(blood.list[[i]], verbose = FALSE)
  blood.list[[i]] <- FindVariableFeatures(blood.list[[i]], selection.method = "vst",
                                          nfeatures = 2000, verbose = FALSE)
}
reference.list <- blood.list[c("CB","PB3k","BM1")]
blood.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:30)
blood.integrated <- IntegrateData(anchorset = blood.anchors, dims = 1:30)
library(ggplot2)
library(cowplot)
library(patchwork)
# switch to integrated assay. The variable features of this assay are automatically
# set during IntegrateData
DefaultAssay(blood.integrated) <- "integrated"
# Run the standard workflow for visualization and clustering
blood.integrated <- ScaleData(blood.integrated, verbose = FALSE)
blood.integrated <- RunPCA(blood.integrated, npcs = 30, verbose = FALSE)
blood.integrated <- RunUMAP(blood.integrated, reduction = "pca", dims = 1:30)
p1 <- DimPlot(blood.integrated, reduction = "umap", group.by = "orig.ident")
p2 <- DimPlot(blood.integrated, reduction = "umap", group.by = "rna_annotations", label = TRUE, repel = TRUE) + NoLegend()
p3 <- DimPlot(blood.integrated, reduction = "umap", group.by = "protein_annotations", label = TRUE, repel = TRUE) + NoLegend()
p1 + p2 + p3
blood.query <- blood.list[c("PB5k","BM2")]
blood.query.anchors <- FindIntegrationAnchors(object.list = blood.query, dims = 1:30)
blood.query.integrated <- IntegrateData(anchorset = blood.query.anchors, dims = 1:30)
blood.anchors <- FindTransferAnchors(reference = blood.integrated, query = blood.query.integrated, dims = 1:30)
predictions.rna <- TransferData(anchorset = blood.anchors, refdata = blood.integrated$rna_annotations, dims = 1:30)
blood.query.integrated <- AddMetaData(blood.query.integrated, metadata = predictions.rna)
blood.query.integrated <- ScaleData(blood.query.integrated, verbose = FALSE)
blood.query.integrated <- RunPCA(blood.query.integrated, npcs = 30, verbose = FALSE)
blood.query.integrated <- RunUMAP(blood.query.integrated, reduction = "pca", dims = 1:30)
DimPlot(blood.query.integrated, reduction = "umap", group.by = "predicted.id", label = TRUE, repel = TRUE) + NoLegend()
Long story short - it WORKS!! It's not Seurat, it's in my data somehow. I will go over the dims and give it a fresh start with a downsampled version, if that reproduces the error, I would send it to the email address you suggested. I will close this issue as soon as I solve it on my own or with some help.
Hi,
I am trying to apply the "Cell type classification using an integrated reference" from the Seurat Vignette https://satijalab.org/seurat/v3.2/integration.html
I integrated 4 query Seurat datasets following this vignette's first part, retrieving an object
I then moved on to integrate 3 reference Seurat datasets into
When I try to project the reference it fails at the first step:
While the query integration still retains an ADT assay slot, even with that assay removed I encounter the same error. I do not understand what is wrong. The error log suggests that the feature loadings do not match. But if a new integrated PCA is performed on an intersect of the VariableFeatures between ref and query, as I set with features = intersect(), then why do the individual (PCA) feature loadings of each ref and query matter? I will try to run FindTransferAnchors on both ref and query without any prior dimreds, and if that fails I will set the features = intersect() output as fixed VariableFeatures; I hope that settles it. But from the vignette it seems that "pancreas.query" did not have any VariableFeatures determined, while the ref "pancreas.integrated" had. How come those feature loadings "match" and it works?
I will appreciate any help in this matter.
Many Thanks in advance, Stephan