satijalab / seurat-wrappers

Community-provided extensions to Seurat
GNU General Public License v3.0

DimProj from RunFastMNN: non-conformable arguments #25

Status: Open (opened by dagarfield 4 years ago)

dagarfield commented 4 years ago

I've been using RunFastMNN to align partially overlapping datasets. It works well in that context. The problem shows up not in downstream analyses but in downstream presentation: heatmaps and the other exploratory plots discussed here.

> seuratObj.mnn <- RunFastMNN(object.list = by.patient.list)
> ProjectDim(seuratObj.mnn, reduction  = "mnn", dims.print = 1:5)
Error in data.use %*% cell.embeddings : non-conformable arguments
> ProjectDim(seuratObj.mnn, reduction  = "umap", dims.print = 1:2)
Error in data.use %*% cell.embeddings : non-conformable arguments
# And to be thorough:
> DimHeatmap(object = seuratObj.mnn, reduction = "mnn", dims = 1, balanced = TRUE)
Error in Loadings(object = object, projected = projected, ...)[, dim,  : 
  subscript out of bounds

But integrated objects following this approach seem to work fine:

> ProjectDim(otherSeuratObj, reduction  = "umap", dims.print = 1:2)
UMAP_ 1 
Positive:  TAGLN, JUN, DCN, DNAJB1, FOS, LUM, JUNB, IGFBP5, MYL9, EGR1 
       GADD45B, ACTA2, CYR61, CRYAB, TPM2, ATF3, HSPA6, MEG3, GEM, ADIRF 
Negative:  TMSB4X, CD74, B2M, SRGN, HLA-DRA, HLA-DRB1, IFI27, HLA-C, HLA-B, RBP1 
       TM4SF1, GSTA1, HLA-A, HLA-DPA1, TMSB10, HLA-DPB1, CXCR4, CLU, HLA-DQB1, ACKR1 
UMAP_ 2 
Positive:  B2M, TM4SF1, HLA-C, HLA-B, HLA-A, SRGN, CD74, SPARCL1, CLU, GADD45B 
       IGFBP7, ANXA1, CCL5, HSPA1A, HLA-DRB1, CXCR4, SOCS3, JUN, ACKR1, UBC 
Negative:  RBP1, GSTA1, SERPINE2, STAR, AMH, TNNI3, FHL2, MAGED2, IQCG, DCN 
       SOX4, RPL3, LUM, RPS25, RPL7, RPS18, GATM, ARID5B, RPL41, RPS8 
An object of class Seurat 
41602 features across 53746 samples within 3 assays 
Active assay: RNA (20004 features)
 2 other assays present: SCT, integrated
 3 dimensional reductions calculated: pca, umap, tsne

Any guesses where to look? It is, of course, possible to call fastMNN directly and construct the appropriate reduced-dimension object myself. But it would be nice to use RunFastMNN, and I feel like I'm probably missing something obvious about the dimensionality of what's stored in its output object.
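The error message names the failing step, `data.use %*% cell.embeddings`. Below is a minimal base-R sketch of that shape check. It assumes that `data.use` is the assay's `scale.data` matrix (features x cells), as the later comments in this thread suggest, and that an assay which never went through `ScaleData()` stores an empty 0 x 0 matrix there. The matrices are made-up stand-ins, not Seurat objects.

```r
set.seed(1)
n.cells <- 6

# Made-up cell embeddings (cells x dims), standing in for the "mnn" reduction
cell.embeddings <- matrix(rnorm(n.cells * 2), nrow = n.cells,
                          dimnames = list(paste0("cell", seq_len(n.cells)),
                                          c("mnn_1", "mnn_2")))

# Assumed stand-in for an assay that was never scaled: empty scale.data
empty.scale.data <- matrix(numeric(0), nrow = 0, ncol = 0)

# 0 columns vs 6 rows: the inner dimensions differ, so %*% fails
err <- tryCatch(empty.scale.data %*% cell.embeddings,
                error = function(e) conditionMessage(e))
print(err)

# With one scale.data column per cell, the product is features x dims
scale.data <- matrix(rnorm(3 * n.cells), nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     rownames(cell.embeddings)))
loadings <- scale.data %*% cell.embeddings
print(dim(loadings))
```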

Thanks

benjytan88 commented 4 years ago

@dagarfield Did you manage to solve this? I'm facing the same issue. If no better solution turns up, I guess constructing an appropriate reduced-dimension object directly with fastMNN is the only option.

dagarfield commented 4 years ago

In the end, I went to fastMNN itself (as you suggest) and constructed the object directly rather than through the Seurat wrapper. It was a bit annoying but worked well enough, and the fastMNN documentation is pretty good.

benjytan88 commented 4 years ago

@dagarfield Could you please share the steps you used to construct the proper object? I tried, but I still cannot project my MNN dimensions. Here is how I ran the MNN correction and then converted the result to a Seurat object:

library(Seurat)
library(SingleCellExperiment)
library(batchelor)  # multiBatchNorm(), fastMNN()
library(scran)      # modelGeneVar(), combineVar()

so <- readRDS(file = paste0(output, "/PBMC/SO_merge.Rds"))

### Create SingleCellExperiment object
sce <- as.SingleCellExperiment(so)
rowData(sce) <- NULL
reducedDims(sce) <- list()  # drop all existing reductions (PCA, UMAP, ...)

### Correct by sample ID: one SCE per sample
sample.ids <- c("S11", "S12", "S13", "S14", "S15", "S16", "S18", "S19", "S20",
                "S21", "S22", "S23", "S24", "S25", "S26", "S27", "S28")
all.sce <- lapply(sample.ids, function(id) sce[, grepl(id, sce$orig.ident)])
names(all.sce) <- sample.ids

### Subset all batches to common universe of genes
universe <- Reduce(intersect, lapply(all.sce, rownames))
all.sce <- lapply(all.sce, "[", i = universe, )

### Adjust scaling to equalize sequencing coverage
normed.sce <- do.call(multiBatchNorm, all.sce)

### Find highly variable genes
all.var <- lapply(all.sce, modelGeneVar)
combined.var <- do.call(combineVar, all.var)
hvg.list <- rownames(combined.var)[combined.var$bio > 0]

### Correct batch effect
set.seed(920101)
mnn.sce <- do.call(fastMNN, c(normed.sce, list(subset.row = hvg.list)))

### Save computed MNN into SCE object, then convert to Seurat object
# fastMNN() returns cells in batch order; reorder sce to match (assuming the
# cell names are unique and carried over by fastMNN()) so the rows of
# "corrected" line up with the columns of sce
sce <- sce[, colnames(mnn.sce)]
reducedDim(sce, "MNN") <- reducedDim(mnn.sce, "corrected")
so.fastmnn <- as.Seurat(sce)

Could you tell me where I went wrong? Thank you very much!
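One thing worth keeping in mind with this code: `fastMNN()` returns the corrected cells concatenated in the order of the batches it was given, which need not match the column order of the original `sce`. Any cells that matched none of the sample IDs would also be missing. Assigning the `corrected` matrix by position is therefore only safe if the cells were already grouped by sample in that order. The base-R sketch below aligns by cell name instead. All cell names and values are hypothetical.

```r
# Hypothetical cell names, in the column order of the original object
original.cells <- c("S12_c1", "S11_c1", "S12_c2", "S11_c2")

# Hypothetical corrected coordinates in batch-concatenated order (S11, then S12)
corrected <- matrix(c(1, 2, 3, 4,
                      10, 20, 30, 40), ncol = 2,
                    dimnames = list(c("S11_c1", "S11_c2", "S12_c1", "S12_c2"),
                                    c("MNN_1", "MNN_2")))

# Positional assignment would pair row 1 (S11_c1) with cell S12_c1;
# indexing by name restores the original cell order instead
stopifnot(all(original.cells %in% rownames(corrected)))
aligned <- corrected[original.cells, , drop = FALSE]
print(aligned)
```

With SingleCellExperiment objects, the same idea would be indexing the corrected matrix by `colnames(sce)`, or subsetting `sce` to `colnames(mnn.sce)`, again assuming unique cell names that survive `fastMNN()`.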

benjytan88 commented 4 years ago

Brief update: I managed to solve the issue, although I'm not sure this is the proper way.

The problem is that ProjectDim projects the data in the scale.data slot onto the cell embeddings (the data.use %*% cell.embeddings step in the error message). However, the merged, MNN-corrected Seurat object has neither scaled data nor variable features, as mentioned in #15, so there is nothing to project.
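The base-R sketch below shows the projection once scaled data exists. It uses made-up data in place of a normalized expression matrix and an MNN embedding. Only the variable genes are centered and scaled, which is a simplified stand-in for `ScaleData()` and skips its default clipping at `scale.max = 10`. The scaled matrix is multiplied by the embedding, and genes are ranked per dimension, roughly how the "Positive"/"Negative" lists are read. The names `var.features` and `n.top` are hypothetical, and this illustrates the arithmetic, not Seurat's own code.

```r
set.seed(42)
genes <- paste0("gene", 1:8)
cells <- paste0("cell", 1:10)

# Made-up log-normalized expression (genes x cells)
expr <- matrix(rnorm(length(genes) * length(cells), mean = 2),
               nrow = length(genes), dimnames = list(genes, cells))

# Hypothetical variable features, e.g. the HVGs that were passed to fastMNN
var.features <- genes[1:5]

# Center and scale each variable gene across cells (simplified ScaleData stand-in)
scale.data <- t(scale(t(expr[var.features, ])))

# Made-up corrected embedding (cells x dims)
embeddings <- matrix(rnorm(length(cells) * 2), ncol = 2,
                     dimnames = list(cells, c("mnn_1", "mnn_2")))

# Projected loadings: align embedding rows to scale.data columns by cell name
projected <- scale.data %*% embeddings[colnames(scale.data), ]

# Report the top positive and negative features per dimension
n.top <- 2
for (d in colnames(projected)) {
  cat(d, "\n")
  cat("Positive:", rownames(projected)[head(order(projected[, d], decreasing = TRUE), n.top)], "\n")
  cat("Negative:", rownames(projected)[head(order(projected[, d]), n.top)], "\n")
}
```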

So I stored the list of highly variable genes used for MNN as the variable features of the Seurat object, then scaled the data. After that, I was able to project the loadings. My code is below.

### Continue from above
so.fastmnn <- as.Seurat(sce)

### Store the highly variable genes used for MNN as the variable features
VariableFeatures(so.fastmnn) <- hvg.list

### Scale data & project loadings
so.fastmnn <- ScaleData(so.fastmnn)

ProjectDim(so.fastmnn, reduction = "mnn", dims.print = 1:2, nfeatures.print = 5)

My results are below:

mnn_ 1 
Positive:  NKG7, GNLY, GZMB, FGFBP2, CST7 
Negative:  RPL32, RPL13, RPS8, RPS12, RPL39 
mnn_ 2 
Positive:  COTL1, TRBV5-1, NSMCE1, HLA-DRB5, SAT1 
Negative:  CD7, NKG7, CCL5, FGFBP2, GZMB 
An object of class Seurat 
15572 features across 93495 samples within 1 assay 
Active assay: RNA (15572 features, 13326 variable features)
 1 dimensional reduction calculated: mnn

These steps seem logical to me, but I hope someone can confirm whether what I did is correct.

@dagarfield, did you do something similar? Could you share how you solved the issue?

Seurat developers, do my steps seem logical?

Thank you very much!