satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.28k stars 909 forks source link

Error with FindTransferAnchors while doing integrated analysis #2889

Closed kpadm closed 4 years ago

kpadm commented 4 years ago

Hello,

I am doing an integrated analysis of scRNA-seq and scATAC-seq. I am getting the following error at the FindTransferAnchors step. Here is the detailed error:

Error in if (length(x = features)/nfeatures < 0.1 & verbose) { :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In RunCCA.Seurat(object1 = reference, object2 = query, features = features,  :
  Running CCA on different assays

Here is my entire code:

library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Mmusculus.v79)
library(ggplot2)
library(patchwork)
set.seed(1234)
library(hdf5r)

##### pre processing

# read the peak/cell matrix
counts_day0 = Read10X_h5(filename = "raw_data/Sample_352-AH-1-ATAC/outs/raw_peak_bc_matrix.h5")
counts_day7 = Read10X_h5(filename = "raw_data/Sample_352-AH-2-ATAC/outs/raw_peak_bc_matrix.h5")
counts_day42 = Read10X_h5(filename = "raw_data/Sample_352-AH-6-ATAC/outs/raw_peak_bc_matrix.h5")

# read metadata
metadata_day0 = read.csv(file="raw_data/Sample_352-AH-1-ATAC/outs/singlecell.csv", header=TRUE, row.names = 1)
metadata_day7 = read.csv(file="raw_data/Sample_352-AH-2-ATAC/outs/singlecell.csv", header=TRUE, row.names = 1)
metadata_day42 = read.csv(file="raw_data/Sample_352-AH-6-ATAC/outs/singlecell.csv", header=TRUE, row.names = 1)

# msc objects
# filter based on at least 100 features per cell
msc_day0 = CreateSeuratObject(counts = counts_day0, assay = 'peaks', project = 'ATAC', min.cells = 1, min.features = 100,
                                     meta.data = metadata_day0)
msc_day7 = CreateSeuratObject(counts = counts_day7, assay = 'peaks', project = 'ATAC', min.cells = 1, min.features = 100,
                                     meta.data = metadata_day7)
msc_day42 = CreateSeuratObject(counts = counts_day42, assay = 'peaks', project = 'ATAC', min.cells = 1, min.features = 100,
                                     meta.data = metadata_day42)

# read fragment files and store path in associated msc objects
fragments_day0 = "raw_data/Sample_352-AH-1-ATAC/outs/fragments.tsv.gz"
fragments_day7 = "raw_data/Sample_352-AH-2-ATAC/outs/fragments.tsv.gz"
fragments_day42 = "raw_data/Sample_352-AH-6-ATAC/outs/fragments.tsv.gz"

msc_day0 = SetFragments(object = msc_day0, file = fragments_day0)
msc_day7 = SetFragments(object = msc_day7, file = fragments_day7)
msc_day42 = SetFragments(object = msc_day42, file = fragments_day42)

# merge seurat objects, see - https://satijalab.org/signac/articles/merging.html
combined.peaks <- UnifyPeaks(object.list = list(msc_day0, msc_day7, msc_day42), mode = "reduce")

combined.peaks

# quantify peaks in each dataset
msc_day0_counts <- FeatureMatrix(
  fragments = GetFragments(msc_day0),
  features = combined.peaks,
  sep = c(":", "-"),
  cells = colnames(msc_day0)
)

msc_day7_counts <- FeatureMatrix(
  fragments = GetFragments(msc_day7),
  features = combined.peaks,
  sep = c(":", "-"),
  cells = colnames(msc_day7)
)

msc_day42_counts <- FeatureMatrix(
  fragments = GetFragments(msc_day42),
  features = combined.peaks,
  sep = c(":", "-"),
  cells = colnames(msc_day42)
)

msc_day0[['peaks']] <- CreateAssayObject(counts = msc_day0_counts)
msc_day7[['peaks']] <- CreateAssayObject(counts = msc_day7_counts)
msc_day42[['peaks']] <- CreateAssayObject(counts = msc_day42_counts)

# merge objects
# Now that the objects each contain an assay with the same set of features,
# we can use the standard merge function from Seurat to merge the objects.

# add information to identify dataset of origin
msc_day0$dataset <- 'msc_day0'
msc_day7$dataset <- 'msc_day7'
msc_day42$dataset <- 'msc_day42'

# merge all datasets, adding a cell ID to make sure cell names are unique
combined <- merge(x = msc_day0, y = list(msc_day7, msc_day42),
                  add.cell.ids = c("day0", "day7", "day42"))

# make sure to change to the assay containing common peaks
#DefaultAssay(combined) <- "peaks"
DefaultAssay(combined) <- "peaks"
combined <- RunTFIDF(combined)
combined <- FindTopFeatures(combined, min.cutoff = 20)
combined <- RunSVD(
  combined,
  reduction.key = 'LSI_',
  reduction.name = 'lsi',
  irlba.work = 400
)
combined <- RunUMAP(combined, dims = 2:30, reduction = 'lsi')

DimPlot(combined, group.by = 'dataset', pt.size = 0.1)

# merge the fragments file using bash, refer here - https://satijalab.org/signac/articles/merging.html#merge-fragment-files

combined <- SetFragments(combined, "fragments.tsv.gz")

fragment.path = "fragments.tsv.gz"

# make coverage plot
pdf("analysis_03_24/plots/coverageplot.pdf")
CoveragePlot(
  object = combined,
  group.by = 'dataset',
  region = "chr14-99700000-99760000",
  peaks = StringToGRanges(rownames(combined), sep = c(":", "-"))
)
dev.off()
# create a gene activity matrix

#extract gene coordinates from Ensembl, and ensure name formatting is consistent with  Seurat object
gene.coords <- genes(EnsDb.Mmusculus.v79, filter = ~ gene_biotype == "protein_coding")
seqlevelsStyle(gene.coords) <- 'UCSC'
genebody.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')
genebodyandpromoter.coords <- Extend(x = gene.coords, upstream = 2000, downstream = 0)

# build a gene by cell matrix
gene.activities <- FeatureMatrix(
  fragments = fragment.path,
  features = genebodyandpromoter.coords,
  cells = colnames(combined),
  chunk = 10
)

#Add the gene activity matrix to the Seurat object as a new assay, and normalize it
combined[['RNA']] <- CreateAssayObject(counts = gene.activities)
combined <- NormalizeData(
  object = combined,
  assay = 'RNA',
  normalization.method = 'LogNormalize',
  scale.factor = median(combined$nCount_RNA)
)

DefaultAssay(combined) <- 'RNA'

##### Integrating with scRNA-seq data
# Load the pre-processed scRNA-seq data
msc.rna <- readRDS("scRNASeq_clusters/MSC.sets.integrated.new_run.50.rds")
msc.rna$tech <- "rna"
DefaultAssay(msc.rna) = 'RNA'

pdf("analysis_03_24/plots/combined_dimplot.pdf")
p1 <- DimPlot(combined, reduction = "umap") + NoLegend() + ggtitle("scATAC-seq")
p2 <- DimPlot(msc.rna, group.by = "SingleR", label = TRUE, repel = TRUE) + NoLegend() + ggtitle("scRNA-seq")
CombinePlots(plots = list(p1, p2))
dev.off()

transfer.anchors <- FindTransferAnchors(reference = msc.rna, query = combined, features = VariableFeatures(object = msc.rna),
    reference.assay = "RNA", query.assay = "peaks", reduction = "cca")

Can you please let me know if I am doing something wrong? Here is my sessionInfo().

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] hdf5r_1.3.2                patchwork_1.0.0
 [3] ggplot2_3.3.0              EnsDb.Mmusculus.v79_2.99.0
 [5] ensembldb_2.10.2           AnnotationFilter_1.10.0
 [7] GenomicFeatures_1.38.2     AnnotationDbi_1.48.0
 [9] Biobase_2.46.0             GenomicRanges_1.38.0
[11] GenomeInfoDb_1.22.1        IRanges_2.20.2
[13] S4Vectors_0.24.4           BiocGenerics_0.32.0
[15] Signac_0.2.5               Seurat_3.1.5
[17] BiocManager_1.30.10

loaded via a namespace (and not attached):
  [1] Rtsne_0.15                  colorspace_1.4-1
  [3] ellipsis_0.3.0              ggridges_0.5.2
  [5] XVector_0.26.0              leiden_0.3.3
  [7] listenv_0.8.0               npsurv_0.4-0
  [9] ggrepel_0.8.2               ggfittext_0.8.1
 [11] bit64_0.9-7                 codetools_0.2-16
 [13] splines_3.6.3               lsei_1.2-0
 [15] jsonlite_1.6.1              Rsamtools_2.2.3
 [17] ica_1.0-2                   dbplyr_1.4.3
 [19] cluster_2.1.0               png_0.1-7
 [21] uwot_0.1.8                  sctransform_0.2.1
 [23] compiler_3.6.3              httr_1.4.1
 [25] assertthat_0.2.1            Matrix_1.2-18
 [27] lazyeval_0.2.2              htmltools_0.4.0
 [29] prettyunits_1.1.1           tools_3.6.3
 [31] rsvd_1.0.3                  igraph_1.2.5
 [33] gtable_0.3.0                glue_1.4.0
 [35] GenomeInfoDbData_1.2.2      RANN_2.6.1
 [37] reshape2_1.4.4              dplyr_0.8.5
 [39] rappdirs_0.3.1              Rcpp_1.0.4.6
 [41] vctrs_0.2.4                 Biostrings_2.54.0
 [43] gdata_2.18.0                ape_5.3
 [45] nlme_3.1-147                rtracklayer_1.46.0
 [47] ggseqlogo_0.1               lmtest_0.9-37
 [49] stringr_1.4.0               globals_0.12.5
 [51] lifecycle_0.2.0             irlba_2.3.3
 [53] gtools_3.8.2                XML_3.99-0.3
 [55] future_1.17.0               MASS_7.3-51.5
 [57] zlibbioc_1.32.0             zoo_1.8-7
 [59] scales_1.1.0                ProtGenerics_1.18.0
 [61] hms_0.5.3                   SummarizedExperiment_1.16.1
 [63] RColorBrewer_1.1-2          gggenes_0.4.0
[65] curl_4.3                    memoise_1.1.0
 [67] reticulate_1.15             pbapply_1.4-2
 [69] gridExtra_2.3               biomaRt_2.42.1
 [71] stringi_1.4.6               RSQLite_2.2.0
 [73] caTools_1.18.0              BiocParallel_1.20.1
 [75] matrixStats_0.56.0          rlang_0.4.5
 [77] pkgconfig_2.0.3             bitops_1.0-6
 [79] lattice_0.20-41             ROCR_1.0-7
 [81] purrr_0.3.4                 GenomicAlignments_1.22.1
 [83] htmlwidgets_1.5.1           cowplot_1.0.0
 [85] bit_1.1-15.2                tidyselect_1.0.0
 [87] RcppAnnoy_0.0.16            plyr_1.8.6
 [89] magrittr_1.5                R6_2.4.1
 [91] gplots_3.0.3                DelayedArray_0.12.3
 [93] DBI_1.1.0                   withr_2.2.0
 [95] pillar_1.4.3                fitdistrplus_1.0-14
 [97] survival_3.1-12             RCurl_1.98-1.2
 [99] tibble_3.0.1                future.apply_1.5.0
[101] tsne_0.1-3                  crayon_1.3.4
[103] KernSmooth_2.23-16          BiocFileCache_1.10.2
[105] plotly_4.9.2.1              progress_1.2.2
[107] grid_3.6.3                  data.table_1.12.8
[109] blob_1.2.1                  digest_0.6.25
[111] tidyr_1.0.2                 openssl_1.4.1
[113] munsell_0.5.0               viridisLite_0.3.0
[115] askpass_1.1

Thanks.

Karthik

timoast commented 4 years ago

I think the problem here is that the row names of the gene activity matrix will still be the chromosomal coordinates rather than the gene names, so there are no features in common between the gene activity and gene expression matrices. You can change the row names like this:

gene.key <- genebodyandpromoter.coords$gene_name
names(gene.key) <- GRangesToString(grange = genebodyandpromoter.coords)
rownames(gene.activities) <- gene.key[rownames(gene.activities)]

See https://satijalab.org/signac/articles/pbmc_vignette.html#create-a-gene-activity-matrix-1

kpadm commented 4 years ago

Thanks for your reply Tim. After changing the row names, I am getting this error:

> combined[['RNA']] <- CreateAssayObject(counts = gene.activities)
Warning: Non-unique features (rownames) present in the input matrix, making unique
Error: Feature names of counts matrix cannot be empty
timoast commented 4 years ago

You probably have some empty gene names, you can remove them:

gene.key <- genebodyandpromoter.coords$gene_name
names(gene.key) <- GRangesToString(grange = genebodyandpromoter.coords)
rownames(gene.activities) <- make.unique(gene.key[rownames(gene.activities)])
gene.activities <- gene.activities[rownames(gene.activities)!="",]
kpadm commented 4 years ago

Thanks that seems to have fixed it. Appreciate all your help.