stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
308 stars 84 forks source link

Merging Multiome and scATAC data #1354

Closed LunaP-831 closed 1 year ago

LunaP-831 commented 1 year ago

Hello,

I have been trying to integrate my multiome object and my scATAC-seq data from 10X genomics. The multiome has already been analyzed so I was just trying to merge the multiome with scATAC data and then integrated these datasets. I am following this vignette :https://stuartlab.org/signac/articles/integrate_atac.html

However, after I run the merging command I got this error :

'names' attribute [139731] must be the same length as the vector [110805]

Then I have seen this post basically modified my scATAC object as such

ind.atac@active.ident <- factor(rep(ind.atac@meta.data$sample, length(rownames(ind.atac@meta.data)))) names(ind.atac@active.ident) <- rownames(ind.atac@meta.data)

After this, the merging took a few days without giving me any errors and was completed but then I wanted to group the merged by "dataset" and plot the information on a UMAP. However, when I tried this it gave the following error.

Error in NextMethod("[") : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535

I am wondering if there is a problem with the steps I am taking and if so how can I improve these. Thank you!

Please find my code and session info attached.

`###### load libraries library(Signac) library(Seurat) library(pbapply) library(future) library(ggplot2)

plan("multicore", workers = 86) options(future.globals.maxSize = 100 * 1024^3)

set seed

set.seed(1234)

hg38 annotation

annotation <- readRDS("/mnt/labcode/src/0.1.hg38_annotation.rds")

read the scATAC-seq aggregate counts

counts_atac <-Read10X_h5("/mnt/aggregated/outs/filtered_peak_bc_matrix.h5")

path to fragments

fragpath <- "/mnt/aggregated/outs/fragments.tsv.gz"

count fragments per cell

fragcounts <- CountFragments(fragments = fragpath)

read the file where the cell information is stored

human.atac <- read.delim("/mnt/aggregated/tables/cells_atac.txt") ##28926 cells

atac cells

cells.atac <-human.atac$barcode

create the fragment object

atac.frags <- CreateFragmentObject(path = fragpath, cells = cells.atac)

read in the multiome object

multiome <- readRDS("/mnt/multiome/Analyzed_multiome.rds")

quantify multiome peaks in the scATAC-seq dataset

counts <- FeatureMatrix( fragments = atac.frags, features = granges(multiome), cells = cells.atac )

chromatin assay

atac.assay <- CreateChromatinAssay( counts = counts, fragments = atac.frags, min.cells = 10, min.features = 200 )

Seurat object

ind.atac <- CreateSeuratObject(counts = atac.assay, assay = "peaks") #351352 28926

compute LSI

ind.atac <- FindTopFeatures(ind.atac, min.cutoff = 'q0') ind.atac <- RunTFIDF(ind.atac) ind.atac <- RunSVD(ind.atac)

add dataset info

ind.atac$dataset <- "ATAC" multiome$dataset <- "Multiome"

add sample info to the scAtac object

ind.atac@meta.data$barcode <- rownames(ind.atac@meta.data) samples_atac <- read.csv("/mnt/aggregated/outs/aggregation_csv.csv") atac.cells <- as.data.frame(row.names(ind.atac@meta.data)) colnames(atac.cells) <- "barcodes" rownames(atac.cells) <- atac.cells$barcodes atac.cells$libcodes <- as.factor(gsub(pattern=".+-", replacement="", atac.cells$barcodes)) atac.cells$samples <- as.vector(samples_atac$library_id[atac.cells$libcodes]) sampleidentity <- atac.cells["samples"] ind.atac@meta.data$sample <- sampleidentity[row.names(ind.atac@meta.data),]

meta <- ind.atac@meta.data meta$libcodes <- meta$barcode

change the numbers at the end of barcodes with gsub #to match the ordering

meta$libcodes <- ifelse(endsWith( meta$barcode,"1"),gsub("1","9", meta$barcode) , meta$libcodes) meta$libcodes <- ifelse(endsWith( meta$barcode,"2"),gsub("2","10", meta$barcode) , meta$libcodes)

fix the dataframe according to right barcodes

row.names(meta) <- meta$libcodes meta$barcode <- meta$libcodes meta$libcodes <- as.factor(gsub(pattern=".+-", replacement="", meta$barcode))

assign it back to the individual object

ind.atac@meta.data <- meta

ind.atac@active.ident <- factor(rep(ind.atac@meta.data$sample, length(rownames(ind.atac@meta.data)))) names(ind.atac@active.ident) <- rownames(ind.atac@meta.data)

merge

combined <- merge(ind.atac, multiome)

process the combined dataset

combined <- FindTopFeatures(combined, min.cutoff = 10) combined <- RunTFIDF(combined) combined <- RunSVD(combined) combined <- RunUMAP(combined, reduction = "lsi", dims = 2:50)

the step I am getting the error

figure_file<-paste0(getwd(),"scATAC_merged_dataset.pdf") pdf(figure_file, width = 15, height = 10, useDingbats=FALSE) p1 <- DimPlot(combined, group.by = "dataset") p1 dev.off() `

`

sessionInfo()

R version 4.0.5 (2021-03-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 22.04.2 LTS

Matrix products: default BLAS/LAPACK: /mnt/RAID/anaconda3/envs/cloned_env/lib/libopenblasp-r0.3.20.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] future_1.27.0 pbapply_1.5-0 sp_1.5-0 SeuratObject_4.1.0 [5] Seurat_4.1.1 Signac_1.2.1

loaded via a namespace (and not attached): [1] fastmatch_1.1-3 plyr_1.8.7 igraph_1.3.0
[4] lazyeval_0.2.2 splines_4.0.5 BiocParallel_1.24.1
[7] listenv_0.8.0 scattermore_0.8 SnowballC_0.7.0
[10] GenomeInfoDb_1.26.4 ggplot2_3.3.6 digest_0.6.29
[13] htmltools_0.5.3 fansi_1.0.3 magrittr_2.0.3
[16] tensor_1.5 cluster_2.1.3 ROCR_1.0-11
[19] globals_0.15.1 Biostrings_2.58.0 matrixStats_0.62.0
[22] docopt_0.7.1 spatstat.sparse_2.1-1 colorspace_2.0-3
[25] ggrepel_0.9.1 dplyr_1.0.9 sparsesvd_0.2
[28] crayon_1.5.1 RCurl_1.98-1.7 jsonlite_1.8.0
[31] progressr_0.10.1 spatstat.data_2.2-0 survival_3.3-1
[34] zoo_1.8-10 glue_1.6.2 polyclip_1.10-0
[37] gtable_0.3.0 zlibbioc_1.36.0 XVector_0.30.0
[40] leiden_0.4.2 future.apply_1.9.0 BiocGenerics_0.36.0
[43] abind_1.4-5 scales_1.2.0 DBI_1.1.3
[46] spatstat.random_2.2-0 miniUI_0.1.1.1 Rcpp_1.0.9
[49] viridisLite_0.4.0 xtable_1.8-4 reticulate_1.25
[52] spatstat.core_2.4-4 bit_4.0.4 stats4_4.0.5
[55] htmlwidgets_1.5.4 httr_1.4.3 RColorBrewer_1.1-3
[58] ellipsis_0.3.2 ica_1.0-3 pkgconfig_2.0.3
[61] farver_2.1.1 ggseqlogo_0.1 uwot_0.1.11
[64] deldir_1.0-6 utf8_1.2.2 tidyselect_1.1.2
[67] rlang_1.0.4 reshape2_1.4.4 later_1.2.0
[70] munsell_0.5.0 tools_4.0.5 cli_3.3.0
[73] generics_0.1.3 ggridges_0.5.3 stringr_1.4.0
[76] fastmap_1.1.0 goftest_1.2-3 bit64_4.0.5
[79] fitdistrplus_1.1-8 purrr_0.3.4 RANN_2.6.1
[82] nlme_3.1-158 mime_0.12 slam_0.1-50
[85] RcppRoll_0.3.0 hdf5r_1.3.5 compiler_4.0.5
[88] plotly_4.10.0 png_0.1-7 spatstat.utils_2.3-1
[91] tibble_3.1.8 tweenr_1.0.2 stringi_1.7.6
[94] RSpectra_0.16-1 rgeos_0.5-9 lattice_0.20-45
[97] Matrix_1.4-1 vctrs_0.4.1 pillar_1.8.0
[100] lifecycle_1.0.1 spatstat.geom_2.4-0 lmtest_0.9-40
[103] RcppAnnoy_0.0.19 data.table_1.14.2 cowplot_1.1.1
[106] bitops_1.0-7 irlba_2.3.5 httpuv_1.6.5
[109] patchwork_1.1.1 GenomicRanges_1.42.0 R6_2.5.1
[112] promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3
[115] lsa_0.73.3 IRanges_2.24.1 parallelly_1.32.1
[118] codetools_0.2-18 MASS_7.3-58 assertthat_0.2.1
[121] qlcMatrix_0.9.7 sctransform_0.3.3 Rsamtools_2.6.0
[124] S4Vectors_0.28.1 GenomeInfoDbData_1.2.4 mgcv_1.8-40
[127] parallel_4.0.5 grid_4.0.5 rpart_4.1.16
[130] tidyr_1.2.0 Rtsne_0.16 ggforce_0.3.3
[133] shiny_1.7.2
`

timoast commented 1 year ago

ind.atac@meta.data <- meta

Never assign anything to the object using the @ operator, as this bypasses all the checks that we do when setting information in the object, and you can end up with an invalid object. Same goes for accessing information, best to avoid accessing slots directly using @. Here you should use the AddMetaData() function instead.

I suspect the metadata overwrite that you're doing is part of the problem, as you changed the rownames of that dataframe and they may not match the cell names or order used in the rest of the object.

If the merge is taking days, that's a sign that something is off. I'd suggest skipping the whole part where you're trying to change the metadata and see if that works as expected, to narrow down what the issue is. Also make sure you're using the same assay name for the ATAC object and the ATAC assay in the multiome object.

You should also update to the latest release of Signac (1.9.0)