Merging Multiome and scATAC data

Hello,

I have been trying to integrate my multiome object and my scATAC-seq data from 10X genomics. The multiome has already been analyzed so I was just trying to merge the multiome with scATAC data and then integrated these datasets. I am following this vignette :https://stuartlab.org/signac/articles/integrate_atac.html

However, after I run the merging command I got this error :

'names' attribute [139731] must be the same length as the vector [110805]

Then I have seen this post basically modified my scATAC object as such

ind.atac@active.ident <- factor(rep(ind.atac@meta.data$sample, length(rownames(ind.atac@meta.data)))) names(ind.atac@active.ident) <- rownames(ind.atac@meta.data)

After this, the merging took a few days without giving me any errors and was completed but then I wanted to group the merged by "dataset" and plot the information on a UMAP. However, when I tried this it gave the following error.

Error in NextMethod("[") : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535

I am wondering if there is a problem with the steps I am taking and if so how can I improve these. Thank you!

Please find my code and session info attached.

`###### load libraries library(Signac) library(Seurat) library(pbapply) library(future) library(ggplot2)

plan("multicore", workers = 86) options(future.globals.maxSize = 100 * 1024^3)

set seed

set.seed(1234)

hg38 annotation

annotation <- readRDS("/mnt/labcode/src/0.1.hg38_annotation.rds")

read the scATAC-seq aggregate counts

counts_atac <-Read10X_h5("/mnt/aggregated/outs/filtered_peak_bc_matrix.h5")

path to fragments

fragpath <- "/mnt/aggregated/outs/fragments.tsv.gz"

count fragments per cell

fragcounts <- CountFragments(fragments = fragpath)

read the file where the cell information is stored

human.atac <- read.delim("/mnt/aggregated/tables/cells_atac.txt") ##28926 cells

atac cells

cells.atac <-human.atac$barcode

create the fragment object

atac.frags <- CreateFragmentObject(path = fragpath, cells = cells.atac)

read in the multiome object

multiome <- readRDS("/mnt/multiome/Analyzed_multiome.rds")

quantify multiome peaks in the scATAC-seq dataset

counts <- FeatureMatrix( fragments = atac.frags, features = granges(multiome), cells = cells.atac )

chromatin assay

atac.assay <- CreateChromatinAssay( counts = counts, fragments = atac.frags, min.cells = 10, min.features = 200 )

Seurat object

ind.atac <- CreateSeuratObject(counts = atac.assay, assay = "peaks") #351352 28926

compute LSI

ind.atac <- FindTopFeatures(ind.atac, min.cutoff = 'q0') ind.atac <- RunTFIDF(ind.atac) ind.atac <- RunSVD(ind.atac)

add dataset info

ind.atac$dataset <- "ATAC" multiome$dataset <- "Multiome"

add sample info to the scAtac object

ind.atac@meta.data$barcode <- rownames(ind.atac@meta.data) samples_atac <- read.csv("/mnt/aggregated/outs/aggregation_csv.csv") atac.cells <- as.data.frame(row.names(ind.atac@meta.data)) colnames(atac.cells) <- "barcodes" rownames(atac.cells) <- atac.cells$barcodes atac.cells$libcodes <- as.factor(gsub(pattern=".+-", replacement="", atac.cells$barcodes)) atac.cells$samples <- as.vector(samples_atac$library_id[atac.cells$libcodes]) sampleidentity <- atac.cells["samples"] ind.atac@meta.data$sample <- sampleidentity[row.names(ind.atac@meta.data),]

meta <- ind.atac@meta.data meta$libcodes <- meta$barcode

change the numbers at the end of barcodes with gsub #to match the ordering

meta$libcodes <- ifelse(endsWith( meta$barcode,"1"),gsub("1","9", meta$barcode) , meta$libcodes) meta$libcodes <- ifelse(endsWith( meta$barcode,"2"),gsub("2","10", meta$barcode) , meta$libcodes)

fix the dataframe according to right barcodes

row.names(meta) <- meta$libcodes meta$barcode <- meta$libcodes meta$libcodes <- as.factor(gsub(pattern=".+-", replacement="", meta$barcode))

assign it back to the individual object

ind.atac@meta.data <- meta

ind.atac@active.ident <- factor(rep(ind.atac@meta.data$sample, length(rownames(ind.atac@meta.data)))) names(ind.atac@active.ident) <- rownames(ind.atac@meta.data)

merge

combined <- merge(ind.atac, multiome)

process the combined dataset

combined <- FindTopFeatures(combined, min.cutoff = 10) combined <- RunTFIDF(combined) combined <- RunSVD(combined) combined <- RunUMAP(combined, reduction = "lsi", dims = 2:50)

the step I am getting the error

figure_file<-paste0(getwd(),"scATAC_merged_dataset.pdf") pdf(figure_file, width = 15, height = 10, useDingbats=FALSE) p1 <- DimPlot(combined, group.by = "dataset") p1 dev.off() `

sessionInfo()

R version 4.0.5 (2021-03-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 22.04.2 LTS

Matrix products: default BLAS/LAPACK: /mnt/RAID/anaconda3/envs/cloned_env/lib/libopenblasp-r0.3.20.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] future_1.27.0 pbapply_1.5-0 sp_1.5-0 SeuratObject_4.1.0 [5] Seurat_4.1.1 Signac_1.2.1

loaded via a namespace (and not attached): [1] fastmatch_1.1-3 plyr_1.8.7 igraph_1.3.0
[4] lazyeval_0.2.2 splines_4.0.5 BiocParallel_1.24.1
[7] listenv_0.8.0 scattermore_0.8 SnowballC_0.7.0
[10] GenomeInfoDb_1.26.4 ggplot2_3.3.6 digest_0.6.29
[13] htmltools_0.5.3 fansi_1.0.3 magrittr_2.0.3
[16] tensor_1.5 cluster_2.1.3 ROCR_1.0-11
[19] globals_0.15.1 Biostrings_2.58.0 matrixStats_0.62.0
[22] docopt_0.7.1 spatstat.sparse_2.1-1 colorspace_2.0-3
[25] ggrepel_0.9.1 dplyr_1.0.9 sparsesvd_0.2
[28] crayon_1.5.1 RCurl_1.98-1.7 jsonlite_1.8.0
[31] progressr_0.10.1 spatstat.data_2.2-0 survival_3.3-1
[34] zoo_1.8-10 glue_1.6.2 polyclip_1.10-0
[37] gtable_0.3.0 zlibbioc_1.36.0 XVector_0.30.0
[40] leiden_0.4.2 future.apply_1.9.0 BiocGenerics_0.36.0
[43] abind_1.4-5 scales_1.2.0 DBI_1.1.3
[46] spatstat.random_2.2-0 miniUI_0.1.1.1 Rcpp_1.0.9
[49] viridisLite_0.4.0 xtable_1.8-4 reticulate_1.25
[52] spatstat.core_2.4-4 bit_4.0.4 stats4_4.0.5
[55] htmlwidgets_1.5.4 httr_1.4.3 RColorBrewer_1.1-3
[58] ellipsis_0.3.2 ica_1.0-3 pkgconfig_2.0.3
[61] farver_2.1.1 ggseqlogo_0.1 uwot_0.1.11
[64] deldir_1.0-6 utf8_1.2.2 tidyselect_1.1.2
[67] rlang_1.0.4 reshape2_1.4.4 later_1.2.0
[70] munsell_0.5.0 tools_4.0.5 cli_3.3.0
[73] generics_0.1.3 ggridges_0.5.3 stringr_1.4.0
[76] fastmap_1.1.0 goftest_1.2-3 bit64_4.0.5
[79] fitdistrplus_1.1-8 purrr_0.3.4 RANN_2.6.1
[82] nlme_3.1-158 mime_0.12 slam_0.1-50
[85] RcppRoll_0.3.0 hdf5r_1.3.5 compiler_4.0.5
[88] plotly_4.10.0 png_0.1-7 spatstat.utils_2.3-1
[91] tibble_3.1.8 tweenr_1.0.2 stringi_1.7.6
[94] RSpectra_0.16-1 rgeos_0.5-9 lattice_0.20-45
[97] Matrix_1.4-1 vctrs_0.4.1 pillar_1.8.0
[100] lifecycle_1.0.1 spatstat.geom_2.4-0 lmtest_0.9-40
[103] RcppAnnoy_0.0.19 data.table_1.14.2 cowplot_1.1.1
[106] bitops_1.0-7 irlba_2.3.5 httpuv_1.6.5
[109] patchwork_1.1.1 GenomicRanges_1.42.0 R6_2.5.1
[112] promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3
[115] lsa_0.73.3 IRanges_2.24.1 parallelly_1.32.1
[118] codetools_0.2-18 MASS_7.3-58 assertthat_0.2.1
[121] qlcMatrix_0.9.7 sctransform_0.3.3 Rsamtools_2.6.0
[124] S4Vectors_0.28.1 GenomeInfoDbData_1.2.4 mgcv_1.8-40
[127] parallel_4.0.5 grid_4.0.5 rpart_4.1.16
[130] tidyr_1.2.0 Rtsne_0.16 ggforce_0.3.3
[133] shiny_1.7.2
`

stuart-lab / signac