stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
330 stars 88 forks source link

Error in .subscript.2ary(x, i, , drop = TRUE) : subscript out of bounds when running FindIntegrationAnchors #1712

Closed sropri92 closed 4 months ago

sropri92 commented 6 months ago

I am running FindIntegrationAnchors to integrate my 8 scATAC datasets according to the tutorial on this site:

https://stuartlab.org/signac/articles/integrate_atac

However, I run the following commands and I get this error.

mySeurat <- RunTFIDF(mySeurat)
mySeurat <- FindTopFeatures(mySeurat, min.cutoff = 'q0')
mySeurat <- RunSVD(mySeurat,assay = 'peaks')

DepthCor(mySeurat, reduction = 'lsi', n=50) 
DefaultAssay(mySeurat) <- 'peaks'
mySeurat <- RunUMAP(object = mySeurat, reduction = 'lsi', dims = 2:37)
DimPlot(mySeurat, group.by = 'dataset', pt.size = 0.1)
gc()

# find integration anchors
integration.anchors <- FindIntegrationAnchors(
  object.list = list(seurat_BRCA1_E,seurat_BRCA1_G_a, seurat_BRCA1_G_b, seurat_BRCA2_D, seurat_WT_12,seurat_WT_B,seurat_WT_C,seurat_WT_D),
  anchor.features = rownames(seurat_BRCA1_E),
  reduction = "rlsi",
  dims = 2:37
)  # this is where I am getting the error of subscript out of bounds. Trying to solve this and will keep you posted

I have seen others having this issue but am not sure what the solution is. I have tried changing the reduction to PCA and rpca for integration and even cca and still getting the same error. I have checked and all rownames of all my seurat objects are the same as I merged the peak regions to have identical regions for all the seurat objects in my objects.list. Any help would be appreciated.

> sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] future_1.33.2             harmony_1.2.0             Rcpp_1.0.12               remotes_2.5.0            
 [5] biovizBase_1.52.0         patchwork_1.2.0           ggplot2_3.5.1             EnsDb.Hsapiens.v86_2.99.0
 [9] ensembldb_2.28.0          GenomicFeatures_1.56.0    AnnotationDbi_1.66.0      Biobase_2.64.0           
[13] Seurat_5.1.0              SeuratObject_5.0.2        sp_2.1-4                  Signac_1.13.0            
[17] AnnotationFilter_1.28.0   Rsamtools_2.20.0          Biostrings_2.72.0         XVector_0.44.0           
[21] GenomicRanges_1.56.0      GenomeInfoDb_1.40.0       IRanges_2.38.0            S4Vectors_0.42.0         
[25] BiocGenerics_0.50.0       RCurl_1.98-1.14           gridExtra_2.3            

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22            splines_4.4.0               later_1.3.2                 BiocIO_1.14.0              
  [5] bitops_1.0-7                tibble_3.2.1                polyclip_1.10-6             rpart_4.1.23               
  [9] XML_3.99-0.16.1             fastDummies_1.7.3           lifecycle_1.0.4             globals_0.16.3             
 [13] lattice_0.22-6              MASS_7.3-60.2               backports_1.4.1             magrittr_2.0.3             
 [17] rmarkdown_2.26              Hmisc_5.1-2                 plotly_4.10.4               yaml_2.3.8                 
 [21] httpuv_1.6.15               sctransform_0.4.1           spam_2.10-0                 spatstat.sparse_3.0-3      
 [25] reticulate_1.36.1           cowplot_1.1.3               pbapply_1.7-2               DBI_1.2.2                  
 [29] RColorBrewer_1.1-3          abind_1.4-5                 zlibbioc_1.50.0             Rtsne_0.17                 
 [33] purrr_1.0.2                 nnet_7.3-19                 VariantAnnotation_1.50.0    circlize_0.4.16            
 [37] GenomeInfoDbData_1.2.12     ggrepel_0.9.5               irlba_2.3.5.1               listenv_0.9.1              
 [41] spatstat.utils_3.0-4        goftest_1.2-3               RSpectra_0.16-1             spatstat.random_3.2-3      
 [45] fitdistrplus_1.1-11         parallelly_1.37.1           DelayedArray_0.30.1         leiden_0.4.3.1             
 [49] codetools_0.2-20            RcppRoll_0.3.0              tidyselect_1.2.1            shape_1.4.6.1              
 [53] farver_2.1.2                UCSC.utils_1.0.0            base64enc_0.1-3             matrixStats_1.3.0          
 [57] spatstat.explore_3.2-7      GenomicAlignments_1.40.0    jsonlite_1.8.8              Formula_1.2-5              
 [61] progressr_0.14.0            ggridges_0.5.6              survival_3.6-4              tools_4.4.0                
 [65] ica_1.0-3                   glue_1.7.0                  SparseArray_1.4.3           xfun_0.43                  
 [69] MatrixGenerics_1.16.0       dplyr_1.1.4                 withr_3.0.0                 fastmap_1.1.1              
 [73] fansi_1.0.6                 digest_0.6.35               R6_2.5.1                    mime_0.12                  
 [77] colorspace_2.1-0            scattermore_1.2             tensor_1.5                  dichromat_2.0-0.1          
 [81] spatstat.data_3.0-4         RSQLite_2.3.6               utf8_1.2.4                  tidyr_1.3.1                
 [85] generics_0.1.3              data.table_1.15.4           rtracklayer_1.64.0          httr_1.4.7                 
 [89] htmlwidgets_1.6.4           S4Arrays_1.4.0              uwot_0.2.2                  pkgconfig_2.0.3            
 [93] gtable_0.3.5                blob_1.2.4                  lmtest_0.9-40               htmltools_0.5.8.1          
 [97] dotCall64_1.1-1             ProtGenerics_1.36.0         scales_1.3.0                png_0.1-8                  
[101] knitr_1.46                  rstudioapi_0.16.0           reshape2_1.4.4              rjson_0.2.21               
[105] checkmate_2.3.1             nlme_3.1-164                curl_5.2.1                  zoo_1.8-12                 
[109] cachem_1.0.8                GlobalOptions_0.1.2         stringr_1.5.1               KernSmooth_2.23-22         
[113] parallel_4.4.0              miniUI_0.1.1.1              foreign_0.8-86              restfulr_0.0.15            
[117] pillar_1.9.0                vctrs_0.6.5                 RANN_2.6.1                  promises_1.3.0             
[121] xtable_1.8-4                cluster_2.1.6               htmlTable_2.4.2             evaluate_0.23              
[125] cli_3.6.2                   compiler_4.4.0              rlang_1.1.3                 crayon_1.5.2               
[129] future.apply_1.11.2         labeling_0.4.3              plyr_1.8.9                  stringi_1.8.4              
[133] viridisLite_0.4.2           deldir_2.0-4                BiocParallel_1.38.0         munsell_0.5.1              
[137] lazyeval_0.2.2              spatstat.geom_3.2-9         Matrix_1.7-0                BSgenome_1.72.0            
[141] RcppHNSW_0.6.0              bit64_4.0.5                 KEGGREST_1.44.0             shiny_1.8.1.1              
[145] SummarizedExperiment_1.34.0 ROCR_1.0-11                 igraph_2.0.3                memoise_2.0.1              
[149] fastmatch_1.1-4             bit_4.0.5 
timoast commented 6 months ago

Hi @sropri92, can you provide the full code you're running?

sropri92 commented 6 months ago

Yes Tim,

# List of sample names
sample_names <- c("BRCA1_E", "BRCA1_G_a", "BRCA1_G_b", "BRCA2_D","WT_12","WT_B","WT_C","WT_D" )

# Iterate over each sample
for (sample_name in sample_names) {
  # Change directory to the sample's directory
  setwd(file.path("C:/Users/sropr/OneDrive/Documents/scATAC/carmans/", sample_name))

  peaks <- read.table(
    file = "peaks.bed",
    col.names = c("chr", "start", "end")
  )

  # Make sure the Seurat object for each sample has a unique name
  assign(paste0("peaks_", sample_name), peaks)
}

# convert to genomic ranges
peaks_BRCA1_E <- makeGRangesFromDataFrame(peaks_BRCA1_E)
peaks_BRCA1_G_a <- makeGRangesFromDataFrame(peaks_BRCA1_G_a)
peaks_BRCA1_G_b <- makeGRangesFromDataFrame(peaks_BRCA1_G_b)
peaks_BRCA2_D <- makeGRangesFromDataFrame(peaks_BRCA2_D)
peaks_WT_12 <- makeGRangesFromDataFrame(peaks_WT_12)
peaks_WT_B <- makeGRangesFromDataFrame(peaks_WT_B)
peaks_WT_C <- makeGRangesFromDataFrame(peaks_WT_C)
peaks_WT_D <- makeGRangesFromDataFrame(peaks_WT_D)

# Create a unified set of peaks to quantify in each dataset
combined.peaks <- reduce(x = c(peaks_BRCA1_E,peaks_BRCA1_G_a,peaks_BRCA1_G_b,peaks_BRCA2_D,peaks_WT_12,peaks_WT_B,
                               peaks_WT_C,peaks_WT_D))

# Filter out bad peaks based on length
peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths  < 10000 & peakwidths > 20]
combined.peaks

for (sample_name in sample_names) {
  # Change directory to the sample's directory
  setwd(file.path("C:/Users/sropr/OneDrive/Documents/scATAC/carmans/", sample_name))

  md <- read.table(
    file = "singlecell.csv",
    stringsAsFactors = FALSE,
    sep = ",",
    header = TRUE,
    row.names = 1
  )[-1, ] # remove the first row

  md <- md[md$passed_filters > 500, ] # Ali: can adjust accordingly, might be better to ignore this and do the filtering after getting the Violin Plot

  frags <- CreateFragmentObject(
    path = "fragments.tsv.gz",
    cells = rownames(md)
  )

  counts <- FeatureMatrix(
    fragments = frags,
    features = combined.peaks,
    cells = rownames(md)
  )

  assay <- CreateChromatinAssay(counts, fragments = frags)
  seurat <- CreateSeuratObject(assay, assay = "peaks", meta.data=md, project = sample_name)

  # extract gene annotations from EnsDb
  annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)

  # change to UCSC style since the data was mapped to hg38
  seqlevels(annotations) <- paste0('chr', seqlevels(annotations))
  genome(annotations) <- "hg38"

  # add the gene information to the object
  Annotation(seurat) <- annotations

  head(Annotation(seurat))
  head(Fragments(seurat)[[1]])

  # compute nucleosome signal score per cell
  seurat <- NucleosomeSignal(object = seurat)

  # compute TSS enrichment score per cell
  seurat <- TSSEnrichment(object = seurat, fast = FALSE) # It used to be fast=TRUE which did not allow me to build the enrichment matrix

  # add blacklist ratio and fraction of reads in peaks
  seurat$pct_reads_in_peaks <- seurat$peak_region_fragments / seurat$passed_filters * 100

# Blacklist fraction function
  seurat$blacklist_fraction <- FractionCountsInRegion(
    object = seurat, 
    assay = 'peaks',
    regions = blacklist_hg38
  )

  #DensityScatter(seurat, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)

  seurat$high.tss <- ifelse(seurat$TSS.enrichment > 3, 'High', 'Low')
  #TSSPlot(seurat, group.by = 'high.tss') + NoLegend()

  seurat$nucleosome_group <- ifelse(seurat$nucleosome_signal > 3, 'NS > 3', 'NS < 3')
  #FragmentHistogram(object = seurat, group.by = 'nucleosome_group')

  seurat <- subset(
    x = seurat,
    subset = nCount_peaks > 100 &
    nCount_peaks < 30000 &
    pct_reads_in_peaks > 10 &
    blacklist_fraction < 0.1 &
    nucleosome_signal < 4 &
    TSS.enrichment > 1
  )

  #seurat <- RunTFIDF(seurat)
  #seurat <- FindTopFeatures(seurat, min.cutoff = 10)
  #seurat <- RunSVD(seurat ,assay = 'peaks')

  # Make sure the Seurat object for each sample has a unique name
  assign(paste0("seurat_", sample_name), seurat)
}

seurat_BRCA1_E$dataset <- sample_names[1]
seurat_BRCA1_G_a$dataset <- sample_names[2]
seurat_BRCA1_G_b$dataset <- sample_names[3]
seurat_BRCA2_D$dataset <- sample_names[4]
seurat_WT_12$dataset <- sample_names[5]
seurat_WT_B$dataset <- sample_names[6]
seurat_WT_C$dataset <- sample_names[7]
seurat_WT_D$dataset <- sample_names[8]

colnames(seurat_BRCA1_E) <- paste0(sample_names[1], "_", colnames(seurat_BRCA1_E))
colnames(seurat_BRCA1_G_a) <- paste0(sample_names[2], "_", colnames(seurat_BRCA1_G_a))
colnames(seurat_BRCA1_G_b) <- paste0(sample_names[3], "_", colnames(seurat_BRCA1_G_b))
colnames(seurat_BRCA2_D) <- paste0(sample_names[4], "_", colnames(seurat_BRCA2_D))
colnames(seurat_WT_12) <- paste0(sample_names[5], "_", colnames(seurat_WT_12))
colnames(seurat_WT_B) <- paste0(sample_names[6], "_", colnames(seurat_WT_B))
colnames(seurat_WT_C) <- paste0(sample_names[7], "_", colnames(seurat_WT_C))
colnames(seurat_WT_D) <- paste0(sample_names[8], "_", colnames(seurat_WT_D))

# merge all datasets, adding a cell ID to make sure cell names are unique
combined <- merge(
  x = seurat_BRCA1_E,
  y = list(seurat_BRCA1_G_a, seurat_BRCA1_G_b, seurat_BRCA2_D, seurat_WT_12,seurat_WT_B,seurat_WT_C,seurat_WT_D)
)
combined[["peaks"]]

mySeurat <- combined
rm(combined)
gc()

#mySeurat1 <- subset(
  #x = mySeurat,
  #subset = nCount_peaks > 100 &
    #nCount_peaks < 30000 &
    #pct_reads_in_peaks > 10 &
    #blacklist_fraction < 0.1 &
    #nucleosome_signal < 4 &
    #TSS.enrichment > 1
#)

#mySeurat1

mySeurat <- RunTFIDF(mySeurat)
mySeurat <- FindTopFeatures(mySeurat, min.cutoff = 'q0')
mySeurat <- RunSVD(mySeurat,assay = 'peaks')

DepthCor(mySeurat, reduction = 'lsi', n=50) # 
DefaultAssay(mySeurat) <- 'peaks'
mySeurat <- RunUMAP(object = mySeurat, reduction = 'lsi', dims = 2:37)
DimPlot(mySeurat, group.by = 'dataset', pt.size = 0.1)
gc()

# find integration anchors
integration.anchors <- FindIntegrationAnchors(
  object.list = list(seurat_BRCA1_E,seurat_BRCA1_G_a, seurat_BRCA1_G_b, seurat_BRCA2_D, seurat_WT_12,seurat_WT_B,seurat_WT_C,seurat_WT_D),
  anchor.features = rownames(seurat_BRCA1_E),
  reduction = "rlsi",
  dims = 2:37
)  # this is where I am getting the error of subscript out of bounds. Trying to solve this and will keep you posted

gc()
timoast commented 6 months ago

I think you need to uncomment those lines:

  #seurat <- RunTFIDF(seurat)
  #seurat <- FindTopFeatures(seurat, min.cutoff = 10)
  #seurat <- RunSVD(seurat ,assay = 'peaks')

so that LSI is computed for each object, which is needed for rlsi in the integration

sropri92 commented 6 months ago

Hi Tim,

I actually had those uncommented when running my original code. These were just commented by me now. Do you think not running the reductions on each individual object might work and trying the rlsi reductions in the FindIntegrationAnchors instead?

sropri92 commented 6 months ago

Sorry for the confusion, but the rest of the code I ran as is.

timoast commented 6 months ago

Ok, I don't see any issues with the code then. This might be related to a bug in Seurat, the error message is similar to that reported in this issue: https://github.com/satijalab/seurat/issues/8561

I'd suggest opening an issue in Seurat with a reproducible example

sropri92 commented 6 months ago

Ok, thank you for taking the time to look this over. Will open the issue with seurat and see if that helps.

OneHitKO commented 5 months ago

I had a related error, and was able to solve it by removing the Motif Object (set the seurObj[["ChrAssay"]]@motifs = NULL). Maybe that would help? Afterwards, you can add that motif object back to the seurat object again.

timoast commented 5 months ago

Thanks @OneHitKO that's good to know!

@sropri92 did you have motifs stored in your object? If so you could try removing and see if that could be a workaround until the issue in Seurat is fixed

OneHitKO commented 5 months ago

@timoast, since we're on the topic of MotifObject, would you be able to quickly check issue #1657 ? I'm not sure if the way I had added/created it led to downstream subsetting problems in the first place. Thanks for your help (and for developing a great package)!

timoast commented 4 months ago

Going to close this now as it should be an issue with Seurat rather than Signac