stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
327 stars 88 forks source link

Region Matrix won't work #1076

Closed masonsweat closed 2 years ago

masonsweat commented 2 years ago

Hi Tim, Thanks in advance...

Essentially I cannot get region matrix to work. Here is the code I used to generate my Seurat object before trying to use feature matrix:

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(annotation) <- "UCSC"

peaks.AF1 <- read.table(
  file = "AF1_atac_peaks.bed",
  col.names = c("chr", "start", "end")
)
#I then filtered by peak width. This dataset contains 4 objects but I am only showing one. 

peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths  < 10000 & peakwidths > 20]
combined.peaks
# Read metadata
md.AF1 <- read.table(
  file = "AF1_per_barcode_metrics.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-2, ] 

# Filter
md.AF1 <- md.AF1[md.AF1$atac_raw_reads > 250, ]

# create fragment objects
frags.AF1 <- CreateFragmentObject(
  path = "AF1_atac_fragments.tsv.gz",
  cells = rownames(md.AF1)
)
# Make counts using combined peak set
AF1.counts <- FeatureMatrix(
  fragments = frags.AF1,
  features = combined.peaks,
  cells = rownames(md.AF1)
)

AF1_assay <- CreateChromatinAssay(AF1.counts, fragments = frags.AF1, annotation = annotation, min.features = -1)
AF1 <- CreateSeuratObject(AF1_assay, assay = "ATAC", meta.data=md.AF1)
AF1[["RNA"]]<- CreateAssayObject(counts=counts$'Gene Expression'[,colnames(AF1)])
AF1

Output:
An object of class Seurat 
178393 features across 2283 samples within 2 assays 
Active assay: ATAC (146108 features, 0 variable features)
 1 other assay present: RNA

## I then performed qc, and merged 4 different seurats together. 
## I went through the weighted nearest neighbors vignett to identify clusters, make reductions, ect. 
## Then I try to do the region matrix to look at the signal within a few select clusters of cells:

# First make a granges object
bed <- read.table('filepath', sep = '\t', header = FALSE)
names(bed)[1] <- 'chromosome'
names(bed)[2] <- 'start'
names(bed)[3] <- 'end'
newrange <- makeGRangesFromDataFrame(bed, start.field = 'start', end.field = 'end',
      seqinfo = c('chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10',
                         'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18',
                         'chr19', 'chr20', 'chr21', 'chrX'), seqnames.field = 'chromosome' )
newrange 

Output:
GRanges object with 1112 ranges and 0 metadata columns:
         seqnames              ranges strand
            <Rle>           <IRanges>  <Rle>
     [1]     chr1 130803960-130804946      *
     [2]     chr1     9858933-9859881      *
     [3]     chr1   77048816-77049875      *
     [4]     chr1   79631214-79632159      *
     [5]     chr1   92343525-92344341      *
     ...      ...                 ...    ...
  [1108]    chr19   10433720-10434542      *
  [1109]    chr19   47561819-47562748      *
  [1110]    chr19   40368233-40369160      *
  [1111]    chr19     5821966-5822899      *
  [1112]     chrX 159942321-159943395      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

## do regionmatrix:
idents.plot <- c("x1", "x2", "x3", 'x4', "x5", "x6")

DefaultAssay(Seurat)<- 'ATAC'
obj <-RegionMatrix(
  Seurat,
  newrange,
  key = 'myregions',
  assay = NULL,
  idents = idents.plot,
  upstream = 1000,
  downstream = 1000,
  verbose = TRUE,
)
obj

output:
An object of class Seurat 
200391 features across 14569 samples within 4 assays 
Active assay: ATAC (146056 features, 146056 variable features)
 3 other assays present: RNA, chromvar, SCT
 6 dimensional reductions calculated: pca, umap.rna, lsi, umap.atac, wnn.umap, umap
## Then I try RegionHeatMap

RegionHeatmap(
  obj,
  key = 'myregions',
  assay = 'ATAC',
  idents = idents.plot,
  normalize = TRUE,
  upstream = 1000,
  downstream = 1000,
  max.cutoff = "q95",
  cols = NULL,
  min.counts = 1,
  window = (2000)/30,
  order = TRUE,
  nrow = NULL
)

This returns 0 counts for any of the identities, although I know that isn't the case because there are fragments in the coverageplot. An example:
CoveragePlot(Seurat, 'chr1-130803960-130804946', idents = idents.plot)

image

masonsweat commented 2 years ago

Full session info: R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages: [1] dplyr_1.0.8 SeuratDisk_0.0.0.9019
[3] stringr_1.4.0 BSgenome.Mmusculus.UCSC.mm10_1.4.3 [5] BSgenome_1.62.0 rtracklayer_1.54.0
[7] Biostrings_2.62.0 XVector_0.34.0
[9] TFBSTools_1.32.0 JASPAR2020_0.99.10
[11] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.18.4
[13] AnnotationFilter_1.18.0 GenomicFeatures_1.46.5
[15] AnnotationDbi_1.56.2 cicero_1.3.5
[17] Gviz_1.38.4 shiny_1.7.1
[19] patchwork_1.1.1 ggplot2_3.3.5
[21] Matrix_1.4-1 monocle3_1.0.0
[23] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[25] MatrixGenerics_1.6.0 matrixStats_0.61.0
[27] Biobase_2.54.0 SeuratWrappers_0.3.0
[29] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[31] IRanges_2.28.0 S4Vectors_0.32.4
[33] BiocGenerics_0.40.0 SeuratObject_4.0.4
[35] Seurat_4.1.0 Signac_1.6.0.9010

loaded via a namespace (and not attached): [1] rappdirs_0.3.3 SnowballC_0.7.0 scattermore_0.8
[4] R.methodsS3_1.8.1 tidyr_1.2.0 bit64_4.0.5
[7] knitr_1.38 irlba_2.3.5 DelayedArray_0.20.0
[10] R.utils_2.11.0 data.table_1.14.2 rpart_4.1.16
[13] KEGGREST_1.34.0 RCurl_1.98-1.6 generics_0.1.2
[16] callr_3.7.0 leidenbase_0.1.9 cowplot_1.1.1
[19] RSQLite_2.2.12 RANN_2.6.1 VGAM_1.1-6
[22] proxy_0.4-26 future_1.24.0 tzdb_0.3.0
[25] bit_4.0.4 spatstat.data_2.1-4 xml2_1.3.3
[28] httpuv_1.6.5 assertthat_0.2.1 DirichletMultinomial_1.36.0 [31] viridis_0.6.2 xfun_0.30 hms_1.1.1
[34] jquerylib_0.1.4 promises_1.2.0.1 fansi_1.0.3
[37] restfulr_0.0.13 progress_1.2.2 caTools_1.18.2
[40] dbplyr_2.1.1 igraph_1.3.0 DBI_1.1.2
[43] htmlwidgets_1.5.4 sparsesvd_0.2 spatstat.geom_2.4-0
[46] purrr_0.3.4 ellipsis_0.3.2 backports_1.4.1
[49] annotate_1.72.0 biomaRt_2.50.3 deldir_1.0-6
[52] vctrs_0.4.0 remotes_2.4.2 ROCR_1.0-11
[55] abind_1.4-5 cachem_1.0.6 withr_2.5.0
[58] checkmate_2.0.0 sctransform_0.3.3 GenomicAlignments_1.30.0
[61] prettyunits_1.1.1 goftest_1.2-3 cluster_2.1.3
[64] seqLogo_1.60.0 lazyeval_0.2.2 crayon_1.5.1
[67] hdf5r_1.3.5 pkgconfig_2.0.3 slam_0.1-50
[70] labeling_0.4.2 vipor_0.4.5 nlme_3.1-157
[73] ProtGenerics_1.26.0 nnet_7.3-17 rlang_1.0.2
[76] globals_0.14.0 lifecycle_1.0.1 miniUI_0.1.1.1
[79] filelock_1.0.2 BiocFileCache_2.2.1 rsvd_1.0.5
[82] dichromat_2.0-0 ggrastr_1.0.1 rprojroot_2.0.3
[85] polyclip_1.10-0 lmtest_0.9-40 ggseqlogo_0.1
[88] zoo_1.8-9 beeswarm_0.4.0 base64enc_0.1-3
[91] ggridges_0.5.3 processx_3.5.3 png_0.1-7
[94] viridisLite_0.4.0 rjson_0.2.21 bitops_1.0-7
[97] R.oo_1.24.0 KernSmooth_2.23-20 blob_1.2.3
[100] parallelly_1.31.0 spatstat.random_2.2-0 readr_2.1.2
[103] jpeg_0.1-9 CNEr_1.30.0 scales_1.1.1
[106] memoise_2.0.1 magrittr_2.0.3 plyr_1.8.7
[109] ica_1.0-2 zlibbioc_1.40.0 compiler_4.1.2
[112] tinytex_0.38 BiocIO_1.4.0 RColorBrewer_1.1-3
[115] fitdistrplus_1.1-8 Rsamtools_2.10.0 cli_3.2.0
[118] listenv_0.8.0 pbapply_1.5-0 ps_1.6.0
[121] htmlTable_2.4.0 Formula_1.2-4 MASS_7.3-56
[124] mgcv_1.8-40 tidyselect_1.1.2 stringi_1.7.6
[127] yaml_2.3.5 latticeExtra_0.6-29 ggrepel_0.9.1
[130] sass_0.4.1 VariantAnnotation_1.40.0 fastmatch_1.1-3
[133] tools_4.1.2 future.apply_1.8.1 parallel_4.1.2
[136] rstudioapi_0.13 TFMPvalue_0.0.8 foreign_0.8-82
[139] lsa_0.73.2 gridExtra_2.3 farver_2.1.0
[142] Rtsne_0.15 digest_0.6.29 BiocManager_1.30.16
[145] pracma_2.3.8 FNN_1.1.3 qlcMatrix_0.9.7
[148] Rcpp_1.0.8.3 later_1.3.0 RcppAnnoy_0.0.19
[151] httr_1.4.2 biovizBase_1.42.0 colorspace_2.0-3
[154] XML_3.99-0.9 tensor_1.5 reticulate_1.24
[157] splines_4.1.2 uwot_0.1.11 RcppRoll_0.3.0
[160] spatstat.utils_2.3-0 plotly_4.10.0 xtable_1.8-4
[163] poweRlaw_0.70.6 jsonlite_1.8.0 R6_2.5.1
[166] Hmisc_4.6-0 pillar_1.7.0 htmltools_0.5.2
[169] mime_0.12 glue_1.6.2 fastmap_1.1.0
[172] BiocParallel_1.28.3 codetools_0.2-18 pkgbuild_1.3.1
[175] utf8_1.2.2 lattice_0.20-45 bslib_0.3.1
[178] spatstat.sparse_2.1-0 tibble_3.1.6 ggbeeswarm_0.6.0
[181] curl_4.3.2 leiden_0.3.9 gtools_3.9.2
[184] GO.db_3.14.0 survival_3.3-1 docopt_0.7.1
[187] munsell_0.5.0 GenomeInfoDbData_1.2.7 reshape2_1.4.4
[190] gtable_0.3.0 spatstat.core_2.4-2

timoast commented 2 years ago

I wasn't able to reproduce this issue, are you able to provide an example using any of the datasets used in the Signac vignettes that recreates it?

masonsweat commented 2 years ago

I will try that and get back to you. Thanks!

On Apr 15, 2022, at 4:45 PM, Tim Stuart @.***> wrote:

 I wasn't able to reproduce this issue, are you able to provide an example using any of the datasets used in the Signac vignettes that recreates it?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

masonsweat commented 2 years ago

Hi Tim, Following your suggestion, I figured out how to reproduce the error! Could you please help me fix it, when you have time? :) (You are the greatest!!!).

I used the pbmc500 dataset to show that RegionMatrix works. I followed the merge vignette (https://satijalab.org/signac/articles/merging.html) to get the data, but did not include any other datasets.

Comparing my dataset to pbmc500, i couldn't find many differences. I removed data and fragment files but nothing made my object work. The only difference between my object and pbmc500 was that my object originated from merging 4 seurat objects.

To test the idea that merging somehow makes a difference, I tried Regionmatrix on my unmerged seurat objects. Actually, this worked for me. All 4 of the seurat objects could be used to make good RegionHeatmap plots!

I wanted to see if the error was exclusive to my datasets or if it could be reproduced using 10x datasets. To reproduce the error, I then followed the merge vignette and merged pbmc500 with pbmc1000.

After merging pbmc500 with pbmc1000, RegionMatrix no longer functions, and I get the same result as when I use it with my data (Regionheatmap shows 0 fragments in the specified regions).

In conclusion, for some reason RegionMatrix fails when applied to merged seurats.

Here is the code I used to merge pbmc500 and pbmc1000, which should reproduce the error. Session info is the same as listed above.

Again, thank you very much with your support for this issue.

Best, Mason

Code:

peaks.500 <- read.table( file = "Pbmc_500cell/atac_pbmc_500_nextgem_peaks.bed", col.names = c("chr", "start", "end") ) peaks.1k <- read.table( file = "atac_pbmc_1k_nextgem_peaks.bed", col.names = c("chr", "start", "end") ) gr.500 <- makeGRangesFromDataFrame(peaks.500) gr.1k <- makeGRangesFromDataFrame(peaks.1k) combined.peaks <- reduce(x = c(gr.500, gr.1k)) peakwidths <- width(combined.peaks) combined.peaks <- combined.peaks[peakwidths < 10000 & peakwidths > 20] combined.peaks md.500 <- read.table( file = "atac_pbmc_500_nextgem_singlecell.csv", stringsAsFactors = FALSE, sep = ",", header = TRUE, row.names = 1 )[-1, ] # remove the first row md.1k <- read.table( file = "atac_pbmc_1k_nextgem_singlecell.csv", stringsAsFactors = FALSE, sep = ",", header = TRUE, row.names = 1 )[-1, ] # remove the first row md.500 <- md.500[md.500$passed_filters > 500, ] md.1k <- md.1k[md.1k$passed_filters > 500, ] frags.500 <- CreateFragmentObject( path = "atac_pbmc_500_nextgem_fragments.tsv.gz", cells = rownames(md.500) ) frags.1k <- CreateFragmentObject( path = "atac_pbmc_1k_nextgem_fragments.tsv.gz", cells = rownames(md.1k) ) pbmc500.counts <- FeatureMatrix( fragments = frags.500, features = combined.peaks, cells = rownames(md.500) ) pbmc1k.counts <- FeatureMatrix( fragments = frags.1k, features = combined.peaks, cells = rownames(md.1k) ) pbmc500_assay <- CreateChromatinAssay(pbmc500.counts, fragments = frags.500) pbmc500 <- CreateSeuratObject(pbmc500_assay, assay = "ATAC", meta.data=md.500)

pbmc1k_assay <- CreateChromatinAssay(pbmc1k.counts, fragments = frags.1k) pbmc1k <- CreateSeuratObject(pbmc1k_assay, assay = "ATAC", meta.data=md.1k) pbmc1k

pbmc500$dataset <- 'pbmc500' pbmc1k$dataset <- 'pbmc1k' combined <- merge( x = pbmc500, y = list(pbmc1k), add.cell.ids = c("500", "1k") ) combined[["ATAC"]] combined

New bed file creation

bed <- read.table('/Users/masonsweat/Desktop/NGS_analysis/AF_Multiome/Regionmatrix/trial.bed', sep = '\t', header = FALSE) names(bed)[1] <- 'chromosome' names(bed)[2] <- 'start' names(bed)[3] <- 'end'

newrange <- makeGRangesFromDataFrame(bed, start.field = 'start', end.field = 'end', seqinfo = c('chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chrX'), seqnames.field = 'chromosome' ) DefaultAssay(pbmc500)<- 'ATAC' obj1 <-RegionMatrix( combined, newrange, key = 'myregions', assay = NULL, idents = NULL, upstream = 1000, downstream = 1000, verbose = TRUE, ) DefaultAssay(obj) <- 'ATAC' RegionHeatmap( obj1, key = 'myregions', assay = 'ATAC', idents = NULL, normalize = TRUE, upstream = 1000, downstream = 1000, max.cutoff = "q95", cols = NULL, min.counts = 0, window = (2000)/30, order = TRUE, nrow = NULL )

This returns the following plot:

image

It should be noted that the function works fine on either pbmc500 or pbmc1k, prior to merger. Here is an example regionheatmap for pbmc500:

obj1 <-RegionMatrix( pbmc500, newrange, key = 'myregions', assay = NULL, idents = NULL, upstream = 1000, downstream = 1000, verbose = TRUE, ) DefaultAssay(obj) <- 'ATAC' RegionHeatmap( obj1, key = 'myregions', assay = 'ATAC', idents = NULL, normalize = TRUE, upstream = 1000, downstream = 1000, max.cutoff = "q95", cols = NULL, min.counts = 0, window = (2000)/30, order = TRUE, nrow = NULL )

image

timoast commented 2 years ago

Thanks @masonsweat for looking into this! Turns out we weren't converting the cell names correctly when reading data from the fragment file, this should now be fixed on the develop branch.

masonsweat commented 2 years ago

It was my pleasure to look into it! I applied it to my data and it is working perfectly. Thank you so much for the quick fix Tim!

On Apr 18, 2022, at 15:37, Tim Stuart @.***> wrote:

Thanks @masonsweat https://github.com/masonsweat for looking into this! Turns out we weren't converting the cell names correctly when reading data from the fragment file, this should now be fixed on the develop branch.

— Reply to this email directly, view it on GitHub https://github.com/timoast/signac/issues/1076#issuecomment-1101697855, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATF6L4LOI55CZWSNGMJZBOLVFW2XVANCNFSM5TIOLYYA. You are receiving this because you were mentioned.