stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
327 stars 88 forks source link

Incomplete TF footprinting #943

Closed Chhiring-Lama closed 2 years ago

Chhiring-Lama commented 2 years ago

Hi, I want to do TF footprinting centered on certain genes' TSS. So, I provided a list for peak regions where TSS are located to 'region' and their names to 'key' argument. However, out of the 17 regions, it only produces position enrichment for 2.

I could not regenerate the same issue with the atac_small object as fragment files were missing there for footprinting and I could not the positions for genes' TSS due as annotation for peaks in not present in the object.

Here is what I have so far:

library(readr)
library(Signac)
library(Seurat)
library(Patchwork)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(tibble)

###finding the tss for our interested genes
### this just has a list our interested genes 
TSS_genes <- read_delim("selected genes for CEBPA&SPI1 common peaks.TF.txt", delim = "\t") 

TSS_gene_coord_1121 <- GetTSSPositions(ranges = Annotation(atac_small), biotypes = “protein_coding”)

TSSgene <- unlist(TSS_genes)

##TSS of 
interested_tss <- subset(TSS_gene_coord_1121, subset = gene_name %in% TSSgene)

##repeat footprinting for the tss region
atac_small <- Footprint(
  object = atac_small,
  regions = interested_tss,
  key = TSSgene,
  genome = BSgenome.Hsapiens.UCSC.hg38, 
  assay = "peaks",
  upstream = 500,
  downstream = 500
)
sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.62.0                   rtracklayer_1.54.0               
 [4] Biostrings_2.62.0                 XVector_0.34.0                    EnsDb.Hsapiens.v86_2.99.0        
 [7] ensembldb_2.18.2                  AnnotationFilter_1.18.0           GenomicFeatures_1.46.1           
[10] AnnotationDbi_1.56.2              Biobase_2.54.0                    GenomicRanges_1.46.1             
[13] GenomeInfoDb_1.30.0               IRanges_2.28.0                    S4Vectors_0.32.3                 
[16] BiocGenerics_0.40.0               SeuratObject_4.0.4                Seurat_4.0.5                     
[19] Signac_1.4.0                      forcats_0.5.1                     stringr_1.4.0                    
[22] dplyr_1.0.7                       purrr_0.3.4                       tidyr_1.1.4                      
[25] tibble_3.1.6                      ggplot2_3.3.5                     tidyverse_1.3.1                  
[28] readr_2.1.1                      

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.22             tidyselect_1.1.1            RSQLite_2.2.9              
  [5] htmlwidgets_1.5.4           grid_4.1.1                  docopt_0.7.1                BiocParallel_1.28.2        
  [9] Rtsne_0.15                  munsell_0.5.0               codetools_0.2-18            ica_1.0-2                  
 [13] future_1.23.0               miniUI_0.1.1.1              withr_2.4.3                 colorspace_2.0-2           
 [17] filelock_1.0.2              rstudioapi_0.13             ROCR_1.0-11                 tensor_1.5                 
 [21] listenv_0.8.0               MatrixGenerics_1.6.0        slam_0.1-49                 GenomeInfoDbData_1.2.7     
 [25] polyclip_1.10-0             bit64_4.0.5                 farver_2.1.0                pheatmap_1.0.12            
 [29] parallelly_1.29.0           vctrs_0.3.8                 generics_0.1.1              BiocFileCache_2.2.0        
 [33] lsa_0.73.2                  ggseqlogo_0.1               R6_2.5.1                    DelayedArray_0.20.0        
 [37] bitops_1.0-7                spatstat.utils_2.2-0        cachem_1.0.6                assertthat_0.2.1           
 [41] vroom_1.5.7                 BiocIO_1.4.0                promises_1.2.0.1            scales_1.1.1               
 [45] gtable_0.3.0                globals_0.14.0              goftest_1.2-3               rlang_0.4.12               
 [49] RcppRoll_0.3.0              splines_4.1.1               lazyeval_0.2.2              spatstat.geom_2.3-0        
 [53] broom_0.7.10                yaml_2.2.1                  reshape2_1.4.4              abind_1.4-5                
 [57] modelr_0.1.8                backports_1.4.0             httpuv_1.6.3                tools_4.1.1                
 [61] ellipsis_0.3.2              spatstat.core_2.3-2         RColorBrewer_1.1-2          ggridges_0.5.3             
 [65] Rcpp_1.0.7                  plyr_1.8.6                  progress_1.2.2              zlibbioc_1.40.0            
 [69] RCurl_1.98-1.5              prettyunits_1.1.1           rpart_4.1-15                deldir_1.0-6               
 [73] pbapply_1.5-0               cowplot_1.1.1               zoo_1.8-9                   SummarizedExperiment_1.24.0
 [77] haven_2.4.3                 ggrepel_0.9.1               cluster_2.1.2               fs_1.5.1                   
 [81] magrittr_2.0.1.9000         data.table_1.14.2           scattermore_0.7             lmtest_0.9-39              
 [85] reprex_2.0.1                RANN_2.6.1                  SnowballC_0.7.0             ProtGenerics_1.26.0        
 [89] fitdistrplus_1.1-6          matrixStats_0.61.0          hms_1.1.1                   patchwork_1.1.1            
 [93] mime_0.12                   xtable_1.8-4                XML_3.99-0.8                sparsesvd_0.2              
 [97] readxl_1.3.1                gridExtra_2.3               compiler_4.1.1              biomaRt_2.50.1             
[101] KernSmooth_2.23-20          crayon_1.4.2                htmltools_0.5.2             mgcv_1.8-38                
[105] later_1.3.0                 tzdb_0.2.0                  lubridate_1.8.0             DBI_1.1.1                  
[109] tweenr_1.0.2                dbplyr_2.1.1                rappdirs_0.3.3              MASS_7.3-54                
[113] Matrix_1.3-4                cli_3.1.0                   parallel_4.1.1              igraph_1.2.9               
[117] pkgconfig_2.0.3             GenomicAlignments_1.30.0    plotly_4.10.0               spatstat.sparse_2.0-0      
[121] xml2_1.3.3                  rvest_1.0.2                 digest_0.6.29               sctransform_0.3.2          
[125] RcppAnnoy_0.0.19            spatstat.data_2.1-0         cellranger_1.1.0            leiden_0.3.9               
[129] fastmatch_1.1-3             uwot_0.1.11                 restfulr_0.0.13             curl_4.3.2                 
[133] shiny_1.7.1                 Rsamtools_2.10.0            rjson_0.2.20                lifecycle_1.0.1            
[137] nlme_3.1-153                jsonlite_1.7.2              viridisLite_0.4.0           fansi_0.5.0                
[141] pillar_1.6.4                lattice_0.20-45             KEGGREST_1.34.0             fastmap_1.1.0              
[145] httr_1.4.2                  survival_3.2-13             glue_1.5.1                  qlcMatrix_0.9.7            
[149] png_0.1-7                   bit_4.0.4                   ggforce_0.3.3               stringi_1.7.6              
[153] blob_1.2.2                  memoise_2.0.1               irlba_2.3.5                 future.apply_1.8.1  
timoast commented 2 years ago

I think there's a misunderstanding here if what the Footprint() function does. It will take a collection of regions (eg, instances of a TF motif), and count the frequency of Tn5 integration events surrounding all of those sites, not individually. So if you provided a list of 17 regions, it will produce one output aggregating information across all of those sites.

Chhiring-Lama commented 2 years ago

Thank you so much for clarifying! I am now doing the footprinting for each region individually as I want to see Tn5 integration events at each TSS and within 1000bp from the TSS. It is working for some. However, some some it does not work. What could be causing the error?

  |                                                  | 0 % ~calculating  Computing observed Tn5 insertions per base
Computing base composition at motif sites
Error in dimnamesGets(x, value) : 
  invalid dimnames given for "dgCMatrix" object
timoast commented 2 years ago

If there are no fragments in the region it may cause an error, the function wasn't really designed to look at a single locus