stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
331 stars 88 forks source link

Quantifying regions with list of fragment files #1056

Closed jenellewallace closed 2 years ago

jenellewallace commented 2 years ago

I am trying to call peaks on a list of fragment files from different conditions (in the context of a multiome experiment), but peaks only seem to be called on the first file. Here is my code:

species = 'Human'
library(Seurat)
library(Signac)
set.seed(1)
source("/wynton/home/pollen/jwallace/R_analysis/MyFunctions.R")
source("/wynton/home/pollen/jwallace/S1_timepoint1/Seurat_analysis/MySeuratFunctions.R")

if (species == 'Human'){
  library(EnsDb.Hsapiens.v86)
  library(BSgenome.Hsapiens.UCSC.hg38)}

version = "V6/"
base_dir = strcat('/wynton/home/pollen/jwallace/S1_timepoint1/Signac_analysis/',species,'/')
save_dir = strcat('/wynton/home/pollen/jwallace/S1_timepoint1/Signac_analysis/',species,'/V6/')
fragment_dir = strcat(base_dir,"Fragment_files_v2_copy/")
setwd(base_dir)
figure_folder = strcat(base_dir,'Figures/')

if (species=='Human'){
  annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
  newlevels <- paste('chr', seqlevels(annotation), sep=""); newlevels[25] <- "chrM";seqlevels(annotation) <- newlevels
  seqlevelsStyle(annotation) <- 'UCSC'
  genome(annotation) <- "hg38"}

rna <- readRDS(file = "/wynton/home/pollen/jwallace/S1_timepoint1/Seurat_analysis/Human/V4/Human_rna.rds")

#Make fragment objects for all conditions
B_frags <- CreateFragmentObject(path = strcat(fragment_dir,"B_fragments.tsv.gz"))
S_frags <- CreateFragmentObject(path = strcat(fragment_dir,"S_fragments.tsv.gz"))
K1_frags <- CreateFragmentObject(path = strcat(fragment_dir,"K1_fragments.tsv.gz"))
K6_frags <- CreateFragmentObject(path = strcat(fragment_dir,"K6_fragments.tsv.gz"))
fragpath = dir(fragment_dir,"*gz$",full.names=TRUE)

#call peaks using macs2 on all cells
peaks_all <- CallPeaks(fragpath, macs2.path = "/wynton/home/pollen/jwallace/envs/env_signac/bin/macs2",fragment.tempdir = strcat(base_dir, 'macs2_temp/'))
peaks_all <- keepStandardChromosomes(peaks_all, pruning.mode = "coarse")
peaks_all <- subsetByOverlaps(x = peaks_all, ranges = blacklist_hg38_unified, invert = TRUE)
macs2_counts <- FeatureMatrix(fragments = list(B_frags,S_frags,K1_frags,K6_frags),features = peaks_all,cells = colnames(rna));
multi_prelim <- rna
common.cells <- intersect(colnames(multi_prelim), colnames(macs2_counts))
macs2_counts <- macs2_counts[, common.cells]
multi_prelim <- multi_prelim[, common.cells]
multi_prelim[["peaksprelim"]] <- CreateChromatinAssay(counts = macs2_counts,fragments = list(B_frags,S_frags,K1_frags,K6_frags), annotation = annotation,min.cells=-1,min.features = -1, cells = colnames(multi_prelim))

The individual files B_frags, S_frags, etc seem fine when I checked the contents:

head(B_frags,1)
chrom start   end              barcode readCount
1  chr1 10067 10333 B_TATGGCCCAAGATTCT-1         2
head(K6_frags,1)
chrom start   end               barcode readCount
1  chr1 10077 10198 K6_CGGGCTTAGGTGAAAT-1         2

The output of fragpath is:

[1] "/wynton/home/pollen/jwallace/S1_timepoint1/Signac_analysis/Human/Fragment_files_v2_copy//B_fragments.tsv.gz" 
[2] "/wynton/home/pollen/jwallace/S1_timepoint1/Signac_analysis/Human/Fragment_files_v2_copy//K1_fragments.tsv.gz"
[3] "/wynton/home/pollen/jwallace/S1_timepoint1/Signac_analysis/Human/Fragment_files_v2_copy//K6_fragments.tsv.gz"
[4] "/wynton/home/pollen/jwallace/S1_timepoint1/Signac_analysis/Human/Fragment_files_v2_copy//S_fragments.tsv.gz" 

So all the fragment files are indeed in the correct folder. You can see the problem by looking at the first and last lines of the object multi_prelim - the cells for the first fragment file (prefixed B) have data for peaks but the cells for the last file (prefixed K6) do not.

head(multi_prelim,1)
                    orig.ident nCount_RNA nFeature_RNA percent.mt integrated_snn_res.0.4 seurat_clusters celltype celltype_activity nCount_peaksprelim
B_AAACAGCCAAGGTGGC-1   baseline       4567         2197  0.3503394                      2            In2?       NA              NA_B                443
                     nFeature_peaksprelim
B_AAACAGCCAAGGTGGC-1                  431
tail(multi_prelim,1)
                     orig.ident nCount_RNA nFeature_RNA percent.mt integrated_snn_res.0.4 seurat_clusters celltype celltype_activity
K6_TTTGTTGGTTGGTTCT-1    kcl_6hr       7870         2702   1.041931                      7            In5?       NA             NA_K6
                      nCount_peaksprelim nFeature_peaksprelim
K6_TTTGTTGGTTGGTTCT-1                  0                    0

As a workaround, I can combine all the fragment files at the command line, but I am just wondering why this isn't working.

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /wynton/home/cbi/shared/software/CBI/R-4.1.0-gcc8/lib64/R/lib/libRblas.so
LAPACK: /wynton/home/cbi/shared/software/CBI/R-4.1.0-gcc8/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.62.0                   rtracklayer_1.54.0                Biostrings_2.62.0                
 [5] XVector_0.34.0                    EnsDb.Hsapiens.v86_2.99.0         ensembldb_2.18.2                  AnnotationFilter_1.18.0          
 [9] GenomicFeatures_1.46.1            AnnotationDbi_1.56.2              Biobase_2.54.0                    GenomicRanges_1.46.1             
[13] GenomeInfoDb_1.30.0               IRanges_2.28.0                    S4Vectors_0.32.3                  BiocGenerics_0.40.0              
[17] Signac_1.5.0                      SeuratObject_4.0.4                Seurat_4.0.5                     

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.22             tidyselect_1.1.1            RSQLite_2.2.9               htmlwidgets_1.5.4          
  [6] grid_4.1.0                  docopt_0.7.1                BiocParallel_1.28.2         Rtsne_0.15                  munsell_0.5.0              
 [11] codetools_0.2-18            ica_1.0-2                   future_1.23.0               miniUI_0.1.1.1              withr_2.4.3                
 [16] colorspace_2.0-2            filelock_1.0.2              knitr_1.36                  rstudioapi_0.13             ROCR_1.0-11                
 [21] tensor_1.5                  listenv_0.8.0               labeling_0.4.2              MatrixGenerics_1.6.0        slam_0.1-49                
 [26] GenomeInfoDbData_1.2.7      polyclip_1.10-0             bit64_4.0.5                 farver_2.1.0                parallelly_1.29.0          
 [31] vctrs_0.3.8                 generics_0.1.1              xfun_0.28                   biovizBase_1.42.0           BiocFileCache_2.2.0        
 [36] lsa_0.73.2                  ggseqlogo_0.1               R6_2.5.1                    bitops_1.0-7                spatstat.utils_2.3-0       
 [41] cachem_1.0.6                DelayedArray_0.20.0         assertthat_0.2.1            promises_1.2.0.1            BiocIO_1.4.0               
 [46] scales_1.1.1                nnet_7.3-16                 gtable_0.3.0                globals_0.14.0              goftest_1.2-3              
 [51] rlang_0.4.12                RcppRoll_0.3.0              splines_4.1.0               lazyeval_0.2.2              dichromat_2.0-0            
 [56] checkmate_2.0.0             spatstat.geom_2.3-1         yaml_2.2.1                  reshape2_1.4.4              abind_1.4-5                
 [61] backports_1.4.0             httpuv_1.6.3                Hmisc_4.6-0                 tools_4.1.0                 ggplot2_3.3.5              
 [66] ellipsis_0.3.2              spatstat.core_2.3-2         RColorBrewer_1.1-2          ggridges_0.5.3              Rcpp_1.0.7                 
 [71] plyr_1.8.6                  base64enc_0.1-3             progress_1.2.2              zlibbioc_1.40.0             purrr_0.3.4                
 [76] RCurl_1.98-1.5              prettyunits_1.1.1           rpart_4.1-15                deldir_1.0-6                pbapply_1.5-0              
 [81] cowplot_1.1.1               zoo_1.8-9                   SummarizedExperiment_1.24.0 ggrepel_0.9.1               cluster_2.1.2              
 [86] magrittr_2.0.1              data.table_1.14.2           scattermore_0.7             lmtest_0.9-39               RANN_2.6.1                 
 [91] SnowballC_0.7.0             ProtGenerics_1.26.0         fitdistrplus_1.1-6          matrixStats_0.61.0          hms_1.1.1                  
 [96] patchwork_1.1.1             mime_0.12                   xtable_1.8-4                XML_3.99-0.8                jpeg_0.1-9                 
[101] sparsesvd_0.2               gridExtra_2.3               compiler_4.1.0              biomaRt_2.50.1              tibble_3.1.6               
[106] KernSmooth_2.23-20          crayon_1.4.2                htmltools_0.5.2             mgcv_1.8-38                 later_1.3.0                
[111] Formula_1.2-4               tidyr_1.1.4                 DBI_1.1.1                   tweenr_1.0.2                dbplyr_2.1.1               
[116] MASS_7.3-54                 rappdirs_0.3.3              Matrix_1.4-0                parallel_4.1.0              igraph_1.2.9               
[121] pkgconfig_2.0.3             GenomicAlignments_1.30.0    foreign_0.8-81              plotly_4.10.0               spatstat.sparse_2.0-0      
[126] xml2_1.3.3                  VariantAnnotation_1.40.0    stringr_1.4.0               digest_0.6.29               sctransform_0.3.2          
[131] RcppAnnoy_0.0.19            spatstat.data_2.1-0         leiden_0.3.9                fastmatch_1.1-3             htmlTable_2.3.0            
[136] uwot_0.1.11                 restfulr_0.0.13             curl_4.3.2                  shiny_1.7.1                 Rsamtools_2.10.0           
[141] rjson_0.2.20                lifecycle_1.0.1             nlme_3.1-153                jsonlite_1.7.2              viridisLite_0.4.0          
[146] fansi_0.5.0                 pillar_1.6.4                lattice_0.20-45             KEGGREST_1.34.0             fastmap_1.1.0              
[151] httr_1.4.2                  survival_3.2-13             glue_1.5.1                  qlcMatrix_0.9.7             png_0.1-7                  
[156] bit_4.0.4                   ggforce_0.3.3               stringi_1.7.6               blob_1.2.2                  latticeExtra_0.6-29        
[161] memoise_2.0.1               dplyr_1.0.7                 irlba_2.3.5                 future.apply_1.8.1         
timoast commented 2 years ago

Just to clarify, it looks like the peak calling worked but there are no counts quantified for some of the cells.

Can you confirm that cells in the RNA object colnames(rna) are present in the fragment files that you get no counts for? Eg:

"K6_TTTGTTGGTTGGTTCT-1" %in% colnames(rna)
jenellewallace commented 2 years ago

Thanks, yes I have just confirmed that the above statement returns TRUE, as does "K6_TTTGTTGGTTGGTTCT-1" %in% colnames(macs2_counts). So I think that peak calling worked and the problem is downstream. Thanks for your help!

timoast commented 2 years ago

Should be fixed now on the develop branch

jenellewallace commented 2 years ago

I have confirmed this is working now on my data - thanks so much for fixing the issue so quickly!