stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
316 stars 85 forks source link

Adding information to a Chromatin Assay #765

Closed cristanchoa closed 3 years ago

cristanchoa commented 3 years ago

Hi,

I had merged 16 different runs together in a multiome and for the ATAC peaks somewhere along the way it looks like I lost the genome information:

all_runs@assays[["peaks"]] ChromatinAssay data with 221328 features for 141659 cells Variable features: 221328 Genome: Annotation present: TRUE Motifs present: FALSE Fragment files: 16

How can I go about and add this back? I am using mm10.

Thanks, Ana

timoast commented 3 years ago

genome(all_runs) <- "mm10"

See https://satijalab.org/signac/articles/data_structures.html

cristanchoa commented 3 years ago

Thank you for the fast response. That is not quite working. I get the following:

> genome(all_runs)<-"mm10" Error in .order_seqlevels(chrom_sizes[, "chrom"]) : !anyNA(m31) is not TRUE Called from: .order_seqlevels(chrom_sizes[, "chrom"])

timoast commented 3 years ago

This is due to a bug in GenomeInfoDb that's fixed in the latest version, see https://github.com/timoast/signac/issues/687

Note that the genome information doesn't really need to be set, you can run all downstream functions in Signac without it

cristanchoa commented 3 years ago

I see.

Have their been other recent updates that may affect downstream analysis people have reported? I had generated this object a few months ago and ran differentially accessibility by cell type and obtained hundreds of different sites. When I ran the same code again recently to start to refine the strategy I get 10 sites with differential accessibility total in the same cell type with the original code. I regenerated the entire object again from the same files in case something was corrupted but again the same result of only 10 sites now.

simple code: Glut1.da<-FindMarkers(all_runs, ident.1 = "Glut1_hypoxia", ident.2= "Glut1_normoxia")

The RNA results are not altered at all, but the analysis in "peaks"

Thanks for the help, Ana

Below is the session info from now but unfortunately I didn't realize I should have kept it from the original analysis.

R version 4.0.5 (2021-03-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.2 (Ootpa)

Matrix products: default BLAS: /cm/shared/apps_chop/R/4.0.5/lib64/R/lib/libRblas.so LAPACK: /cm/shared/apps_chop/R/4.0.5/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] grid stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] harmony_1.0 Rcpp_1.0.7 forcats_0.5.1 stringr_1.4.0
[5] dplyr_1.0.7 purrr_0.3.4 readr_2.0.1 tidyr_1.1.3
[9] tibble_3.1.3 tidyverse_1.3.1 EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.14.1
[13] AnnotationFilter_1.14.0 GenomicFeatures_1.42.3 AnnotationDbi_1.52.0 Biobase_2.50.0
[17] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[21] BiocGenerics_0.36.1 patchwork_1.1.1 ggplot2_3.3.5 SeuratWrappers_0.3.0
[25] SeuratObject_4.0.2 Seurat_4.0.4 Signac_1.3.0

loaded via a namespace (and not attached): [1] utf8_1.2.2 reticulate_1.20 tidyselect_1.1.1 RSQLite_2.2.7
[5] htmlwidgets_1.5.3 docopt_0.7.1 BiocParallel_1.24.1 Rtsne_0.15
[9] munsell_0.5.0 codetools_0.2-18 ica_1.0-2 future_1.21.0
[13] miniUI_0.1.1.1 withr_2.4.2 colorspace_2.0-2 rstudioapi_0.13
[17] ROCR_1.0-11 tensor_1.5 listenv_0.8.0 labeling_0.4.2
[21] MatrixGenerics_1.2.1 slam_0.1-48 GenomeInfoDbData_1.2.4 polyclip_1.10-0
[25] bit64_4.0.5 farver_2.1.0 parallelly_1.27.0 vctrs_0.3.8
[29] generics_0.1.0 xfun_0.25 BiocFileCache_1.14.0 lsa_0.73.2
[33] ggseqlogo_0.1 R6_2.5.1 rsvd_1.0.5 bitops_1.0-7
[37] spatstat.utils_2.2-0 cachem_1.0.6 DelayedArray_0.16.3 assertthat_0.2.1
[41] vroom_1.5.4 promises_1.2.0.1 scales_1.1.1 gtable_0.3.0
[45] globals_0.14.0 goftest_1.2-2 rlang_0.4.11 RcppRoll_0.3.0
[49] splines_4.0.5 rtracklayer_1.50.0 lazyeval_0.2.2 spatstat.geom_2.2-2
[53] broom_0.7.9 modelr_0.1.8 BiocManager_1.30.15 reshape2_1.4.4
[57] abind_1.4-5 backports_1.2.1 httpuv_1.6.2 tools_4.0.5
[61] ellipsis_0.3.2 spatstat.core_2.3-0 RColorBrewer_1.1-2 ggridges_0.5.3
[65] plyr_1.8.6 progress_1.2.2 zlibbioc_1.36.0 RCurl_1.98-1.4
[69] prettyunits_1.1.1 rpart_4.1-15 openssl_1.4.4 deldir_0.2-10
[73] pbapply_1.4-3 cowplot_1.1.1 zoo_1.8-9 haven_2.4.3
[77] SummarizedExperiment_1.20.0 ggrepel_0.9.1 cluster_2.1.1 fs_1.5.0
[81] tinytex_0.33 magrittr_2.0.1 data.table_1.14.0 scattermore_0.7
[85] reprex_2.0.1 lmtest_0.9-38 RANN_2.6.1 SnowballC_0.7.0
[89] ProtGenerics_1.22.0 fitdistrplus_1.1-5 matrixStats_0.60.0 hms_1.1.0
[93] mime_0.11 xtable_1.8-4 XML_3.99-0.7 readxl_1.3.1
[97] sparsesvd_0.2 gridExtra_2.3 compiler_4.0.5 biomaRt_2.46.3
[101] KernSmooth_2.23-18 crayon_1.4.1 htmltools_0.5.1.1 tzdb_0.1.2
[105] mgcv_1.8-34 later_1.3.0 lubridate_1.7.10 DBI_1.1.1
[109] tweenr_1.0.2 dbplyr_2.1.1 MASS_7.3-53.1 rappdirs_0.3.3
[113] Matrix_1.3-4 cli_3.0.1 igraph_1.2.6 pkgconfig_2.0.3
[117] GenomicAlignments_1.26.0 plotly_4.9.4.1 spatstat.sparse_2.0-0 xml2_1.3.2
[121] XVector_0.30.0 rvest_1.0.1 digest_0.6.27 sctransform_0.3.2
[125] RcppAnnoy_0.0.19 spatstat.data_2.1-0 Biostrings_2.58.0 cellranger_1.1.0
[129] leiden_0.3.9 fastmatch_1.1-3 uwot_0.1.10 curl_4.3.2
[133] shiny_1.6.0 Rsamtools_2.6.0 lifecycle_1.0.0 nlme_3.1-152
[137] jsonlite_1.7.2 viridisLite_0.4.0 askpass_1.1 fansi_0.5.0
[141] pillar_1.6.2 lattice_0.20-41 fastmap_1.1.0 httr_1.4.2
[145] survival_3.2-10 glue_1.4.2 remotes_2.4.0 qlcMatrix_0.9.7
[149] png_0.1-7 bit_4.0.4 ggforce_0.3.3 stringi_1.7.3
[153] blob_1.2.2 memoise_2.0.0 irlba_2.3.3 future.apply_1.8.1

timoast commented 3 years ago

You can find the changelog for Signac here: https://satijalab.org/signac/news/index.html

The calculation of fold changes was fixed in 1.3.0, which may explain different Seurat::FindMarkers results. You may need to change the default settings for min.pct and logfc.threshold in FindMarkers() to identify peaks with smaller fold-changes or for very sparse data