quadbio / Pando

Multiome GRN inference.
https://quadbio.github.io/Pando/
MIT License
106 stars 21 forks source link

Extract significant modules #65

Closed AmelZulji closed 3 weeks ago

AmelZulji commented 4 weeks ago

How to extract modules as reported by modules object generated by find_modules():

suppressPackageStartupMessages({
  library(Signac)
  library(Seurat)
  library(Pando)
  library(BSgenome.Hsapiens.UCSC.hg38)
  library(tidyverse)
})
#> Warning: package 'GenomeInfoDb' was built under R version 4.2.2
#> Warning: package 'S4Vectors' was built under R version 4.2.2
#> Warning: package 'BSgenome' was built under R version 4.2.2
#> Warning: package 'GenomicRanges' was built under R version 4.2.2

pando <- readRDS(here::here("test/pando_demo/pando.rds"))

pando <-
  find_modules(
    pando,
    p_thresh = 0.05,
    min_genes_per_module = 5,
    rsq_thresh = 0.1,
    nvar_thresh = 10
  )
#> Found 609 TF modules

modules <- NetworkModules(pando)

Here the object shows that there are 116 TF modules

print(modules)
#> An Modules object with 116 TF modules

However in metadata the number of modules is different:

modules@meta$tf |> unique() |> length()
#> [1] 609

It is not matched after filtering by number of genes per module

modules@meta |> filter(n_genes > 5 & padj < 0.05) |> pull(tf) |> unique() |> length()
#> [1] 158

I tried to manually extract modules as in the find_modules() function but cant get exact same modules as reported by module object.

Can you please help? Regards, Amel

Created on 2024-06-11 with reprex v2.1.0

Session info ``` r sessionInfo() #> R version 4.2.1 (2022-06-23) #> Platform: x86_64-apple-darwin17.0 (64-bit) #> Running under: macOS Big Sur ... 10.16 #> #> Matrix products: default #> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib #> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib #> #> locale: #> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 #> #> attached base packages: #> [1] stats4 stats graphics grDevices utils datasets methods #> [8] base #> #> other attached packages: #> [1] lubridate_1.9.3 forcats_1.0.0 #> [3] stringr_1.5.1 dplyr_1.1.4 #> [5] purrr_1.0.2 readr_2.1.5 #> [7] tidyr_1.3.1 tibble_3.2.1 #> [9] ggplot2_3.5.1 tidyverse_2.0.0 #> [11] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3 #> [13] rtracklayer_1.58.0 Biostrings_2.66.0 #> [15] XVector_0.38.0 GenomicRanges_1.50.2 #> [17] GenomeInfoDb_1.34.9 IRanges_2.32.0 #> [19] S4Vectors_0.36.2 BiocGenerics_0.44.0 #> [21] Pando_1.0.4 SeuratObject_4.1.3 #> [23] Seurat_4.3.0 Signac_1.11.0 #> #> loaded via a namespace (and not attached): #> [1] utf8_1.2.4 spatstat.explore_3.2-7 #> [3] reticulate_1.36.1 R.utils_2.12.3 #> [5] tidyselect_1.2.1 htmlwidgets_1.6.4 #> [7] grid_4.2.1 BiocParallel_1.32.6 #> [9] Rtsne_0.17 munsell_0.5.1 #> [11] codetools_0.2-19 ica_1.0-3 #> [13] future_1.33.2 miniUI_0.1.1.1 #> [15] withr_3.0.0 spatstat.random_3.2-3 #> [17] colorspace_2.1-0 progressr_0.14.0 #> [19] Biobase_2.58.0 knitr_1.46 #> [21] rstudioapi_0.16.0 ROCR_1.0-11 #> [23] tensor_1.5 listenv_0.9.1 #> [25] MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9 #> [27] polyclip_1.10-6 farver_2.1.1 #> [29] rprojroot_2.0.4 parallelly_1.37.1 #> [31] vctrs_0.6.5 generics_0.1.3 #> [33] xfun_0.43 timechange_0.3.0 #> [35] R6_2.5.1 graphlayouts_1.1.1 #> [37] pals_1.8 bitops_1.0-7 #> [39] spatstat.utils_3.0-4 cachem_1.0.8 #> [41] DelayedArray_0.24.0 promises_1.3.0 #> [43] BiocIO_1.8.0 scales_1.3.0 #> [45] ggraph_2.2.1 gtable_0.3.5 #> [47] globals_0.16.3 goftest_1.2-3 #> [49] tidygraph_1.3.1 rlang_1.1.3 #> [51] RcppRoll_0.3.0 splines_4.2.1 #> [53] lazyeval_0.2.2 dichromat_2.0-0.1 #> [55] spatstat.geom_3.2-9 yaml_2.3.8 #> [57] reshape2_1.4.4 abind_1.4-5 #> [59] httpuv_1.6.15 tools_4.2.1 #> [61] RColorBrewer_1.1-3 ggridges_0.5.6 #> [63] Rcpp_1.0.12 plyr_1.8.9 #> [65] sparseMatrixStats_1.10.0 zlibbioc_1.44.0 #> [67] RCurl_1.98-1.14 deldir_2.0-4 #> [69] pbapply_1.7-2 viridis_0.6.5 #> [71] cowplot_1.1.3 zoo_1.8-12 #> [73] SummarizedExperiment_1.28.0 ggrepel_0.9.5 #> [75] cluster_2.1.4 here_1.0.1 #> [77] fs_1.6.3 magrittr_2.0.3 #> [79] data.table_1.15.4 scattermore_1.2 #> [81] lmtest_0.9-40 reprex_2.1.0 #> [83] RANN_2.6.1 fitdistrplus_1.1-11 #> [85] R.cache_0.16.0 matrixStats_1.1.0 #> [87] hms_1.1.3 patchwork_1.2.0 #> [89] mime_0.12 evaluate_0.23 #> [91] xtable_1.8-4 XML_3.99-0.16.1 #> [93] gridExtra_2.3 compiler_4.2.1 #> [95] maps_3.4.2 KernSmooth_2.23-20 #> [97] crayon_1.5.2 ggpointdensity_0.1.0 #> [99] R.oo_1.26.0 htmltools_0.5.8.1 #> [101] later_1.3.2 tzdb_0.4.0 #> [103] tweenr_2.0.3 MASS_7.3-58.2 #> [105] Matrix_1.6-3 cli_3.6.2 #> [107] R.methodsS3_1.8.2 parallel_4.2.1 #> [109] igraph_2.0.3 pkgconfig_2.0.3 #> [111] GenomicAlignments_1.34.1 sp_2.1-3 #> [113] plotly_4.10.4 spatstat.sparse_3.0-3 #> [115] digest_0.6.35 sctransform_0.4.1 #> [117] RcppAnnoy_0.0.22 spatstat.data_3.0-4 #> [119] rmarkdown_2.26 leiden_0.4.3.1 #> [121] fastmatch_1.1-4 uwot_0.2.2 #> [123] restfulr_0.0.15 shiny_1.8.1.1 #> [125] Rsamtools_2.14.0 rjson_0.2.21 #> [127] lifecycle_1.0.4 nlme_3.1-162 #> [129] jsonlite_1.8.8 mapproj_1.2.11 #> [131] viridisLite_0.4.2 fansi_1.0.6 #> [133] pillar_1.9.0 lattice_0.20-45 #> [135] fastmap_1.1.1 httr_1.4.7 #> [137] survival_3.5-3 glue_1.7.0 #> [139] png_0.1-8 ggforce_0.4.2 #> [141] stringi_1.8.3 memoise_2.0.1 #> [143] styler_1.10.3 irlba_2.3.5.1 #> [145] future.apply_1.11.2 ```
AmelZulji commented 3 weeks ago

I figured out that the modules reported by NetworkModules(pando) are computed with

    models_use <- gof(object) %>%
        filter(rsq>rsq_thresh & nvariables>nvar_thresh) %>%
        pull(target) %>%
        unique()

And then further subset by

    module_pos <- modules %>%
        filter(estimate>0) %>%
        group_by(tf) %>% filter(n()>min_genes_per_module) %>%
        group_split() %>% {names(.) <- map_chr(., function(x) x$tf[[1]]); .} %>%
        map(function(x) x$target)

    module_neg <- modules %>%
        filter(estimate<0) %>%
        group_by(tf) %>% filter(n()>min_genes_per_module) %>%
        group_split() %>% {names(.) <- map_chr(., function(x) x$tf[[1]]); .} %>%
        map(function(x) x$target)

modules reported by print(modules) are to be found in:

modules@features[["genes_pos"]]
modules@features[["genes_neg"]]