neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

`map_snps_to_genes`: Handle bgzipped files #144

Closed bschilder closed 1 year ago

bschilder commented 1 year ago

`map_snps_to_genes` currently throws an error when the GWAS summary statistics are bgzip-compressed (e.g. tabix-indexed files).

eduAttainOkbayPth <- system.file("extdata", "eduAttainOkbay.txt",
                                 package = "MungeSumstats"
)
# Format the sumstats and save them as a bgzip-compressed, tabix-indexed file (.tsv.bgz)
reformatted <- MungeSumstats::format_sumstats(
    path = eduAttainOkbayPth,
    ref_genome = "GRCh37",
    dbSNP = 144,
    bi_allelic_filter = TRUE,
    tabix_index = TRUE,
    log_folder_ind = TRUE,
    log_mungesumstats_msgs = TRUE
)
# reformatted$sumstats is the path to the saved .tsv.bgz file
magma_files <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = reformatted$sumstats,
    genome_build = "GRCH37",
    population = "EUR",
    upstream_kb = 35,
    downstream_kb = 10,
    force_new = FALSE
)
******::NOTE::******
 - Formatted results will be saved to `tempdir()` by default.
 - This means all formatted summary stats will be deleted upon ending the R session.
 - To keep formatted summary stats, change `save_path`  ( e.g. `save_path=file.path('./formatted',basename(path))` ),   or make sure to copy files elsewhere after processing  ( e.g. `file.copy(save_path, './formatted/' )`.
 ******************** 

******::NOTE::******
 - Log results will be saved to `tempdir()` by default.
 - This means all log data from the run will be  deleted upon ending the R session.
 - To keep it, change `log_folder` to an actual directory  (e.g. log_folder='./').
 ******************** 

save_path suggests .gz output but tabix_index=TRUE Switching output to tabix-indexed format (.bgz).
Formatted summary statistics will be saved to ==>  /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/file826265f6aeca.tsv.bgz
Log data to be saved to ==>  /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw
Saving output messages to:
/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/MungeSumstats_log_msg.txt
Any runtime errors will be saved to:
/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/MungeSumstats_log_output.txt
Messages will not be printed to terminal.
Returning path to saved data.
Warning messages:
1: package ‘S4Vectors’ was built under R version 4.2.2 
2: package ‘GenomeInfoDb’ was built under R version 4.2.2 

===================== 🦠🌋🦠 Welcome to MAGMA.Celltyping 🦠🌋🦠 =====================
This package uses MAGMA:
https://ctg.cncr.nl/software/magma

To cite MAGMA.Celltyping, please use:
* Skene, N.G., Bryois, J., Bakken, T.E. et al. Genetic identification of
     brain cell types underlying schizophrenia. Nat Genet 50, 825-833 (2018).
     https://doi.org/10.1038/s41588-018-0129-5
* de Leeuw CA, Mooij JM, Heskes T, Posthuma D (2015) MAGMA: Generalized
     Gene-Set Analysis of GWAS Data. PLOS Computational Biology 11(4): e1004219.
     https://doi.org/10.1371/journal.pcbi.1004219

Please report any bugs or feature requests by filling out an Issues template:
     https://github.com/neurogenomics/MAGMA_Celltyping/issues
===================== 🦠🌋🦠 =========================== 🦠🌋🦠 =====================

Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Using existing genome_ref found in storage_dir.
Saving decompressed copy of path_formatted ==>  /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/file826265f6aeca.tsv
Error in strsplit(first_line, "\t")[[1]] : subscript out of bounds
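The error message itself can be reproduced in isolation. `strsplit()` returns an empty list when given a zero-length character vector, so taking `[[1]]` fails. This is a minimal sketch, assuming `first_line` came back empty because the `.bgz` file was not decompressed as expected. Only the variable name comes from the error message; the rest is illustrative and is not the package's code.

```r
# Assumption: reading the header of the "decompressed" copy returned nothing.
first_line <- character(0)

# strsplit() on a zero-length vector returns an empty list...
parts <- strsplit(first_line, "\t")
length(parts)

# ...so indexing its first element raises "subscript out of bounds".
res <- tryCatch(
    parts[[1]],
    error = function(e) conditionMessage(e)
)
print(res)
```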

session info

```
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomeInfoDb_1.34.9 IRanges_2.32.0      S4Vectors_0.36.2
[4] BiocGenerics_0.44.0 MungeSumstats_1.6.0 phenomix_0.99.4

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3
  [2] rtracklayer_1.58.0
  [3] scattermore_1.1
  [4] R.methodsS3_1.8.2
  [5] SeuratObject_4.1.3
  [6] tidyr_1.3.0
  [7] ggplot2_3.4.2
  [8] clusterGeneration_1.3.7
  [9] bit64_4.0.5
 [10] irlba_2.3.5.1
 [11] DelayedArray_0.24.0
 [12] R.utils_2.12.2
 [13] data.table_1.14.8
 [14] KEGGREST_1.38.0
 [15] RCurl_1.98-1.12
 [16] doParallel_1.0.17
 [17] generics_0.1.3
 [18] GenomicFeatures_1.50.4
 [19] RhpcBLASctl_0.23-42
 [20] cowplot_1.1.1
 [21] RSQLite_2.3.1
 [22] RANN_2.6.1
 [23] future_1.32.0
 [24] bit_4.0.5
 [25] spatstat.data_3.0-1
 [26] webshot_0.5.4
 [27] xml2_1.3.4
 [28] httpuv_1.6.11
 [29] SummarizedExperiment_1.28.0
 [30] assertthat_0.2.1
 [31] orthogene_1.5.3
 [32] viridis_0.6.3
 [33] gargle_1.4.0
 [34] hms_1.1.3
 [35] babelgene_22.9
 [36] promises_1.2.0.1
 [37] TSP_1.2-4
 [38] fansi_1.0.4
 [39] restfulr_0.0.15
 [40] progress_1.2.2
 [41] caTools_1.18.2
 [42] dendextend_1.17.1
 [43] dbplyr_2.3.2
 [44] igraph_1.4.3
 [45] DBI_1.1.3
 [46] htmlwidgets_1.6.2
 [47] sparsesvd_0.2-2
 [48] spatstat.geom_3.2-1
 [49] purrr_1.0.1
 [50] ellipsis_0.3.2
 [51] ggpubr_0.6.0
 [52] dplyr_1.1.2
 [53] backports_1.4.1
 [54] gprofiler2_0.2.1
 [55] aod_1.3.2
 [56] biomaRt_2.54.1
 [57] deldir_1.0-9
 [58] MatrixGenerics_1.10.0
 [59] SingleCellExperiment_1.20.1
 [60] vctrs_0.6.2
 [61] Biobase_2.58.0
 [62] ROCR_1.0-11
 [63] abind_1.4-5
 [64] cachem_1.0.8
 [65] grr_0.9.5
 [66] BSgenome_1.66.3
 [67] progressr_0.13.0
 [68] sctransform_0.3.5
 [69] treeio_1.23.1
 [70] GenomicAlignments_1.34.1
 [71] prettyunits_1.1.1
 [72] goftest_1.2-3
 [73] cluster_2.1.4
 [74] ExperimentHub_2.6.0
 [75] ape_5.7-1
 [76] ontologyIndex_2.11
 [77] lazyeval_0.2.2
 [78] crayon_1.5.2
 [79] spatstat.explore_3.2-1
 [80] pkgconfig_2.0.3
 [81] nlme_3.1-162
 [82] pkgload_1.3.2
 [83] seriation_1.4.2
 [84] ewceData_1.7.1
 [85] rlang_1.1.1
 [86] globals_0.16.2
 [87] lifecycle_1.0.3
 [88] miniUI_0.1.1.1
 [89] registry_0.5-1
 [90] SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20
 [91] filelock_1.0.2
 [92] BiocFileCache_2.6.1
 [93] AnnotationHub_3.6.0
 [94] polyclip_1.10-4
 [95] matrixStats_1.0.0
 [96] lmtest_0.9-40
 [97] aplot_0.1.10
 [98] Matrix_1.5-4.1
 [99] carData_3.0-5
[100] boot_1.3-28.1
[101] zoo_1.8-12
[102] ggridges_0.5.4
[103] png_0.1-8
[104] viridisLite_0.4.2
[105] rjson_0.2.21
[106] ca_0.71.1
[107] bitops_1.0-7
[108] R.oo_1.25.0
[109] KernSmooth_2.23-21
[110] Biostrings_2.66.0
[111] blob_1.2.4
[112] stringr_1.5.0
[113] parallelly_1.36.0
[114] spatstat.random_3.1-5
[115] gridGraphics_0.5-1
[116] rstatix_0.7.2
[117] remaCor_0.0.11
[118] MAGMA.Celltyping_2.0.10
[119] ggsignif_0.6.4
[120] BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
[121] scales_1.2.1
[122] memoise_2.0.1
[123] magrittr_2.0.3
[124] plyr_1.8.8
[125] ica_1.0-3
[126] gplots_3.1.3
[127] zlibbioc_1.44.0
[128] compiler_4.2.1
[129] BiocIO_1.8.0
[130] RColorBrewer_1.1-3
[131] lme4_1.1-33
[132] fitdistrplus_1.1-11
[133] homologene_1.4.68.19.3.27
[134] Rsamtools_2.14.0
[135] cli_3.6.1
[136] XVector_0.38.0
[137] listenv_0.9.0
[138] patchwork_1.1.2
[139] pbapply_1.7-0
[140] MASS_7.3-60
[141] tidyselect_1.2.0
[142] stringi_1.7.12
[143] yaml_2.3.7
[144] ggrepel_0.9.3
[145] GeneOverlap_1.34.0
[146] grid_4.2.1
[147] VariantAnnotation_1.44.1
[148] tools_4.2.1
[149] future.apply_1.11.0
[150] parallel_4.2.1
[151] rstudioapi_0.14
[152] RNOmni_1.0.1
[153] foreach_1.5.2
[154] piggyback_0.1.4
[155] gridExtra_2.3
[156] Rtsne_0.16
[157] HGNChelper_0.8.1
[158] BiocManager_1.30.20
[159] digest_0.6.31
[160] shiny_1.7.4
[161] Rcpp_1.0.10
[162] car_3.1-2
[163] GenomicRanges_1.50.2
[164] broom_1.0.4
[165] BiocVersion_3.16.0
[166] later_1.3.1
[167] RcppAnnoy_0.0.20
[168] ggdendro_0.1.23
[169] httr_1.4.6
[170] AnnotationDbi_1.60.2
[171] Rdpack_2.4
[172] colorspace_2.1-0
[173] XML_3.99-0.14
[174] fs_1.6.2
[175] tensor_1.5
[176] reticulate_1.28
[177] splines_4.2.1
[178] yulab.utils_0.0.6
[179] uwot_0.1.14
[180] tidytree_0.4.2
[181] spatstat.utils_3.0-3
[182] gh_1.4.0
[183] sp_1.6-1
[184] ggplotify_0.1.0
[185] plotly_4.10.2
[186] xtable_1.8-4
[187] ggtree_3.6.2
[188] jsonlite_1.8.4
[189] nloptr_2.0.3
[190] heatmaply_1.4.2
[191] ggfun_0.0.9
[192] R6_2.5.1
[193] RUnit_0.4.32
[194] EWCE_1.9.0
[195] pillar_1.9.0
[196] htmltools_0.5.5
[197] mime_0.12
[198] glue_1.6.2
[199] fastmap_1.1.1
[200] minqa_1.2.5
[201] BiocParallel_1.32.6
[202] interactiveDisplayBase_1.36.0
[203] codetools_0.2-19
[204] mvtnorm_1.2-1
[205] utf8_1.2.3
[206] lattice_0.21-8
[207] spatstat.sparse_3.0-1
[208] tibble_3.2.1
[209] pbkrtest_0.5.2
[210] curl_5.0.0
[211] leiden_0.4.3
[212] gtools_3.9.4
[213] survival_3.5-5
[214] limma_3.54.2
[215] googleAuthR_2.0.1
[216] munsell_0.5.0
[217] GenomeInfoDbData_1.2.9
[218] iterators_1.0.14
[219] variancePartition_1.28.9
[220] reshape2_1.4.4
[221] gtable_0.3.3
[222] rbibutils_2.2.13
[223] Seurat_4.3.0
```

bschilder commented 1 year ago

Turns out this was already implemented, but it wasn't working because of a bug: the check only recognized files with the ".gz" suffix and not those with the ".bgz" suffix.

Fixed now.
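
For illustration, the kind of suffix check described above might look like the sketch below. Both helper functions (`is_gz_only`, `is_gz_or_bgz`) and the example file names are hypothetical and are not taken from the MAGMA.Celltyping source. They only show why a test for ".gz" misses ".bgz" files, and one way to match both.

```r
# Hypothetical helpers for illustration only; not the package's actual code.

# A check like this misses bgzip output: "x.tsv.bgz" does not end in ".gz".
is_gz_only <- function(path) {
    endsWith(tolower(path), ".gz")
}

# Matching an optional "b" before "gz" covers both suffixes.
is_gz_or_bgz <- function(path) {
    grepl("\\.b?gz$", path, ignore.case = TRUE)
}

# Made-up file names covering the three cases.
paths <- c("sumstats.tsv", "sumstats.tsv.gz", "sumstats.tsv.bgz")

is_gz_only(paths)
is_gz_or_bgz(paths)
```

With the first helper, the `.bgz` path is treated as uncompressed; with the second, both compressed paths are detected and the plain `.tsv` path is not.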