neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

*gsa.out: truncated `VARIABLE` names #141

Closed: bschilder closed this issue 1 year ago

bschilder commented 1 year ago

1. Bug description

For some reason, MAGMA truncates the `VARIABLE` column in its `gsa.out` files. This breaks any attempt to match the cell-type names in the results against those in the CTD.

I'm not sure whether this was always the case, or whether I'm only noticing it now because I'm using some CTDs with long cell-type names (e.g. HumanCellLandscape).
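To illustrate why truncation breaks the matching step, here is a minimal base-R sketch. The cell-type names and the 30-character cut-off are made up for illustration (they are not taken from MAGMA or from the HumanCellLandscape CTD), and `recover_full_names()` is a hypothetical helper, not part of MAGMA.Celltyping. It maps each truncated name back to the single CTD name that starts with it, and returns `NA` when the prefix is ambiguous or unmatched.

```r
# Hypothetical CTD cell-type names (illustrative only).
ctd_names <- c(
  "Adult-Cerebellum1_Astrocyte_cluster_A",
  "Adult-Cerebellum1_Astrocyte_cluster_B",
  "Adult-Liver1_Hepatocyte",
  "Fetal-Brain3_Excitatory_neuron_ADGRV1_high"
)

# Simulate truncation to an assumed fixed width (30 characters here).
truncated <- substr(ctd_names, 1, 30)

# Exact matching fails for every name longer than the cut-off.
print(match(truncated, ctd_names))

# Hypothetical helper: map each truncated name to the unique CTD name
# that starts with it; return NA if there are zero or several candidates.
recover_full_names <- function(truncated, ctd_names) {
  vapply(truncated, function(x) {
    hits <- ctd_names[startsWith(ctd_names, x)]
    if (length(hits) == 1) hits else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

print(recover_full_names(truncated, ctd_names))
```

Prefix matching can only recover names whose truncated form is unique, so the two cerebellum astrocyte clusters stay `NA` here; that is why a full-name column from MAGMA itself is preferable when it is available.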

Console output

[Screenshot: 2023-04-13 at 12:53:53]

Expected behaviour

MAGMA.Celltyping should read in the results files without hitting an error.

2. Reproducible example

Code

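```r
# Import MAGMA files for the GWAS with ID "ieu-a-298"
magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = c("ieu-a-298"))

# Load the HumanCellLandscape CTD, which has long cell-type names
ctd <- MAGMA.Celltyping::get_ctd("ctd_HumanCellLandscape")

# Run the cell-type association pipeline (fails when reading gsa.out)
res <- MAGMA.Celltyping::celltype_associations_pipeline(
    ctd = ctd,
    ctd_name = "ctd_HumanCellLandscape",
    ctd_species = "human",
    magma_dirs = magma_dirs,
    run_linear = TRUE,
    run_top10 = TRUE,
    upstream_kb = 35,
    downstream_kb = 10,
    force_new = TRUE,
    save_dir = here::here("processed_data/MAGMA"))
```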

3. Session info

```
R Under development (unstable) (2023-03-02 r83926)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
  [1] fs_1.6.1 matrixStats_0.63.0 bitops_1.0-7 lubridate_1.9.2 devtools_2.4.5
  [6] webshot_0.5.4 RColorBrewer_1.1-3 httr_1.4.5 doParallel_1.0.17 dynamicTreeCut_1.63-1
 [11] gh_1.4.0 numDeriv_2016.8-1.1 profvis_0.3.7 tools_4.3.0 MAGMA.Celltyping_2.0.9
 [16] backports_1.4.1 utf8_1.2.3 R6_2.5.1 uwot_0.1.14 lazyeval_0.2.2
 [21] withr_2.5.0 urlchecker_1.0.1 gridExtra_2.3 prettyunits_1.1.1 preprocessCore_1.61.0
 [26] WGCNA_1.72-1 cli_3.6.1 Biobase_2.59.0 TSP_1.2-4 askpass_1.1
 [31] ewceData_1.7.1 Rsamtools_2.15.3 yulab.utils_0.0.6 foreign_0.8-84 R.utils_2.12.2
 [36] sessioninfo_1.2.2 plotrix_3.8-2 BSgenome_1.67.4 orthogene_1.5.3 maps_3.4.1
 [41] limma_3.55.7 readxl_1.4.2 impute_1.73.0 rstudioapi_0.14 RSQLite_2.3.1
 [46] optimParallel_1.0-2 generics_0.1.3 gridGraphics_0.5-1 BiocIO_1.9.2 combinat_0.0-8
 [51] dendextend_1.17.1 car_3.1-2 dplyr_1.1.1 homologene_1.4.68.19.3.27 GO.db_3.17.0
 [56] Matrix_1.5-3 fansi_1.0.4 S4Vectors_0.37.5 abind_1.4-5 R.methodsS3_1.8.2
 [61] lifecycle_1.0.3 scatterplot3d_0.3-43 yaml_2.3.7 carData_3.0-5 SummarizedExperiment_1.29.1
 [66] clusterGeneration_1.3.7 BiocFileCache_2.7.2 grid_4.3.0 blob_1.2.4 promises_1.2.0.1
 [71] ExperimentHub_2.7.1 crayon_1.5.2 miniUI_0.1.1.1 lattice_0.20-45 GenomicFeatures_1.51.4
 [76] KEGGREST_1.39.0 MungeSumstats_1.7.19 pillar_1.9.0 knitr_1.42 GenomicRanges_1.51.4
 [81] rjson_0.2.21 boot_1.3-28.1 codetools_0.2-19 fastmatch_1.1-3 glue_1.6.2
 [86] ggfun_0.0.9 data.table_1.14.8 remotes_2.4.2 vctrs_0.6.1 png_0.1-8
 [91] treeio_1.23.1 cellranger_1.1.0 gtable_0.3.3 assertthat_0.2.1 cachem_1.0.7
 [96] xfun_0.38 mime_0.12 coda_0.19-4 survival_3.5-3 gargle_1.3.0
[101] seriation_1.4.2 SingleCellExperiment_1.21.1 RNOmni_1.0.1 iterators_1.0.14 interactiveDisplayBase_1.37.0
[106] ellipsis_0.3.2 nlme_3.1-162 ggtree_3.7.2 EWCE_1.7.4 usethis_2.1.6
[111] bit64_4.0.5 progress_1.2.2 filelock_1.0.2 googleAuthR_2.0.1 GenomeInfoDb_1.35.16
[116] rprojroot_2.0.3 rpart_4.1.19 Hmisc_5.0-1 colorspace_2.1-0 BiocGenerics_0.45.3
[121] DBI_1.1.3 nnet_7.3-18 phangorn_2.11.1 mnormt_2.1.1 tidyselect_1.2.0
[126] processx_3.8.0 bit_4.0.5 compiler_4.3.0 curl_5.0.0 httr2_0.2.2
[131] htmlTable_2.4.1 expm_0.999-7 xml2_1.3.3 ggdendro_0.1.23 DelayedArray_0.25.0
[136] plotly_4.10.1 rtracklayer_1.59.1 checkmate_2.1.0 scales_1.2.1 quadprog_1.5-8
[141] callr_3.7.3 rappdirs_0.3.3 stringr_1.5.0 digest_0.6.31 piggyback_0.1.4
[146] minqa_1.2.5 rmarkdown_2.21 ca_0.71.1 XVector_0.39.0 base64enc_0.1-3
[151] htmltools_0.5.5 pkgconfig_2.0.3 lme4_1.1-32 MatrixGenerics_1.11.1 dbplyr_2.3.2
[156] fastmap_1.1.1 rlang_1.1.0 htmlwidgets_1.6.2 shiny_1.7.4 jsonlite_1.8.4
[161] BiocParallel_1.33.12 R.oo_1.25.0 VariantAnnotation_1.45.1 RCurl_1.98-1.12 magrittr_2.0.3
[166] Formula_1.2-5 GenomeInfoDbData_1.2.10 ggplotify_0.1.0 patchwork_1.1.2 munsell_0.5.0
[171] Rcpp_1.0.10 viridis_0.6.2 ape_5.7-1 babelgene_22.9 stringi_1.7.12
[176] zlibbioc_1.45.0 MASS_7.3-58.3 AnnotationHub_3.7.4 plyr_1.8.8 pkgbuild_1.4.0
[181] parallel_4.3.0 Biostrings_2.67.2 splines_4.3.0 hms_1.1.3 ps_1.7.4
[186] fastcluster_1.2.3 igraph_1.4.2 ggpubr_0.6.0 ggsignif_0.6.4 reshape2_1.4.4
[191] biomaRt_2.55.4 stats4_4.3.0 pkgload_1.3.2 gprofiler2_0.2.1 BiocVersion_3.17.1
[196] XML_3.99-0.14 evaluate_0.20 BiocManager_1.30.20 nloptr_2.0.3 foreach_1.5.2
[201] httpuv_1.6.9 openssl_2.0.6 grr_0.9.5 tidyr_1.3.0 purrr_1.0.1
[206] heatmaply_1.4.2 ggplot2_3.4.2 broom_1.0.4 xtable_1.8-4 restfulr_0.0.15
[211] gitcreds_0.1.2 phytools_1.5-1 tidytree_0.4.2 rstatix_0.7.2 later_1.3.0
[216] viridisLite_0.4.1 googledrive_2.1.0 tibble_3.2.1 aplot_0.1.10 registry_0.5-1
[221] memoise_2.0.1 AnnotationDbi_1.61.2 GenomicAlignments_1.35.1 IRanges_2.33.1 cluster_2.1.4
[226] HGNChelper_0.8.1 timechange_0.2.0 here_1.0.1
```
bschilder commented 1 year ago

Looks like I actually already accounted for this in the past: https://github.com/neurogenomics/MAGMA_Celltyping/blob/349eeabcc29735fcc4b2fc4cedb4f22580f8ddba/R/load_magma_results_file.r#L70

Bizarrely, MAGMA doesn't even generate the `FULL_NAME` column consistently, so I had to make the fix conditional:

[Screenshot: 2023-04-14 at 12:14:39]
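A minimal base-R sketch of that conditional logic, assuming a whitespace-delimited `gsa.out` layout with `#` comment lines followed by a header row. The inline tables are mock data (column set and values invented for illustration), and `read_gsa_out()` is a hypothetical helper, not the actual code in `load_magma_results_file.r`. When a `FULL_NAME` column is present it overwrites the truncated `VARIABLE` values; otherwise the table is returned unchanged.

```r
# Hypothetical reader for a MAGMA-style gsa.out table (illustration only).
read_gsa_out <- function(text) {
  res <- utils::read.table(
    text = text, header = TRUE, comment.char = "#",
    stringsAsFactors = FALSE
  )
  # FULL_NAME is not always present, so only use it when it exists.
  if ("FULL_NAME" %in% colnames(res)) {
    res$VARIABLE <- res$FULL_NAME
    res$FULL_NAME <- NULL
  }
  res
}

# Mock output that includes a FULL_NAME column (values are made up).
with_full <- "
# mock comment line
VARIABLE TYPE NGENES BETA SE P FULL_NAME
Fetal-Brain3_Excitatory_neuron COVAR 1000 0.05 0.02 0.006 Fetal-Brain3_Excitatory_neuron_ADGRV1_high
Adult-Liver1_Hepatocyte COVAR 1000 -0.01 0.02 0.62 Adult-Liver1_Hepatocyte
"

# Mock output without a FULL_NAME column.
without_full <- "
VARIABLE TYPE NGENES BETA SE P
Adult-Liver1_Hepatocyte COVAR 1000 -0.01 0.02 0.62
"

print(read_gsa_out(with_full)$VARIABLE)
print(read_gsa_out(without_full)$VARIABLE)
```

Checking for the column by name, rather than by position or column count, keeps the reader working whether or not MAGMA emits `FULL_NAME` for a given run.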