neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

conditional `calculate_celltype_associations`: `ERROR - reading gene covariate file: duplicate gene entry` #113

Closed: bschilder closed this issue 2 years ago

bschilder commented 2 years ago

1. Bug description

ERROR - reading gene covariate file: duplicate gene entry

Console output

Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Standardising CellTypeDataset
Found 5 matrix types across 2 CTD levels.
Processing level: 1
Converting to sparse matrix.
Processing level: 2
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 1
Checking CTD: level 2
Running MAGMA: Linear mode
Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T/Rtmpp6c79a/MAGMA_Files/fluid_intelligence.ukb.tsv.35UP.10DOWN/fluid_intelligence.ukb.tsv.35UP.10DOWN.genes.raw
    --gene-covar /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//Rtmpp6c79a/file103501c90b350
    --model direction=pos condition-residualize=ZSTAT1
    --out /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T/Rtmpp6c79a/MAGMA_Files/fluid_intelligence.ukb.tsv.35UP.10DOWN/fluid_intelligence.ukb.tsv.35UP.10DOWN.level1.ControllingForPropMemory

Start time is 16:27:51, Friday 29 Jul 2022

Reading file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T/Rtmpp6c79a/MAGMA_Files/fluid_intelligence.ukb.tsv.35UP.10DOWN/fluid_intelligence.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-level covariates...
Reading file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//Rtmpp6c79a/file103501c90b350... 
    detected 8 variables in file (using all)

ERROR - reading gene covariate file: duplicate gene entry on line 7 (ID = 100037417)
    line: 100037417 36  17  20  29  29  9   14  2.1586

Terminating program.
Warning: cannot open file '/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T/Rtmpp6c79a/MAGMA_Files/fluid_intelligence.ukb.tsv.35UP.10DOWN/fluid_intelligence.ukb.tsv.35UP.10DOWN.level1.ControllingForPropMemory.gsa.out': No such file or directory
Error in file(file, "rt") : cannot open the connection

Expected behaviour

Returns enrichment results.

2. Reproducible example

From the full_workflow vignette:

Code

### CTD ###
ctd <- ewceData::ctd()

### GWAS 1 ###
path_formatted_intelligence <- MAGMA.Celltyping::get_example_gwas(
    trait = "fluid_intelligence", 
    munged = TRUE)
genesOutPath_intelligence <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = path_formatted_intelligence,
    genome_build = "GRCh37")

### GWAS 2 ####
path_formatted_memory <- MAGMA.Celltyping::get_example_gwas(
    trait = "prospective_memory",
    munged = TRUE)
genesOutPath_memory <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = path_formatted_memory,
    genome_build = "GRCh37")

### RUN ANALYSIS ###
ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  ctd_species = "mouse", 
  genesOutCOND = genesOutPath_memory,
  analysis_name = "ControllingForPropMemory")

3. Session info

```
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MAGMA.Celltyping_2.0.6 ggplot2_3.3.6 dplyr_1.0.9 ewceData_1.4.0
[5] ExperimentHub_2.4.0 AnnotationHub_3.4.0 BiocFileCache_2.4.0 dbplyr_2.2.1
[9] BiocGenerics_0.42.0

loaded via a namespace (and not attached):
  [1] backports_1.4.1 googleAuthR_2.0.0 plyr_1.8.7
  [4] lazyeval_0.2.2 splines_4.2.0 orthogene_1.3.1
  [7] BiocParallel_1.30.3 GenomeInfoDb_1.32.2 digest_0.6.29
 [10] yulab.utils_0.0.5 htmltools_0.5.2 RNOmni_1.0.0
 [13] fansi_1.0.3 magrittr_2.0.3 memoise_2.0.1
 [16] BSgenome_1.64.0 limma_3.52.2 Biostrings_2.64.0
 [19] matrixStats_0.62.0 R.utils_2.12.0 prettyunits_1.1.1
 [22] colorspace_2.0-3 blob_1.2.3 rappdirs_0.3.3
 [25] gitcreds_0.1.1 xfun_0.31 crayon_1.5.1
 [28] RCurl_1.98-1.7 jsonlite_1.8.0 lme4_1.1-30
 [31] VariantAnnotation_1.42.1 ape_5.6-2 glue_1.6.2
 [34] gargle_1.2.0 gtable_0.3.0 zlibbioc_1.42.0
 [37] XVector_0.36.0 HGNChelper_0.8.1 DelayedArray_0.22.0
 [40] car_3.1-0 SingleCellExperiment_1.18.0 abind_1.4-5
 [43] scales_1.2.0 DBI_1.1.3 rstatix_0.7.0
 [46] Rcpp_1.0.9 progress_1.2.2 viridisLite_0.4.0
 [49] xtable_1.8-4 gridGraphics_0.5-1 tidytree_0.3.9
 [52] bit_4.0.4 stats4_4.2.0 htmlwidgets_1.5.4
 [55] httr_1.4.3 ellipsis_0.3.2 farver_2.1.1
 [58] pkgconfig_2.0.3 XML_3.99-0.10 R.methodsS3_1.8.2
 [61] utf8_1.2.2 labeling_0.4.2 ggplotify_0.1.0
 [64] tidyselect_1.1.2 rlang_1.0.3 reshape2_1.4.4
 [67] later_1.3.0 AnnotationDbi_1.58.0 munsell_0.5.0
 [70] BiocVersion_3.15.2 tools_4.2.0 cachem_1.0.6
 [73] cli_3.3.0 generics_0.1.3 RSQLite_2.2.14
 [76] MungeSumstats_1.4.5 broom_1.0.0 evaluate_0.15
 [79] ggdendro_0.1.23 stringr_1.4.0 fastmap_1.1.0
 [82] yaml_2.3.5 ggtree_3.4.0 fs_1.5.2
 [85] babelgene_22.3 knitr_1.39 bit64_4.0.5
 [88] purrr_0.3.4 gh_1.3.0 KEGGREST_1.36.2
 [91] gprofiler2_0.2.1 nlme_3.1-158 mime_0.12
 [94] R.oo_1.25.0 aplot_0.1.6 xml2_1.3.3
 [97] biomaRt_2.52.0 compiler_4.2.0 rstudioapi_0.13
[100] plotly_4.10.0 filelock_1.0.2 curl_4.3.2
[103] png_0.1-7 interactiveDisplayBase_1.34.0 ggsignif_0.6.3
[106] treeio_1.20.0 tibble_3.1.7 EWCE_1.5.3
[109] homologene_1.4.68.19.3.27 stringi_1.7.8 GenomicFeatures_1.48.3
[112] lattice_0.20-45 Matrix_1.4-1 nloptr_2.0.3
[115] vctrs_0.4.1 pillar_1.7.0 lifecycle_1.0.1
[118] BiocManager_1.30.18 data.table_1.14.2 bitops_1.0-7
[121] httpuv_1.6.5 patchwork_1.1.1 rtracklayer_1.56.1
[124] GenomicRanges_1.48.0 R6_2.5.1 BiocIO_1.6.0
[127] promises_1.2.0.1 gridExtra_2.3 IRanges_2.30.0
[130] codetools_0.2-18 boot_1.3-28 MASS_7.3-57
[133] assertthat_0.2.1 pkgload_1.3.0 SummarizedExperiment_1.26.1
[136] rjson_0.2.21 withr_2.5.0 GenomicAlignments_1.32.0
[139] Rsamtools_2.12.0 S4Vectors_0.34.0 GenomeInfoDbData_1.2.8
[142] hms_1.1.1 parallel_4.2.0 grid_4.2.0
[145] ggfun_0.0.6 minqa_1.2.4 tidyr_1.2.0
[148] rmarkdown_2.14 MatrixGenerics_1.8.1 carData_3.0-5
[151] ggpubr_0.4.0 piggyback_0.1.3 lubridate_1.8.0
[154] Biobase_2.56.0 shiny_1.7.1 restfulr_0.0.15
```

bschilder commented 2 years ago

Manually importing the cov file does indeed show that there are 10 genes duplicated across rows:

cov <- data.table::fread("/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//Rtmpp6c79a/file103501c90b350")
sum(duplicated(cov$entrez))
# [1] 10
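
To see which IDs are affected rather than just how many, the duplicated rows can be pulled out directly. This is only a sketch on a toy table: the `entrez` column name comes from the check above, while the `celltype_A` column and all values are made up for illustration and are not the contents of the real covar file.

```r
library(data.table)

# Toy stand-in for the covar file. Only the `entrez` column name is taken
# from this thread; `celltype_A`, `ZSTAT1` values, and IDs are illustrative.
cov <- data.table(
  entrez     = c("100037417", "100037417", "14", "15"),
  celltype_A = c(36, 12, 5, 7),
  ZSTAT1     = c(2.1586, 2.1586, 0.5, -1.2)
)

# Every row whose ID occurs more than once (all copies, not just the extras)
dup_rows <- cov[entrez %in% entrez[duplicated(entrez)]]

# Number of rows per duplicated ID
dup_counts <- cov[, .N, by = entrez][N > 1]

print(dup_rows)
print(dup_counts)
```

Comparing the copies side by side shows whether they carry identical values or conflicting ones, which matters when deciding which row to keep.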

~~This might be a bug introduced in one of the more recent versions of MAGMA. Might need to correct this by first checking and rewriting covar files. It seems the only way to report bugs in MAGMA is to email the authors.~~

The covar file is actually written by MAGMA.Celltyping (in `create_gene_covar_file`), meaning we can easily fix it!

bschilder commented 2 years ago

Confirmed using the reprex above; removing duplicate genes from the covar file ensures everything runs smoothly.
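
A minimal sketch of that kind of deduplication step, run before the covar file is handed to MAGMA, might look like the following. It is not necessarily how the package implements the fix: `dedup_covar` is a hypothetical helper, the toy table is invented, and the tab separator is an assumption about the file layout.

```r
library(data.table)

# Hypothetical helper (not part of MAGMA.Celltyping): keep the first row
# for each gene ID and drop later repeats.
dedup_covar <- function(cov, id_col = "entrez") {
  unique(cov, by = id_col)
}

# Toy covar table; only the `entrez` column name comes from this thread.
cov <- data.table(
  entrez     = c("100037417", "100037417", "14", "15"),
  celltype_A = c(36, 12, 5, 7),
  ZSTAT1     = c(2.1586, 2.1586, 0.5, -1.2)
)

cov_dedup <- dedup_covar(cov)

# Guard: fail early in R instead of letting MAGMA terminate on a duplicate
stopifnot(anyDuplicated(cov_dedup$entrez) == 0L)

# Write the cleaned table to a temporary file (tab-separated by assumption)
covar_path <- tempfile()
fwrite(cov_dedup, covar_path, sep = "\t")
```

Keeping the first occurrence is an arbitrary choice. If the duplicated rows disagree, aggregating them (for example, taking the mean per ID) may be more appropriate.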