neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

ERROR - reading gene covariate file: duplicate covariate variable name #142

Closed by bschilder 1 year ago

bschilder commented 1 year ago

1. Bug description

It seems some of my CTDs still have duplicate celltype names. Thought my standardisation pipeline accounted for this, but apparently not:

Console output

[Screenshot 2023-04-14 at 12:25:59]

Expected behaviour

Maybe add some handling of this scenario (make celltypes unique) within MAGMA.Celltyping.
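
As a rough sketch of what "make celltypes unique" could look like, the snippet below applies base R's `make.unique()` to the column names of every matrix in each CTD level. The helper name `make_celltypes_unique` and the toy CTD layout (a list of levels, each holding `mean_exp` and `specificity` matrices with celltypes as columns) are assumptions for illustration, not the package's actual handling.

```r
# Hypothetical helper: de-duplicate celltype (column) names in a CTD-like list.
make_celltypes_unique <- function(ctd) {
  lapply(ctd, function(lvl) {
    lapply(lvl, function(x) {
      # Only touch matrix-like slots that carry column names
      if (!is.null(dim(x)) && !is.null(colnames(x))) {
        colnames(x) <- make.unique(colnames(x), sep = "_")
      }
      x
    })
  })
}

# Toy CTD with one level and a duplicated celltype name (assumed layout)
celltypes <- c("Neuron", "Neuron", "Astrocyte")
genes <- c("GeneA", "GeneB")
toy_ctd <- list(
  list(
    mean_exp = matrix(1:6, nrow = 2, dimnames = list(genes, celltypes)),
    specificity = matrix(seq(0.1, 0.6, by = 0.1), nrow = 2,
                         dimnames = list(genes, celltypes))
  )
)

fixed_ctd <- make_celltypes_unique(toy_ctd)
print(colnames(fixed_ctd[[1]]$mean_exp))
```

Because every matrix in a level is renamed the same way, the slots stay consistent with each other. Any non-matrix slot that also stores celltype labels would need the same treatment.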

2. Reproducible example

Seems to be occurring in:

Code

magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = c("ieu-a-298"))
ctd <- MAGMA.Celltyping::get_ctd("ctd_Jiang2021")
res <- MAGMA.Celltyping::celltype_associations_pipeline(
    ctd = ctd,
    ctd_levels = 1,
    ctd_name = "ctd_Jiang2021", 
    magma_dirs = magma_dirs)

3. Session info

```
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
  [1] utf8_1.2.3 devoptera_0.99.0 R.utils_2.12.2
  [4] RUnit_0.4.32 tidyselect_1.2.0 lme4_1.1-32
  [7] RSQLite_2.3.1 AnnotationDbi_1.60.2 htmlwidgets_1.6.2
 [10] grid_4.2.1 BiocParallel_1.32.6 devtools_2.4.5
 [13] munsell_0.5.0 codetools_0.2-19 miniUI_0.1.1.1
 [16] colorspace_2.1-0 Biobase_2.58.0 filelock_1.0.2
 [19] knitr_1.42 rstudioapi_0.14 orthogene_1.5.3
 [22] stats4_4.2.1 SingleCellExperiment_1.20.1 ggsignif_0.6.4
 [25] gitcreds_0.1.2 MatrixGenerics_1.10.0 httr2_0.2.2
 [28] GenomeInfoDbData_1.2.9 bit64_4.0.5 rprojroot_2.0.3
 [31] vctrs_0.6.1 treeio_1.23.1 generics_0.1.3
 [34] xfun_0.38 timechange_0.2.0 BiocFileCache_2.6.1
 [37] R6_2.5.1 GenomeInfoDb_1.34.9 bitops_1.0-7
 [40] cachem_1.0.7 gridGraphics_0.5-1 DelayedArray_0.24.0
 [43] assertthat_0.2.1 promises_1.2.0.1 BiocIO_1.8.0
 [46] scales_1.2.1 gtable_0.3.3 biocViews_1.66.3
 [49] processx_3.8.0 rlang_1.1.0 MungeSumstats_1.6.0
 [52] MAGMA.Celltyping_2.0.8 splines_4.2.1 rtracklayer_1.58.0
 [55] rstatix_0.7.2 lazyeval_0.2.2 gargle_1.3.0
 [58] broom_1.0.4 BiocManager_1.30.20 yaml_2.3.7
 [61] reshape2_1.4.4 abind_1.4-5 GenomicFeatures_1.50.4
 [64] backports_1.4.1 httpuv_1.6.9 RBGL_1.74.0
 [67] usethis_2.1.6 tools_4.2.1 ggplotify_0.1.0
 [70] ggplot2_3.4.2 ellipsis_0.3.2 ggdendro_0.1.23
 [73] BiocGenerics_0.44.0 sessioninfo_1.2.2 Rcpp_1.0.10
 [76] plyr_1.8.8 progress_1.2.2 zlibbioc_1.44.0
 [79] purrr_1.0.1 RCurl_1.98-1.12 ps_1.7.4
 [82] prettyunits_1.1.1 ggpubr_0.6.0 urlchecker_1.0.1
 [85] S4Vectors_0.36.2 SummarizedExperiment_1.28.0 grr_0.9.5
 [88] here_1.0.1 fs_1.6.1 magrittr_2.0.3
 [91] data.table_1.14.8 gh_1.4.0 matrixStats_0.63.0
 [94] pkgload_1.3.2 hms_1.1.3 patchwork_1.1.2
 [97] mime_0.12 xtable_1.8-4 XML_3.99-0.14
[100] EWCE_1.7.4 IRanges_2.32.0 compiler_4.2.1
[103] biomaRt_2.54.1 tibble_3.2.1 crayon_1.5.2
[106] minqa_1.2.5 R.oo_1.25.0 htmltools_0.5.5
[109] ggfun_0.0.9 later_1.3.0 tidyr_1.3.0
[112] aplot_0.1.10 lubridate_1.9.2 DBI_1.1.3
[115] ExperimentHub_2.6.0 gprofiler2_0.2.1 dbplyr_2.3.2
[118] MASS_7.3-58.3 rappdirs_0.3.3 boot_1.3-28.1
[121] babelgene_22.9 Matrix_1.5-4 car_3.1-2
[124] piggyback_0.1.4 cli_3.6.1 R.methodsS3_1.8.2
[127] parallel_4.2.1 GenomicRanges_1.50.2 pkgconfig_2.0.3
[130] GenomicAlignments_1.34.1 plotly_4.10.1 xml2_1.3.3
[133] ggtree_3.6.2 stringdist_0.9.10 XVector_0.38.0
[136] BiocCheck_1.34.3 yulab.utils_0.0.6 callr_3.7.3
[139] stringr_1.5.0 VariantAnnotation_1.44.1 digest_0.6.31
[142] graph_1.76.0 Biostrings_2.66.0 HGNChelper_0.8.1
[145] tidytree_0.4.2 restfulr_0.0.15 curl_5.0.0
[148] shiny_1.7.4 Rsamtools_2.14.0 rjson_0.2.21
[151] nloptr_2.0.3 lifecycle_1.0.3 nlme_3.1-162
[154] jsonlite_1.8.4 carData_3.0-5 viridisLite_0.4.1
[157] limma_3.54.2 BSgenome_1.66.3 fansi_1.0.4
[160] pillar_1.9.0 lattice_0.21-8 homologene_1.4.68.19.3.27
[163] pkgbuild_1.4.0 KEGGREST_1.38.0 fastmap_1.1.1
[166] httr_1.4.5 googleAuthR_2.0.0 remotes_2.4.2
[169] interactiveDisplayBase_1.36.0 glue_1.6.2 RNOmni_1.0.1
[172] png_0.1-8 ewceData_1.7.1 BiocVersion_3.16.0
[175] bit_4.0.5 profvis_0.3.7 stringi_1.7.12
[178] blob_1.2.4 AnnotationHub_3.6.0 memoise_2.0.1
[181] dplyr_1.1.1 ape_5.7-1
```

bschilder commented 1 year ago

Actually, this seems to be occurring even in CTDs that worked fine before, e.g. TabulaMuris_zebrafishGenes.

bschilder commented 1 year ago

After manually inspecting the CTD, the duplicate-name explanation doesn't actually seem to hold: all celltypes are indeed unique.

For Jiang2021:

CTD_std <- EWCE::standardise_ctd(ctd = CTD,
                                 input_species = species,
                                 output_species = "human",
                                 sctSpecies_origin = species_dict[[x]],
                                 dataset = x,
                                 force_standardise = TRUE,
                                 keep_plots = FALSE)

[Screenshot 2023-04-14 at 12:38:49]
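
The manual inspection described above could also be scripted. The sketch below reports, per CTD level, any celltype name that appears more than once. The function name `find_duplicate_celltypes`, the choice of the `specificity` slot, and the toy data are assumptions for illustration.

```r
# Hypothetical helper: list duplicated celltype names in each CTD level.
find_duplicate_celltypes <- function(ctd, slot = "specificity") {
  lapply(ctd, function(lvl) {
    nms <- colnames(lvl[[slot]])
    unique(nms[duplicated(nms)])
  })
}

# Toy two-level CTD: level1 is clean, level2 repeats "Neuron"
toy_ctd <- list(
  level1 = list(specificity = matrix(
    0, nrow = 1, ncol = 3,
    dimnames = list("GeneA", c("Neuron", "Astrocyte", "Microglia")))),
  level2 = list(specificity = matrix(
    0, nrow = 1, ncol = 3,
    dimnames = list("GeneA", c("Neuron", "Neuron", "Astrocyte"))))
)

dups <- find_duplicate_celltypes(toy_ctd)
print(dups)
```

An empty character vector for a level means its celltype names are unique in the chosen slot.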

So either MAGMA.Celltyping is picking up some old files in which the celltype names really were duplicated, or there is a bug in the pipeline. Another possibility is that the CTD gets corrupted when it passes through the extra standardisation step in prepare_quantile_groups (linked below). In theory, though, a CTD that has already been standardised should be detected and passed straight back unchanged.

https://github.com/neurogenomics/MAGMA_Celltyping/blob/349eeabcc29735fcc4b2fc4cedb4f22580f8ddba/R/prepare_quantile_groups.r#L32
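
One way a second standardisation pass could, in principle, introduce duplicates is if name cleaning maps two distinct labels onto the same string. The sketch below shows that general effect using base R's `make.names()` as a stand-in sanitiser. It is not the function EWCE or MAGMA.Celltyping uses, the celltype labels are made up, and the thread does not confirm this as the cause.

```r
# Labels that are unique before any cleaning (made-up examples)
raw_names <- c("T cell", "T-cell", "B cell")
stopifnot(anyDuplicated(raw_names) == 0)

# Stand-in sanitiser: make.names() replaces invalid characters with "."
clean_names <- make.names(raw_names)
print(clean_names)

# "T cell" and "T-cell" now collide, so the cleaned names are no longer unique
has_collision <- anyDuplicated(clean_names) > 0
print(has_collision)
```

Checking for uniqueness after, and not only before, any renaming step would catch this class of problem.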

bschilder commented 1 year ago

Seems to be working now after reprocessing CTDs
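
To rule out stale files in future, the header of a previously written gene covariate file could be checked directly before re-running MAGMA. This is a minimal sketch, assuming the covariate file is a whitespace-delimited table whose first line holds the column names. The helper name, the `GENE` column label, and the temporary file are hypothetical.

```r
# Hypothetical helper: return covariate names that occur more than once
# in the header line of a whitespace-delimited file.
duplicated_covariates <- function(path) {
  header <- readLines(path, n = 1)
  cols <- strsplit(trimws(header), "[[:space:]]+")[[1]]
  unique(cols[duplicated(cols)])
}

# Toy file standing in for a stale covariate file (made-up contents)
tmp <- tempfile(fileext = ".txt")
writeLines(c("GENE Neuron Neuron Astrocyte",
             "1234 0.1 0.2 0.3"), tmp)

dup_cols <- duplicated_covariates(tmp)
print(dup_cols)
```

A non-empty result would suggest the file predates the reprocessed CTDs and should be regenerated.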