neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

WARNING: 228 columns (cell-types) have less than the expected number of quantile bins (10). #125

Closed AMCalejandro closed 1 year ago

AMCalejandro commented 1 year ago

1. Bug description

I am trying to understand why I cannot properly generate quantiles (`numberOfBins = 4`) from the Zeisel data accessible through the EWCE package.

To check that I understand correctly: is it that there is not enough variation in gene expression in the Zeisel dataset to generate quantiles ranging from the higher to the lower % of genes expressed?

I would appreciate some help understanding what is going on in the Zeisel data.

Console output

Standardising CellTypeDataset
Found 3 matrix types across 5 CTD levels.
Processing level: 1
Converting to sparse matrix.
Processing level: 2
Converting to sparse matrix.
Processing level: 3
Converting to sparse matrix.
Processing level: 4
Converting to sparse matrix.
Processing level: 5
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
Converting to sparse matrix.
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
Converting to sparse matrix.
Checking CTD: level 1
WARNING: 4 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 2
WARNING: 3 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 3
WARNING: 6 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 4
WARNING: 26 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 5
WARNING: 205 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 1
WARNING: 1 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 2
WARNING: 1 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 3
WARNING: 6 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 4
WARNING: 30 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 5
WARNING: 228 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.
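
The sparsity explanation given in the warning text can be illustrated with a small base-R sketch. It is not the package's implementation: `count_bins()` is a hypothetical helper, and the two vectors are simulated data. It only shows that when most values in a column are identical (for example, zeros), several quantile cut points coincide and fewer distinct bins than requested are produced.

```r
# Hypothetical helper (not part of MAGMA.Celltyping or EWCE):
# count how many distinct quantile bins a numeric vector can fill.
count_bins <- function(x, n_bins = 4) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  # Tied values make several quantile cut points identical,
  # so keep only the unique break points.
  breaks <- unique(stats::quantile(x, probs = probs))
  if (length(breaks) < 2) {
    return(1L)
  }
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  length(unique(bins))
}

set.seed(1)
dense  <- runif(1000)                  # simulated column with plenty of variation
sparse <- c(rep(0, 950), runif(50))    # simulated column that is 95% zeros

count_bins(dense, n_bins = 4)
count_bins(sparse, n_bins = 4)
```

With these simulated inputs, the dense column fills all four bins, while the sparse column fills fewer, which is the situation the warning describes.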

2. Reproducible example

Code

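# Load the Zeisel 2018 CellTypeDataset (CTD)
ctd <- get_ctd("ctd_Zeisel2018")

# Compute quantile groups with 4 bins; this call produces the warnings above
ctd_quant <- MAGMA.Celltyping::prepare_quantile_groups(
    ctd = ctd,
    standardise = TRUE,
    non121_strategy = "dbs",
    input_species = "mouse",
    output_species = "human",
    numberOfBins = 4
)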

3. Session info

```
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] data.table_1.14.2 MAGMA.Celltyping_2.0.7 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4
 [7] readr_2.1.2 tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2 here_1.0.1

loaded via a namespace (and not attached):
  [1] utf8_1.2.2 R.utils_2.12.0 tidyselect_1.1.2 lme4_1.1-30 RSQLite_2.2.16
  [6] AnnotationDbi_1.59.1 htmlwidgets_1.5.4 grid_4.2.0 BiocParallel_1.31.12 munsell_0.5.0
 [11] codetools_0.2-18 withr_2.5.0 colorspace_2.0-3 Biobase_2.57.1 filelock_1.0.2
 [16] knitr_1.40 rstudioapi_0.14 orthogene_1.3.2 stats4_4.2.0 SingleCellExperiment_1.19.0
 [21] ggsignif_0.6.3 gitcreds_0.1.1 labeling_0.4.2 MatrixGenerics_1.9.1 GenomeInfoDbData_1.2.8
 [26] farver_2.1.1 bit64_4.0.5 rprojroot_2.0.3 vctrs_0.4.1 treeio_1.21.2
 [31] generics_0.1.3 xfun_0.32 BiocFileCache_2.5.0 R6_2.5.1 GenomeInfoDb_1.33.5
 [36] bitops_1.0-7 cachem_1.0.6 gridGraphics_0.5-1 DelayedArray_0.23.1 assertthat_0.2.1
 [41] promises_1.2.0.1 BiocIO_1.7.1 scales_1.2.1 googlesheets4_1.0.1 gtable_0.3.1
 [46] rlang_1.0.5 MungeSumstats_1.5.13 splines_4.2.0 rtracklayer_1.57.0 rstatix_0.7.0
 [51] lazyeval_0.2.2 gargle_1.2.0 broom_1.0.1 BiocManager_1.30.18 yaml_2.3.5
 [56] reshape2_1.4.4 abind_1.4-5 modelr_0.1.9 GenomicFeatures_1.49.6 backports_1.4.1
 [61] httpuv_1.6.5 tools_4.2.0 ggplotify_0.1.0 ellipsis_0.3.2 ggdendro_0.1.23
 [66] BiocGenerics_0.43.1 Rcpp_1.0.9 plyr_1.8.7 progress_1.2.2 zlibbioc_1.43.0
 [71] RCurl_1.98-1.8 prettyunits_1.1.1 ggpubr_0.4.0 S4Vectors_0.35.3 SummarizedExperiment_1.27.2
 [76] haven_2.5.1 fs_1.5.2 magrittr_2.0.3 gh_1.3.0 reprex_2.0.2
 [81] googledrive_2.0.0 matrixStats_0.62.0 hms_1.1.2 patchwork_1.1.2 mime_0.12
 [86] xtable_1.8-4 XML_3.99-0.10 EWCE_1.5.7 readxl_1.4.1 IRanges_2.31.2
 [91] gridExtra_2.3 compiler_4.2.0 biomaRt_2.53.2 crayon_1.5.1 minqa_1.2.4
 [96] R.oo_1.25.0 htmltools_0.5.3 ggfun_0.0.7 later_1.3.0 tzdb_0.3.0
[101] aplot_0.1.6 lubridate_1.8.0 DBI_1.1.3 ExperimentHub_2.5.0 gprofiler2_0.2.1
[106] dbplyr_2.2.1 MASS_7.3-58.1 rappdirs_0.3.3 boot_1.3-28 babelgene_22.3
[111] Matrix_1.4-1 car_3.1-0 piggyback_0.1.3 cli_3.3.0 R.methodsS3_1.8.2
[116] parallel_4.2.0 GenomicRanges_1.49.1 pkgconfig_2.0.3 GenomicAlignments_1.33.1 plotly_4.10.0
[121] xml2_1.3.3 ggtree_3.5.3 XVector_0.37.1 rvest_1.0.3 yulab.utils_0.0.5
[126] VariantAnnotation_1.43.3 digest_0.6.29 Biostrings_2.65.3 cellranger_1.1.0 HGNChelper_0.8.1
[131] tidytree_0.4.0 restfulr_0.0.15 curl_4.3.2 shiny_1.7.2 Rsamtools_2.13.4
[136] rjson_0.2.21 nloptr_2.0.3 lifecycle_1.0.1 nlme_3.1-159 jsonlite_1.8.0
[141] carData_3.0-5 viridisLite_0.4.1 limma_3.53.6 BSgenome_1.65.2 fansi_1.0.3
[146] pillar_1.8.1 lattice_0.20-45 homologene_1.4.68.19.3.27 KEGGREST_1.37.3 fastmap_1.1.0
[151] httr_1.4.4 googleAuthR_2.0.0 interactiveDisplayBase_1.35.0 glue_1.6.2 RNOmni_1.0.1
[156] png_0.1-7 ewceData_1.5.0 BiocVersion_3.16.0 bit_4.0.4 stringi_1.7.8
[161] blob_1.2.3 AnnotationHub_3.5.0 memoise_2.0.1 ape_5.6-2
```

bschilder commented 1 year ago

This is actually the same issue described in #130.

The CTD is already converted to human orthologs, so set `input_species` to `"human"`, or leave it at the default, and the function will automatically infer the correct species.
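
A minimal sketch of the adjusted call, assuming the same arguments as the reproducible example above with only `input_species` changed as suggested. The `requireNamespace()` guard is an addition for illustration so the snippet does nothing when the package is not installed. Whether this removes every warning for this dataset has not been verified here.

```r
# Sketch only: same call as in the reproducible example,
# but with input_species set to "human" as suggested above.
if (requireNamespace("MAGMA.Celltyping", quietly = TRUE)) {
  ctd <- MAGMA.Celltyping::get_ctd("ctd_Zeisel2018")

  ctd_quant <- MAGMA.Celltyping::prepare_quantile_groups(
    ctd = ctd,
    standardise = TRUE,
    non121_strategy = "dbs",
    input_species = "human",   # the CTD already contains human orthologs
    output_species = "human",
    numberOfBins = 4
  )
}
```

Alternatively, per the comment above, `input_species` can be left out entirely so that the species is inferred automatically.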