rnabioco / clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
https://rnabioco.github.io/clustifyr/
MIT License

get_ucsc_reference : Can't find column `gene` in `.data`. #398

Closed: pauldeboissier closed this issue 11 months ago

pauldeboissier commented 11 months ago

Dear colleagues,

I'm currently working on single-cell data from developing skeletal muscle, and I am interested in using clustifyr to annotate my cells according to the muscle cell atlas available in the UCSC database (http://cells.ucsc.edu/?ds=muscle-cell-atlas).

I am trying to use the function get_ucsc_reference(), but it returns an error I don't know how to fix. Could you please help me?

Here is my code:

library(clustifyr)
Sys.setenv(VROOM_CONNECTION_SIZE=5000072)
muscle.ref <- get_ucsc_reference(cb_url = "http://cells.ucsc.edu/?ds=muscle-cell-atlas", cluster_col = "cell_annotation")

It returns this:

Rows: 22058 Columns: 8
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (5): cellId, cell_annotation, detailed_cell_annotation, sampleID, x10x_chemistry
dbl (3): nFeature_RNA, nCount_RNA, percent_mito

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:
• `` -> `...1`
Rows: 21703 Columns: 22059
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr     (1): ...1
dbl (22058): hu_092618_AAACCTGAGGGAACGG, hu_092618_AAACCTGAGGGCTTCC, hu_092618_AAACCTGCAAACTGTC, hu_092618_AAACC...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Error in `tibble::column_to_rownames()`:
! Can't find column `gene` in `.data`.
Backtrace:
 1. clustifyr::get_ucsc_reference(...)
 2. tibble::column_to_rownames(mat, "gene")
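
A minimal sketch of what the messages above suggest is happening, using a toy table rather than the real UCSC matrix: the first column of the expression file has an empty header, which is repaired to `...1` on import, so a later lookup of a column named `gene` fails. The toy data, the object names, and the renaming workaround are illustrative assumptions, not the fix that was applied in clustifyr.

```r
library(tibble)

# Toy stand-in for the imported expression matrix (hypothetical values).
# The gene column has no header in the file, so it ends up named `...1`.
mat <- tibble(
  `...1` = c("SAMD11", "FAM41C"),
  cell_a = c(0, 1),
  cell_b = c(2, 0)
)

# Reproduces the failure: there is no column called `gene`
res <- try(column_to_rownames(mat, "gene"), silent = TRUE)
inherits(res, "try-error")

# Workaround sketch: name the first column explicitly, then convert it
fixed <- mat
names(fixed)[1] <- "gene"
fixed <- column_to_rownames(fixed, "gene")
rownames(fixed)
```

Renaming by position avoids depending on whatever name the import step assigns to the unnamed column.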

This is my sessionInfo():

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clustifyr_1.13.0            plotly_4.10.2               gprofiler2_0.2.2            splatter_1.18.2            
 [5] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.1       
 [9] GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4            BiocGenerics_0.40.0        
[13] MatrixGenerics_1.6.0        matrixStats_1.0.0           heatmap3_1.1.9              singleseqgset_0.1.2.9000   
[17] msigdbr_7.5.1               Matrix_1.5-4.1              pheatmap_1.0.12             tidyr_1.3.0                
[21] tibble_3.2.1                decoupleR_2.5.2             fgsea_1.26.0                DT_0.28                    
[25] enrichR_3.2                 ggplot2_3.4.2               stringr_1.5.0               patchwork_1.1.2            
[29] dplyr_1.1.2                 SeuratObject_4.1.3          Seurat_4.3.0.1             

loaded via a namespace (and not attached):
  [1] utf8_1.2.3             spatstat.explore_3.2-1 reticulate_1.30        tidyselect_1.2.0       htmlwidgets_1.6.2     
  [6] grid_4.1.3             BiocParallel_1.34.2    Rtsne_0.16             munsell_0.5.0          codetools_0.2-19      
 [11] ica_1.0-3              future_1.33.0          miniUI_0.1.1.1         withr_2.5.0            spatstat.random_3.1-5 
 [16] colorspace_2.1-0       progressr_0.13.0       knitr_1.43             rstudioapi_0.14        ROCR_1.0-11           
 [21] tensor_1.5             listenv_0.9.0          labeling_0.4.2         GenomeInfoDbData_1.2.7 polyclip_1.10-4       
 [26] bit64_4.0.5            farver_2.1.1           rprojroot_2.0.3        parallelly_1.36.0      vctrs_0.6.3           
 [31] generics_0.1.3         xfun_0.39              fastcluster_1.2.3      R6_2.5.1               locfit_1.5-9.8        
 [36] bitops_1.0-7           spatstat.utils_3.0-3   cachem_1.0.8           DelayedArray_0.20.0    vroom_1.6.3           
 [41] promises_1.2.0.1       scales_1.2.1           gtable_0.3.3           globals_0.16.2         processx_3.8.2        
 [46] goftest_1.2-3          rlang_1.1.1            splines_4.1.3          lazyeval_0.2.2         spatstat.geom_3.2-2   
 [51] checkmate_2.2.0        BiocManager_1.30.21    yaml_2.3.7             reshape2_1.4.4         abind_1.4-5           
 [56] crosstalk_1.2.0        backports_1.4.1        httpuv_1.6.11          tools_4.1.3            ellipsis_0.3.2        
 [61] jquerylib_0.1.4        RColorBrewer_1.1-3     ggridges_0.5.4         Rcpp_1.0.11            plyr_1.8.8            
 [66] zlibbioc_1.40.0        purrr_1.0.1            RCurl_1.98-1.12        ps_1.7.5               prettyunits_1.1.1     
 [71] deldir_1.0-9           pbapply_1.7-2          cowplot_1.1.1          zoo_1.8-12             ggrepel_0.9.3         
 [76] cluster_2.1.4          magrittr_2.0.3         data.table_1.14.8      scattermore_1.2        lmtest_0.9-40         
 [81] RANN_2.6.1             fitdistrplus_1.1-11    hms_1.1.3              mime_0.12              evaluate_0.21         
 [86] xtable_1.8-4           gridExtra_2.3          compiler_4.1.3         KernSmooth_2.23-21     crayon_1.5.2          
 [91] htmltools_0.5.5        entropy_1.3.1          tzdb_0.4.0             later_1.3.1            WriteXLS_6.4.0        
 [96] MASS_7.3-60            babelgene_22.9         readr_2.1.4            cli_3.6.1              parallel_4.1.3        
[101] igraph_1.5.0           pkgconfig_2.0.3        sp_2.0-0               spatstat.sparse_3.0-2  bslib_0.5.0           
[106] XVector_0.34.0         callr_3.7.3            digest_0.6.32          sctransform_0.3.5      RcppAnnoy_0.0.21      
[111] spatstat.data_3.0-1    rmarkdown_2.23         leiden_0.4.3           fastmatch_1.1-3        uwot_0.1.16           
[116] curl_5.0.1             shiny_1.7.4.1          rjson_0.2.21           lifecycle_1.0.3        nlme_3.1-162          
[121] jsonlite_1.8.7         desc_1.4.2             viridisLite_0.4.2      fansi_1.0.4            pillar_1.9.0          
[126] lattice_0.21-8         fastmap_1.1.1          httr_1.4.6             pkgbuild_1.4.2         survival_3.5-5        
[131] glue_1.6.2             remotes_2.4.2          png_0.1-8              bit_4.0.5              stringi_1.7.12        
[136] sass_0.4.6             irlba_2.3.5.1          future.apply_1.11.0 

Thank you

Paul

kriemo commented 11 months ago

Thanks for the bug report and reproducible example. I'll take a look.

kriemo commented 11 months ago

This should be fixed now. Note that when working with UCSC data you'll often need to add if_log = FALSE as many of these datasets are not log transformed. Also, the changes may require you to install the R.utils package (the function will let you know if this is needed).
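
To illustrate why the log flag matters, here is a sketch in base R with toy numbers. It assumes that treating data as log transformed means un-logging with expm1() before averaging and re-logging with log1p() afterwards; that is an assumption about the general approach, not a copy of clustifyr's internals.

```r
# Toy expression values for three cells in one cluster (hypothetical,
# not log transformed)
raw <- c(0, 1, 9)

# Plain per-cluster mean, appropriate when the data are not log transformed
mean_raw <- mean(raw)

# The same values wrongly treated as log1p-transformed: they are
# exponentiated before averaging and the mean is logged again
mean_as_log <- log1p(mean(expm1(raw)))

c(mean_raw = mean_raw, mean_as_log = mean_as_log)
```

The two summaries differ substantially, which is why declaring the wrong scale can distort the averaged reference.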

You can pull the changes using BiocManager::install("clustifyr") if you are using the devel version of Bioconductor (3.18). Otherwise, you'll need to install from GitHub with BiocManager::install("rnabioco/clustifyr").

library(clustifyr)
muscle.ref <- get_ucsc_reference(cb_url = "http://cells.ucsc.edu/?ds=muscle-cell-atlas",
                                 cluster_col = "cell_annotation", 
                                 if_log = FALSE)
muscle.ref[1:10, 1:10]
#>                Adipocytes B/T/NK cells Endothelial 1 Endothelial 2
#> RP11-34P13.7  0.000000000 0.0022718677   0.000000000   0.000000000
#> FO538757.2    0.161376145 0.1205590627   0.124490648   0.118985682
#> AP006222.2    0.059927654 0.0342805549   0.052321337   0.034302594
#> RP4-669L17.10 0.000000000 0.0030280114   0.005698021   0.000000000
#> RP11-206L10.9 0.027536158 0.0342805549   0.028170877   0.007194276
#> FAM87B        0.005063302 0.0007578629   0.000000000   0.000000000
#> LINC00115     0.004221197 0.0105581675   0.006833740   0.014337163
#> FAM41C        0.020101179 0.0239711852   0.012493064   0.004801930
#> RP11-54O7.1   0.006745388 0.0007578629   0.011363759   0.001202646
#> SAMD11        0.008424650 0.0007578629   0.001142205   0.000000000
#>               Endothelial 3 Erythroblasts Fibroblasts 1 Fibroblasts 2
#> RP11-34P13.7   0.0020986367    0.00000000  0.0007285975   0.002662105
#> FO538757.2     0.0893720338    0.00000000  0.1080173330   0.146066738
#> AP006222.2     0.0451822228    0.06252036  0.0945147403   0.117359827
#> RP4-669L17.10  0.0036697289    0.00000000  0.0054515854   0.004180131
#> RP11-206L10.9  0.0223327900    0.00000000  0.0162663988   0.033698434
#> FAM87B         0.0005250722    0.00000000  0.0072621960   0.003800841
#> LINC00115      0.0104494159    0.00000000  0.0050890695   0.008342862
#> FAM41C         0.0052383567    0.00000000  0.0173414750   0.009475151
#> RP11-54O7.1    0.0026226084    0.00000000  0.0032745163   0.001902226
#> SAMD11         0.0005250722    0.00000000  0.0333304460   0.032961786
#>               Fibroblasts 3 Inflammatory macrophages and monocytes
#> RP11-34P13.7    0.002290296                            0.007698267
#> FO538757.2      0.115200167                            0.139694740
#> AP006222.2      0.074271739                            0.040883816
#> RP4-669L17.10   0.007424363                            0.010761057
#> RP11-206L10.9   0.032151670                            0.031941879
#> FAM87B          0.003719072                            0.001544402
#> LINC00115       0.007708818                            0.013814494
#> FAM41C          0.007993191                            0.037912046
#> RP11-54O7.1     0.004290011                            0.000000000
#> SAMD11          0.023510501                            0.006163348

Created on 2023-07-13 with reprex v2.0.2

pauldeboissier commented 11 months ago

Dear Kent,

Thank you for your help, it's now working perfectly. 🥇