rnabioco / clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
https://rnabioco.github.io/clustifyr/
MIT License

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed #401

Open pauldeboissier opened 8 months ago

pauldeboissier commented 8 months ago

Dear colleagues,

I'm facing a new error that I don't know how to overcome. I'm trying to load this dataset, https://cells.ucsc.edu/?ds=ext-mouse-atlas, in R using get_ucsc_reference(), but it returns this error:

 [100%] Downloaded 47242005 bytes...
 [100%] Downloaded 3246770440 bytes...
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed

Could you please help me solve this issue? This is the code I'm using:

library(clustifyr)
mouse.embryo.ref <- get_ucsc_reference(cb_url = "https://cells.ucsc.edu/?ds=ext-mouse-atlas",
                                       cluster_col = "celltype_extended_atlas",
                                       if_log = FALSE)

And this is my sessionInfo() :

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Mm.eg.db_3.14.0         AnnotationDbi_1.56.2        biomaRt_2.57.1              ggrepel_0.9.3               TCC_1.34.0                  ROC_1.70.0                  baySeq_2.28.0              
 [8] abind_1.4-5                 edgeR_3.36.0                limma_3.50.3                DESeq2_1.34.0               enrichR_3.2                 plotly_4.10.2               VennDiagram_1.7.3          
[15] futile.logger_1.4.3         gprofiler2_0.2.2            viper_1.28.0                dorothea_1.7.3              RColorBrewer_1.1-3          slingshot_2.2.1             TrajectoryUtils_1.2.0      
[22] princurve_2.1.6             heatmap3_1.1.9              singleseqgset_0.1.2.9000    Matrix_1.5-4.1              msigdbr_7.5.1               splatter_1.18.2             SingleCellExperiment_1.16.0
[29] SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4            BiocGenerics_0.40.0        
[36] MatrixGenerics_1.6.0        matrixStats_1.0.0           pheatmap_1.0.12             tidyr_1.3.0                 tibble_3.2.1                decoupleR_2.5.2             ggplot2_3.4.2              
[43] stringr_1.5.0               DT_0.28                     clustifyr_1.13.1            patchwork_1.1.2             SeuratObject_4.1.3          Seurat_4.3.0.1              dplyr_1.1.2                

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3         scattermore_1.2        R.methodsS3_1.8.2      bit64_4.0.5            knitr_1.43             R.utils_2.12.2         irlba_2.3.5.1          DelayedArray_0.20.0    data.table_1.14.8     
 [10] KEGGREST_1.34.0        RCurl_1.98-1.12        generics_0.1.3         cowplot_1.1.1          lambda.r_1.2.4         RSQLite_2.3.1          RANN_2.6.1             proxy_0.4-27           future_1.33.0         
 [19] bit_4.0.5              spatstat.data_3.0-1    xml2_1.3.5             httpuv_1.6.11          xfun_0.39              hms_1.1.3              babelgene_22.9         evaluate_0.21          promises_1.2.0.1      
 [28] fansi_1.0.4            progress_1.2.2         dbplyr_2.3.3           igraph_1.5.0           DBI_1.1.3              geneplotter_1.72.0     htmlwidgets_1.6.2      spatstat.geom_3.2-2    purrr_1.0.1           
 [37] ellipsis_0.3.2         crosstalk_1.2.0        backports_1.4.1        annotate_1.72.0        deldir_1.0-9           vctrs_0.6.3            ROCR_1.0-11            entropy_1.3.1          cachem_1.0.8          
 [46] withr_2.5.0            progressr_0.13.0       checkmate_2.2.0        sctransform_0.3.5      prettyunits_1.1.1      goftest_1.2-3          cluster_2.1.4          segmented_1.6-4        lazyeval_0.2.2        
 [55] crayon_1.5.2           genefilter_1.76.0      spatstat.explore_3.2-1 labeling_0.4.2         pkgconfig_2.0.3        nlme_3.1-162           rlang_1.1.1            globals_0.16.2         lifecycle_1.0.3       
 [64] miniUI_0.1.1.1         filelock_1.0.2         BiocFileCache_2.2.1    polyclip_1.10-4        lmtest_0.9-40          zoo_1.8-12             ggridges_0.5.4         png_0.1-8              viridisLite_0.4.2     
 [73] rjson_0.2.21           bitops_1.0-7           R.oo_1.25.0            KernSmooth_2.23-21     Biostrings_2.62.0      blob_1.2.4             parallelly_1.36.0      spatstat.random_3.1-5  scales_1.2.1          
 [82] memoise_2.0.1          magrittr_2.0.3         plyr_1.8.8             ica_1.0-3              zlibbioc_1.40.0        compiler_4.1.3         fitdistrplus_1.1-11    cli_3.6.1              XVector_0.34.0        
 [91] listenv_0.9.0          pbapply_1.7-2          formatR_1.14           MASS_7.3-60            tidyselect_1.2.0       stringi_1.7.12         yaml_2.3.7             locfit_1.5-9.8         bcellViper_1.30.0     
[100] fastmatch_1.1-3        tools_4.1.3            future.apply_1.11.0    rstudioapi_0.14        gridExtra_2.3          farver_2.1.1           Rtsne_0.16             digest_0.6.32          shiny_1.7.4.1         
[109] Rcpp_1.0.11            later_1.3.1            RcppAnnoy_0.0.21       WriteXLS_6.4.0         httr_1.4.7             kernlab_0.9-32         colorspace_2.1-0       XML_3.99-0.14          tensor_1.5            
[118] reticulate_1.30        splines_4.1.3          uwot_0.1.16            spatstat.utils_3.0-3   sp_2.0-0               xtable_1.8-4           jsonlite_1.8.7         futile.options_1.0.1   R6_2.5.1              
[127] pillar_1.9.0           htmltools_0.5.5        mime_0.12              glue_1.6.2             fastmap_1.1.1          BiocParallel_1.34.2    class_7.3-22           codetools_0.2-19       fgsea_1.26.0          
[136] utf8_1.2.3             lattice_0.21-8         spatstat.sparse_3.0-2  mixtools_2.0.0         curl_5.0.2             leiden_0.4.3           survival_3.5-5         rmarkdown_2.23         munsell_0.5.0         
[145] e1071_1.7-13           fastcluster_1.2.3      GenomeInfoDbData_1.2.7 reshape2_1.4.4         gtable_0.3.3 

Thank you

Paul

kriemo commented 8 months ago

It looks like they provide the matrix in a tar.gz format, which our code does not support. I would recommend downloading their scanpy or anndata object and generating the per-cluster averages with scanpy, or converting it to an R object for use with Seurat or SingleCellExperiment.
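
For anyone taking the Python route, here is a minimal sketch of the averaging step, assuming the expression values are available as a cells x genes pandas DataFrame. The helper name `average_by_cluster` and the toy data are hypothetical and not part of clustifyr or scanpy; only standard pandas calls are used.

```python
import pandas as pd


def average_by_cluster(expr: pd.DataFrame, clusters: pd.Series) -> pd.DataFrame:
    """Return a genes x clusters matrix of mean expression.

    expr     : cells x genes expression values (rows indexed by cell ID)
    clusters : one cluster label per cell, indexed by the same cell IDs
    (Hypothetical helper for illustration only.)
    """
    # Align labels to the expression rows so a different cell order
    # cannot silently assign cells to the wrong cluster.
    labels = clusters.reindex(expr.index)
    if labels.isna().any():
        raise ValueError("some cells have no cluster label")
    # Mean of each gene within each cluster, then transpose so that
    # genes are rows and clusters are columns.
    return expr.groupby(labels, observed=True).mean().T


# Toy data standing in for a real atlas (values are made up).
expr = pd.DataFrame(
    {"GeneA": [1.0, 3.0, 10.0], "GeneB": [0.0, 2.0, 4.0]},
    index=["cell1", "cell2", "cell3"],
)
clusters = pd.Series(["T", "T", "B"], index=["cell1", "cell2", "cell3"])

ref = average_by_cluster(expr, clusters)
print(ref)
```

With an AnnData object loaded as `adata`, the inputs would presumably be `adata.to_df()` and `adata.obs["celltype_extended_atlas"]`, assuming the downloaded object carries that metadata column (the name is taken from the call in the original post). The resulting genes x clusters table can be written out with `ref.to_csv(...)` and read into R as a reference matrix. Whether to average raw, normalized, or log-transformed values is a separate choice that this sketch does not make for you.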