seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

'value' length must equal sample number in AssayData #108

Closed ning-y closed 3 years ago

ning-y commented 3 years ago

To replicate:

> getGEO("GSE123763")

Full error message:

Error in `sampleNames<-`(`*tmp*`, value = c("1", "2", "3", "4", "5", "6",  : 
  'value' length (24) must equal sample number in AssayData (0)
In addition: Warning message:
Missing column names filled in: 'X1' [1] 

Session info:

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_SG.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_SG.UTF-8        LC_COLLATE=en_SG.UTF-8    
 [5] LC_MONETARY=en_SG.UTF-8    LC_MESSAGES=en_SG.UTF-8   
 [7] LC_PAPER=en_SG.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_SG.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.58.0     Biobase_2.50.0      BiocGenerics_0.36.0

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13  xml2_1.3.2       magrittr_2.0.1   hms_1.0.0       
 [5] tidyselect_1.1.0 R6_2.5.0         rlang_0.4.10     fansi_0.4.2     
 [9] dplyr_1.0.4      tools_4.0.4      utf8_1.1.4       cli_2.3.1       
[13] DBI_1.1.1        ellipsis_0.3.1   assertthat_0.2.1 tibble_3.1.0    
[17] lifecycle_1.0.0  crayon_1.4.1     purrr_0.3.4      readr_1.4.0     
[21] tidyr_1.1.2      ps_1.5.0         vctrs_0.3.6      curl_4.3        
[25] glue_1.4.2       limma_3.46.0     compiler_4.0.4   pillar_1.5.0    
[29] generics_0.1.0   pkgconfig_2.0.3 

seandavi commented 3 years ago

Thanks, @ning-y, for the report. I'll fix the error, but it won't help you much. The processed data were not submitted to GEO, so you'll need to grab the supplemental files and then hand-import them.
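
A minimal sketch of the first half of that workaround, assuming the series has supplementary files attached. `getGEOSuppFiles()` is GEOquery's downloader for such files; the wrapper name `fetch_supp_files` and the choice of `tempdir()` as destination are illustrative, not something from the report. How each file should be imported depends on its format, so that step is only hinted at in a comment.

```r
# Hypothetical wrapper: download a series' supplementary files and
# return their local paths. Needs the GEOquery package and network access.
fetch_supp_files <- function(gse, dest = tempdir()) {
  # getGEOSuppFiles() returns a data frame whose row names are the
  # local paths of the downloaded files (NULL when none are found).
  supp <- GEOquery::getGEOSuppFiles(gse, baseDir = dest)
  if (is.null(supp)) character(0) else rownames(supp)
}

# Usage (not run here, since it downloads data):
# paths <- fetch_supp_files("GSE123763")
# Inspect `paths`, then import each file with a reader suited to its
# format, e.g. read.delim() for tab-delimited text.
```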

ning-y commented 3 years ago

@seandavi Thanks for the quick reply! The confirmation is good enough for me.

ning-y commented 3 years ago

For others coming across this issue, it might be sufficient to use reutils if you do not need the full range of information that comes with GEOquery. For example, I was just trying to get a vector of GSMs for a GSE. That can be accomplished with:

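library(reutils)   # esearch(), uid(), esummary(), content()
library(magrittr)  # %>%
library(XML)       # getNodeSet(), xmlValue()
library(dplyr)     # first() (assumed source; any "first element" helper works)
esearch("GSE35126", db="gds") %>%
  uid() %>% esummary(db="gds") %>% content("xml") %>%
  getNodeSet("//DocumentSummary[./Accession='GSE35126']") %>% first() %>%
  getNodeSet(".//Sample/Accession") %>% xmlValue()  # ".//" stays inside the matched summary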

chiwwong commented 3 years ago

I am interested in getting the title column and some associated characteristics columns. I tried getGEO("GSE139204", GSEMatrix=TRUE, destdir = paste0(current_dir, "/work")) for GSE139204, but it does not work today.

However, I can see the series_matrix file is available online. Do you have suggestions on how to parse the downloaded series_matrix file? Please advise what to do. Thanks.
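
One way to read the sample annotations straight from a downloaded series_matrix file is sketched below in base R. It assumes the usual layout of these files, in which every sample-level field is a tab-separated line beginning with `!Sample_` and values are wrapped in double quotes. The function name `read_sample_header` is hypothetical.

```r
# Hypothetical helper: return one row per sample and one column per
# "!Sample_" header line of a GEO series_matrix file (plain or gzipped).
read_sample_header <- function(path) {
  con <- gzfile(path)          # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- grep("^!Sample_", readLines(con), value = TRUE)

  fields <- strsplit(lines, "\t", fixed = TRUE)
  keys <- sub("^!Sample_", "", vapply(fields, `[`, character(1), 1))
  # Drop the key and strip the surrounding double quotes from each value
  values <- lapply(fields, function(f) gsub('^"|"$', "", f[-1]))

  # Keys such as characteristics_ch1 can repeat; make.unique() keeps them apart
  names(values) <- make.unique(keys)
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Usage (the file name is an assumption about what was downloaded):
# meta <- read_sample_header("GSE139204_series_matrix.txt.gz")
# meta[, grep("^(title|characteristics)", names(meta))]
```

`getGEO()` also takes a `filename` argument for parsing a local file, though it may well fail in the same way as the download path does here.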

ning-y commented 3 years ago

Hi @chiwwong, I do not know off the top of my head what information reutils::esearch returns, but it may be worth looking through its XML to see if the information you want is available there.

library(reutils)
library(magrittr)  # for the pipes (%>%)
esearch("GSE139204", db="gds") %>%
  uid() %>% esummary(db="gds") %>% content("xml")

If it is available, you should be able to write an XPath query to extract it. Note that reutils is a separate R package from GEOquery. GEOquery is great for some things, but for metadata reutils is usually sufficient, in my opinion.
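
As a rough illustration of such a query, the sketch below runs against a small synthetic XML document rather than a live esummary response. The element names (`DocumentSummary`, `Samples`, `Sample`, `Accession`, `Title`) and the GSM accessions are assumptions made up for the example, so check them against the real XML before relying on them.

```r
library(XML)

# Synthetic stand-in for an esummary response; structure and values
# are assumed for illustration only.
xml_text <- '<eSummaryResult>
  <DocumentSummary>
    <Accession>GSE139204</Accession>
    <Samples>
      <Sample><Accession>GSM0000001</Accession><Title>control</Title></Sample>
      <Sample><Accession>GSM0000002</Accession><Title>treated</Title></Sample>
    </Samples>
  </DocumentSummary>
  <DocumentSummary>
    <Accession>GSM0000001</Accession>
    <Samples>
      <Sample><Accession>GSM0000001</Accession><Title>control</Title></Sample>
    </Samples>
  </DocumentSummary>
</eSummaryResult>'
doc <- xmlParse(xml_text, asText = TRUE)

# Restrict the query to the series' own summary, then pull both fields
base <- "//DocumentSummary[Accession='GSE139204']//Sample"
samples <- data.frame(
  accession = xpathSApply(doc, paste0(base, "/Accession"), xmlValue),
  title     = xpathSApply(doc, paste0(base, "/Title"), xmlValue),
  stringsAsFactors = FALSE
)
```

Filtering on the series accession first matters because a search can return summaries for more than one record, each with its own `Sample` entries.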

seandavi commented 3 years ago

This is fixed in b81fe0aaa63cd0667858b243d85bed2577302da3.