seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

GEOquery parsing error on GSEs--off-by-one line number. #75

Closed seandavi closed 5 years ago

seandavi commented 5 years ago

See: https://support.bioconductor.org/p/115521/

Hello everyone,

I ran the example from the GEOquery documentation, but the result is wrong: I can't get the correct sampleNames.

I have tried several different datasets and they all show the same error. A friend who uses older versions gets the same result as the documentation, so I think there may be a problem in my R environment.

Is this a problem in GEOquery, or is it my fault?

my code:

library(GEOquery)
gsm <- getGEO("GSE2553", GSEMatrix=TRUE)
show(gsm)

output:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12599 features, 181 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: 0.2701103 0.3925373 ... 1.4989663 (181 total)   #error  The correct output is  GSM48681 GSM48682 ...

  varLabels: title geo_accession ... data_row_count (30 total)
  varMetadata: labelDescription
featureData
  featureNames: 2 3 ... NA.10693 (12599 total) 
  fvarLabels: ID PenAt ... Chimeric_Cluster_IDs (13 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL1977

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GEOquery_2.50.0     Biobase_2.42.0      BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0         tidyr_0.8.2        crayon_1.3.4       dplyr_0.7.8        assertthat_0.2.0   R6_2.3.0
 [7] magrittr_1.5       pillar_1.3.0       rlang_0.3.0.1      curl_3.2           bindrcpp_0.2.2     limma_3.38.2
[13] xml2_1.2.0         tools_3.5.1        readr_1.2.1        glue_1.3.0         purrr_0.2.5        hms_0.4.2
[19] compiler_3.5.1     pkgconfig_2.0.2    BiocManager_1.30.4 tidyselect_0.2.5   bindr_0.1.1        tibble_1.4.2

zhangj5 commented 5 years ago

see https://github.com/seandavi/GEOquery/issues/74

seandavi commented 5 years ago

This behavior is almost certainly due to a non-backwards-compatible change made between readr 1.1.1 and 1.2.1. See tidyverse/readr#923.
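
The symptom in the report (numeric sampleNames instead of GSM accessions) is what one would expect if the header row of the series matrix were skipped by one line, so that the first data row is taken as the header. The following is a minimal base-R sketch of that effect. It uses a made-up toy table, not GEOquery's or readr's actual parsing code, and looks_like_gsm is a hypothetical helper introduced only for illustration.

```r
# Toy series-matrix fragment (made-up values, not real GSE2553 data).
toy <- paste(
  "!series_matrix_table_begin",
  "ID_REF\tGSM48681\tGSM48682",
  "1\t0.27\t0.39",
  "2\t1.10\t1.20",
  sep = "\n"
)

# Skipping exactly the one marker line keeps the header row intact.
good <- read.delim(text = toy, skip = 1, check.names = FALSE)

# Skipping one line too many turns the first data row into the header.
bad <- read.delim(text = toy, skip = 2, check.names = FALSE)

# Hypothetical helper: do all names look like GEO sample accessions?
looks_like_gsm <- function(x) all(grepl("^GSM[0-9]+$", x))

looks_like_gsm(colnames(good)[-1])  # TRUE
looks_like_gsm(colnames(bad)[-1])   # FALSE
```

On a real result, the same check could be applied as looks_like_gsm(Biobase::sampleNames(gsm[[1]])), since getGEO with GSEMatrix=TRUE returns a list of ExpressionSet objects.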

rikrdo89 commented 5 years ago

I have the same issue with GEOquery using the GSE2553 dataset. @seandavi have you found a fix or workaround for this?

code below:

> gse2553 <- getGEO('GSE2553', GSEMatrix = TRUE, destdir = "./gse2553")
Found 1 file(s)
GSE2553_series_matrix.txt.gz
Using locally cached version: ./gse2553/GSE2553_series_matrix.txt.gz
Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
Using locally cached version of GPL1977 found here: ./gse2553/GPL1977.soft
> show(gse2553)
$GSE2553_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12599 features, 181 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: 0.2701103 0.3925373 ... 1.4989663 (181 total)
  varLabels: title geo_accession ... data_row_count (30 total)
  varMetadata: labelDescription
featureData
  featureNames: 2 3 ... 12600 (12599 total)
  fvarLabels: ID PenAt ... Chimeric_Cluster_IDs (13 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL1977

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] GEOquery_2.50.0  Biobase_2.42.0  BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0  rstudioapi_0.8  bindr_0.1.1  xml2_1.2.0  magrittr_1.5  hms_0.4.2
 [7] tidyselect_0.2.5  R6_2.3.0  rlang_0.3.0.1  dplyr_0.7.8  tools_3.5.1  yaml_2.2.0
[13] assertthat_0.2.0  tibble_1.4.2  crayon_1.3.4  bindrcpp_0.2.2  BiocManager_1.30.4  purrr_0.2.5
[19] readr_1.2.1  tidyr_0.8.2  curl_3.2  glue_1.3.0  limma_3.38.2  compiler_3.5.1
[25] pillar_1.3.0  pkgconfig_2.0.2

seandavi commented 5 years ago

This is a confirmed bug in readr. The tidyverse team is working on a fix. Until then, downgrading to readr 1.1 should fix the problem.
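
The downgrade suggested above could look like the following sketch. It assumes the remotes package is installed. readr_newer_than_known_good is a hypothetical helper, and 1.1.1 is treated as known-good only because it is the version reported as working in this thread.

```r
# Hypothetical helper: is a readr version newer than the last one
# reported as working in this thread (1.1.1)?
readr_newer_than_known_good <- function(version, known_good = "1.1.1") {
  package_version(version) > package_version(known_good)
}

readr_newer_than_known_good("1.2.1")  # TRUE
readr_newer_than_known_good("1.1.1")  # FALSE

# Not run automatically, because it modifies the package library.
# Assumes the 'remotes' package is available.
if (FALSE) {
  installed <- as.character(utils::packageVersion("readr"))
  if (readr_newer_than_known_good(installed)) {
    remotes::install_version("readr", version = "1.1.1")
  }
}
```

After installing, restart the R session so that the downgraded readr is the one that gets loaded.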

rikrdo89 commented 5 years ago

I used the versions package to downgrade to readr 1.1.1, and it fixed the issue. Thanks a lot!