seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

getGEO returns expression values as column names instead of sample names, with many parsing failures #74

Closed zhangj5 closed 5 years ago

zhangj5 commented 6 years ago

Recently, getGEO has stopped returning the correct sample names for me. It seems the "header" line of the series matrix file is not being read, so expression values end up as column names. Below is the output from running a getGEO command. I tried deleting the /tmp/RtmppX1Ykp/GSE1159_series_matrix.txt.gz file, but that did not help.

exp<-getGEO("GSE1159",GSEMatrix = T)
Found 1 file(s)
GSE1159_series_matrix.txt.gz
Using locally cached version: /tmp/RtmppX1Ykp/GSE1159_series_matrix.txt.gz
Parsed with column specification:
cols(
  .default = col_double(),
  1053_at = col_character()
)
See spec(...) for full column specifications.
Using locally cached version of GPL96 found here:
/tmp/RtmppX1Ykp/GPL96.soft
Warning: 68 parsing failures.
row col expected actual file
22216 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22217 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22218 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22219 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22220 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
..... ....... .................. ......... ............
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated:
'96.5' => '96.5_1' [51], '54.1' => '54.1_1' [64], '76.7' => '76.7_1' [86], '38.2' => '38.2_1' [87],
'81.9' => '81.9_1' [94], '89.1' => '89.1_1' [99], '49.7' => '49.7_1' [100], '72.6' => '72.6_1' [116],
'86.9' => '86.9_1' [118], '57.6' => '57.6_1' [119], '57.6' => '57.6_2' [122], '62' => '62_1' [123],
'62.8' => '62.8_1' [136], '53' => '53_1' [137], '35.7' => '35.7_1' [146], '40.5' => '40.5_1' [148],
'73.5' => '73.5_1' [150], '73.5' => '73.5_2' [151], '68' => '68_1' [156], '66.2' => '66.2_1' [164],
'60.3' => '60.3_1' [165], '52.7' => '52.7_1' [172], '46.6' => '46.6_1' [175], '96.5' => '96.5_2' [179],
'57.5' => '57.5_1' [191], '74' => '74_1' [193], '76.3' => '76.3_1' [194], '58.8' => '58.8_1' [200],
'61.1' => '61.1_1' [210], '70.4' => '70.4_1' [212], '40.5' => '40.5_2' [220], '58.4' => '58.4_1' [221],
'68.6' => '68.6_1' [225], '45.1' => '45.1_1' [226], '73.4' => '73.4_1' [227], '90' => '90_1' [230],
'61.1' => '61.1_2' [231], '93.7' => '93.7_1' [234], [... truncated]

This problem occurs with every GSE dataset I have tested so far. What could be causing it? Thanks!
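For anyone trying to confirm the same symptom, a quick check is whether the sample names look like GEO sample accessions ("GSM" followed by digits) rather than numbers. The sketch below is only an illustration: `all_gsm_names` is a hypothetical helper, not part of GEOquery, and the example names are made up apart from the kinds of values shown in the warnings above.

```r
# Hypothetical helper (not part of GEOquery): TRUE only when every name
# looks like a GEO sample accession, i.e. "GSM" followed by digits.
all_gsm_names <- function(nms) {
  length(nms) > 0 && all(grepl("^GSM[0-9]+$", nms))
}

# Made-up accessions, for illustration only.
expected_style <- c("GSM0000001", "GSM0000002")

# The kind of names reported in this issue: a probe ID and expression values.
broken_style <- c("1053_at", "96.5", "96.5_1")

all_gsm_names(expected_style)
all_gsm_names(broken_style)
```

With a real result this could be applied as `all_gsm_names(Biobase::sampleNames(exp[[1]]))`, assuming `exp` is the list returned by `getGEO(..., GSEMatrix = TRUE)`.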

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel graphics grDevices utils datasets stats methods base

other attached packages:
[1] GEOquery_2.50.0 Biobase_2.42.0 BiocGenerics_0.28.0 colorout_1.2-0 usethis_1.4.0 devtools_2.0.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 compiler_3.5.1 pillar_1.3.0 bindr_0.1.1 prettyunits_1.0.2 base64enc_0.1-3 remotes_2.0.2
[8] tools_3.5.1 testthat_2.0.1 digest_0.6.18 pkgbuild_1.0.2 pkgload_1.0.2 memoise_1.1.0 tibble_1.4.2
[15] debugme_1.1.0 pkgconfig_2.0.2 rlang_0.3.0.1 cli_1.0.1 rstudioapi_0.8 curl_3.2 yaml_2.2.0
[22] bindrcpp_0.2.2 xml2_1.2.0 withr_2.1.2 dplyr_0.7.8 hms_0.4.2 desc_1.2.0 fs_1.2.6
[29] tidyselect_0.2.5 rprojroot_1.3-2 glue_1.3.0 R6_2.3.0 processx_3.2.0 fansi_0.4.0 sessioninfo_1.1.1
[36] limma_3.38.2 tidyr_0.8.2.9000 readr_1.2.1 purrr_0.2.5 callr_3.0.0 magrittr_1.5 backports_1.1.2
[43] ps_1.2.1 assertthat_0.2.0 utf8_1.1.4 crayon_1.3.4

zhangj5 commented 6 years ago

Downgrading readr from 1.2.1 (released a few days ago) to 1.1.1 solved the problem.
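One possible way to do the downgrade is sketched below, assuming the remotes package is available (it appears in the sessionInfo above). The helper name `readr_is_affected` is hypothetical, and the check only covers the single version reported as problematic in this thread; whether other readr versions behave the same way is not established here.

```r
# Versions taken from this thread: 1.2.1 showed the problem, 1.1.1 did not.
affected_version <- "1.2.1"
known_good_version <- "1.1.1"

# Hypothetical helper: TRUE when the given version string is the affected one.
readr_is_affected <- function(installed_version) {
  package_version(installed_version) == package_version(affected_version)
}

# Installs the known-good release from CRAN only if the affected one is present.
# Not called automatically; restart R after running it so the old version loads.
downgrade_readr_if_affected <- function() {
  installed <- as.character(utils::packageVersion("readr"))
  if (readr_is_affected(installed)) {
    remotes::install_version("readr", version = known_good_version)
  }
  invisible(installed)
}

# Usage (uncomment to run):
# downgrade_readr_if_affected()
```

After restarting R, rerunning the original getGEO call in a fresh session should show whether the sample names are back.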