seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

getGEO broken on Windows (?) #118

Closed: pfgherardini closed this issue 2 years ago.

pfgherardini commented 3 years ago

Hi,

Apologies for the vague bug report, but getGEO seems to be currently (completely?) broken on Windows. I keep getting

error reading from the connection

on files that definitely exist and are readable. For instance:

x <- GEOquery::getGEO("GSE58095")
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Found 1 file(s)
GSE58095_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE58nnn/GSE58095/matrix/GSE58095_series_matrix.txt.gz'
Content type 'application/x-gzip' length 18421721 bytes (17.6 MB)
downloaded 17.6 MB

File stored at: 
C:\Users\feder\AppData\Local\Temp\RtmpUX51VB/GPL10558.soft.gz
Error in readLines(con, 100) : error reading from the connection

Strangely, I also get an error when I try to inspect the offending file directly:

/mnt/c/Users/feder/AppData/Local/Temp/RtmpUX51VB$ zcat GPL10558.soft.gz

gzip: GPL10558.soft.gz: invalid compressed data--format violated
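A similar check can be run from inside R. The sketch below is a minimal illustration using base R only; `gz_readable()` is a hypothetical helper, not part of GEOquery. It reads the whole file through a `gzfile()` connection with `readLines()`, the same function that failed in the log above, and returns FALSE if the read raises an error or warning.

```r
# Hypothetical helper, not part of GEOquery: read every line of a
# (possibly gzip-compressed) file through a gzfile() connection, roughly
# like `zcat file > /dev/null`. Returns TRUE if the read completes without
# an error or warning, FALSE otherwise.
gz_readable <- function(path) {
  if (!file.exists(path)) return(FALSE)
  con <- NULL
  # Close the connection on exit, but only if it was actually opened
  on.exit(if (!is.null(con)) close(con))
  tryCatch({
    con <- gzfile(path, open = "rt")
    # warn = FALSE: a missing final newline is not treated as corruption
    readLines(con, warn = FALSE)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
}

# Usage (path from the log above; adjust to your own temp directory):
# gz_readable("C:/Users/feder/AppData/Local/Temp/RtmpUX51VB/GPL10558.soft.gz")
```

This is not a full integrity check. `gzfile()` also reads uncompressed files transparently, so a plain-text file passes, and a truncated archive may not always raise an error. It should, however, surface the kind of decompression failure reported above.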

This is not a temporary network issue: I can trigger it consistently, and my network is otherwise working fine. I recently updated to Bioconductor 3.14, and I am fairly sure the exact same script worked on the previous version.

By contrast, everything works fine on Linux.

This is my Windows session info:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13     xml2_1.3.2          magrittr_2.0.1      BiocGenerics_0.40.0
 [5] hms_1.1.1           tidyselect_1.1.1    R6_2.5.1            rlang_0.4.11       
 [9] GEOquery_2.62.0     fansi_0.5.0         dplyr_1.0.7         tools_4.1.1        
[13] Biobase_2.54.0      data.table_1.14.2   xfun_0.26           R.oo_1.24.0        
[17] tinytex_0.34        utf8_1.2.2          DBI_1.1.1           ellipsis_0.3.2     
[21] assertthat_0.2.1    tibble_3.1.5        lifecycle_1.0.1     crayon_1.4.1       
[25] tidyr_1.1.4         tzdb_0.1.2          BiocManager_1.30.16 purrr_0.3.4        
[29] readr_2.0.2         R.utils_2.11.0      vctrs_0.3.8         curl_4.3.2         
[33] glue_1.4.2          limma_3.48.3        compiler_4.1.1      pillar_1.6.4       
[37] R.methodsS3_1.8.1   generics_0.1.0      pkgconfig_2.0.3 

And this is my Linux session info:

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13     xml2_1.3.2          magrittr_2.0.1
 [4] BiocGenerics_0.38.0 hms_1.1.0           tidyselect_1.1.1
 [7] R6_2.5.0            rlang_0.4.11        GEOquery_2.60.0
[10] fansi_0.5.0         dplyr_1.0.7         tools_4.1.0
[13] parallel_4.1.0      Biobase_2.52.0      utf8_1.2.1
[16] cli_3.0.0           DBI_1.1.1           ellipsis_0.3.2
[19] assertthat_0.2.1    tibble_3.1.2        lifecycle_1.0.0
[22] crayon_1.4.1        BiocManager_1.30.16 purrr_0.3.4
[25] readr_1.4.0         tidyr_1.1.3         vctrs_0.3.8
[28] curl_4.3.2          glue_1.4.2          limma_3.48.3
[31] compiler_4.1.0      pillar_1.6.1        generics_0.1.0
[34] pkgconfig_2.0.3

Thanks,

Federico

QifengOu commented 3 years ago

I have the same problem.

QifengOu commented 3 years ago

Hello! I have solved this problem! Possibly the mirror settings in RStudio are incorrect, so the file cannot be downloaded. When I changed the mirror settings, the function ran successfully.

You can also try running the code directly in R instead of in RStudio.

seandavi commented 3 years ago

I am hopeful that my recent commit 27c64b4f7 fixes this. You can try the current GitHub master branch, or wait a couple of days and update the package from Bioconductor:

BiocManager::install('seandavi/GEOquery')

seandavi commented 2 years ago

I believe this has been resolved in 27c64b4.