Closed moaraj closed 6 years ago
Thanks for the report. Can you drop in the output of sessionInfo() after loading GEOquery?
It seems I am having the same problem. I can no longer download files using getGEO:
getGEO("GSE20986")
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE20nnn/GSE20986/matrix/
OK
Found 2 file(s)
/geo/series/GSE20nnn/GSE20986/
downloaded 0 bytes
Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
cannot download all files
In addition: Warning message:
In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE20nnn/GSE20986/matrix//geo/series/GSE20nnn/GSE20986/': status was '404 Not Found'
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsubread_1.24.2 calibrate_1.7.2 MASS_7.3-48 WebGestaltR_0.1.1
[5] gageData_2.12.0 gage_2.24.0 RColorBrewer_1.1-2 ggfortify_0.4.1
[9] gplots_3.0.1 limma_3.30.13 simpleaffy_2.50.0 gcrma_2.46.0
[13] genefilter_1.56.0 affy_1.52.0 oligo_1.38.0 Biostrings_2.42.1
[17] XVector_0.14.1 IRanges_2.8.2 S4Vectors_0.12.2 oligoClasses_1.36.0
[21] GEOquery_2.40.0 Biobase_2.34.0 BiocGenerics_0.20.0 BiocInstaller_1.24.0
[25] DT_0.2 plotly_4.7.1 ggplot2_2.2.1.9000 rhandsontable_0.3.5
[29] shiny_1.0.5
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 doParallel_1.0.11
[4] httr_1.3.1 GenomeInfoDb_1.10.3 tools_3.3.2
[7] R6_2.2.2 affyio_1.44.0 KernSmooth_2.23-15
[10] DBI_0.7 lazyeval_0.2.1 colorspace_1.3-2
[13] gridExtra_2.3 bit_1.1-12 preprocessCore_1.36.0
[16] graph_1.52.0 pkgmaker_0.22 caTools_1.17.1
[19] scales_0.5.0 stringr_1.2.0 digest_0.6.14
[22] pkgconfig_2.0.1 htmltools_0.3.6 htmlwidgets_0.9
[25] rlang_0.1.6 RSQLite_2.0 bindr_0.1
[28] jsonlite_1.5 gtools_3.5.0 dplyr_0.7.4
[31] RCurl_1.95-4.10 magrittr_1.5 Matrix_1.2-12
[34] Rcpp_0.12.14 munsell_0.4.3 stringi_1.1.6
[37] yaml_2.1.16 SummarizedExperiment_1.4.0 zlibbioc_1.20.0
[40] plyr_1.8.4 grid_3.3.2 affxparser_1.46.0
[43] blob_1.1.0 gdata_2.18.0 lattice_0.20-35
[46] splines_3.3.2 annotate_1.52.1 KEGGREST_1.14.1
[49] GenomicRanges_1.26.4 rjson_0.2.15 codetools_0.2-15
[52] XML_3.98-1.9 glue_1.2.0 data.table_1.10.4-3
[55] png_0.1-7 httpuv_1.3.5 foreach_1.4.4
[58] PythonInR_0.1-3 gtable_0.2.0 purrr_0.2.4
[61] tidyr_0.7.2 assertthat_0.2.0 pack_0.1-1
[64] mime_0.5 xtable_1.8-2 ff_2.2-13
[67] rsconnect_0.8.5 survival_2.41-3 viridisLite_0.2.0
[70] tibble_1.4.1 iterators_1.0.9 AnnotationDbi_1.36.2
[73] registry_0.5 memoise_1.1.0 bindrcpp_0.2
Thanks!
R-3.3.2 is pretty old and GEOquery has been updated a few dozen times since 2.40.0. Could you update your R and Bioconductor versions and try again? This problem has been fixed in recent GEOquery versions.
I'm having the same issue. You say that there are newer versions of the package than 2.40.0, but I tried a fresh install from Bioconductor, and also used the latest version of R on other computers, and the version I get is still 2.40.0.
gse=getGEO("GSE106977",GSEMatrix=T)
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/
OK
Found 2 file(s)
/geo/series/GSE106nnn/GSE106977/
Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
cannot open destfile 'C:\Users\Marti\AppData\Local\Temp\RtmpOedWoI//geo/series/GSE106nnn/GSE106977', reason 'No such file or directory'
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GEOquery_2.40.0 Biobase_2.34.0 BiocGenerics_0.20.0
[4] BiocInstaller_1.24.0
loaded via a namespace (and not attached):
[1] httr_1.3.1 R6_2.2.2 tools_3.3.2 RCurl_1.95-4.10
[5] bitops_1.0-6 XML_3.98-1.9
>
After debugging for a while, I found that there should be 1 file instead of 2. There is an issue in "getAndParseGSEMatrices" that I fixed by adding b=b[-1] in the function below:
getAndParseGSEMatrices <- function(GEO, destdir, AnnotGPL, getGPL = TRUE)
{
    GEO <- toupper(GEO)
    # Directory stub on the server, e.g. GSE106977 -> GSE106nnn
    stub <- gsub("\\d{1,3}$", "nnn", GEO, perl = TRUE)
    gdsurl <- "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/"
    b <- getDirListing(sprintf(gdsurl, stub, GEO))
    b <- b[-1]  # This one: drops the first entry, which is the parent directory, not a file
    message(sprintf("Found %d file(s)", length(b)))
    ret <- list()
    for (i in seq_along(b)) {  # seq_along() skips the loop when b is empty
        message(b[i])
        destfile <- file.path(destdir, b[i])
        if (file.exists(destfile)) {
            message(sprintf("Using locally cached version: %s",
                destfile))
        }
        else {
            download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",
                stub, GEO, b[i]), destfile = destfile, mode = "wb",
                method = getOption("download.file.method.GEOquery"))
        }
        ret[[b[i]]] <- parseGSEMatrix(destfile, destdir = destdir,
            AnnotGPL = AnnotGPL, getGPL = getGPL)$eset
    }
    return(ret)
}
I will open a pull request, but I'm not sure whether the change works with the whole package.
Thank you for all your work, Sean.
Best regards!
Martín
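As an alternative to dropping the first entry by position, the listing could be filtered by file name, so the result does not depend on whether the parent-directory entry is present. This is only a sketch: `keep_series_matrices` is a hypothetical helper, not part of GEOquery, and it assumes that series matrix files end in `series_matrix.txt.gz`, as the GSE69967 file name quoted in this thread does.

```r
# Hypothetical helper (not part of GEOquery): keep only the entries of a
# directory listing that look like series matrix files.
keep_series_matrices <- function(listing) {
  # Assumption: matrix file names end in "series_matrix.txt.gz".
  grep("series_matrix\\.txt\\.gz$", listing, value = TRUE)
}

# Illustrative listing shaped like the one reported in this thread:
# the first entry is the parent directory, not a file.
listing <- c("/geo/series/GSE69nnn/GSE69967/",
             "GSE69967_series_matrix.txt.gz")

b <- keep_series_matrices(listing)
message(sprintf("Found %d file(s)", length(b)))
```

Filtering by name also leaves the listing untouched when the server returns only file names, whereas `b[-1]` would then drop a real file.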
The issue is that you cannot get a newer version of GEOquery without upgrading R. Your R version is 3.3.2, which is tied to an outdated version of Bioconductor and, therefore, GEOquery. Upgrading R to 3.4.x (where x is the patch version, any number) and then reinstalling packages is the way to get GEOquery 2.46.13 (as of today, the latest released version).
I am really sorry for the inconvenience. Just to be complete, this error comes up because of changes in infrastructure (a transition to https) that occurred at NCBI GEO last year. They were addressed at the time, but not in older GEOquery versions.
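A quick way to tell whether an installation is behind the release mentioned above is to compare version strings. The sketch below uses base R's `utils::compareVersion`; `is_outdated` is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper: TRUE when `installed` is older than `required`.
is_outdated <- function(installed, required) {
  utils::compareVersion(as.character(installed), as.character(required)) < 0
}

# Versions mentioned in this thread.
old_release <- is_outdated("2.40.0", "2.46.13")
current_release <- is_outdated("2.46.13", "2.46.13")
print(c(old = old_release, current = current_release))

# With GEOquery installed, the same check could be run against the
# installed package:
# is_outdated(utils::packageVersion("GEOquery"), "2.46.13")
```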
Thanks Sean for all your help! I will try that but I'd like to point out that my GEOquery was working up until last week and I have been using this version successfully after the https change (I remember when that happened). I'll try updating though! Thanks!
Sorry to bother you, Sean, but on other computers we have newer versions of R and are having the same problem. I agree that it might be due to changes in the infrastructure of NCBI GEO, because the same setup was working fine a week ago for us too. This is why I created the post: after doing everything you suggested, we couldn't find a workaround. I have not worked on the development of any package, so I didn't know the proper way to report a problem, but now I've learned. Sorry for that!
Here is the sessionInfo() from another computer with R 3.4.0. I also tried the package on an Ubuntu machine with R 3.4.3 and GEOquery 2.46, and it is working perfectly.
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Argentina.1252 LC_CTYPE=Spanish_Argentina.1252
[3] LC_MONETARY=Spanish_Argentina.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Argentina.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GEOquery_2.42.0 Biobase_2.36.2 BiocGenerics_0.22.1
loaded via a namespace (and not attached):
[1] httr_1.3.1 compiler_3.4.0 R6_2.2.2 RCurl_1.95-4.10
[5] bitops_1.0-6 XML_3.98-1.9
Thanks for the update. So, can you let me know if the most recent version of GEOquery works for you? That is GEOquery version 2.46.13. If that does not work for you, can you drop in the error and output of sessionInfo() again? Sorry for asking, but I got a little confused with the discussion above.
As for the package development process, everyone is a bit different. Your contributions are very much appreciated; bug reports and the discussion around them are key to ensuring that the project continues to meet people's needs.
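One way to gather both pieces requested above (the error text and the sessionInfo() output) in a single step is sketched below. `collect_report` is a hypothetical helper, not a GEOquery function, and a `stop()` call stands in for the failing download so the example runs without network access.

```r
# Hypothetical helper: evaluate an expression and collect what a bug
# report usually needs, i.e. the error message (if any) and the session details.
collect_report <- function(expr) {
  error_message <- tryCatch({
    force(expr)        # the expression is evaluated here, inside tryCatch
    NA_character_      # no error occurred
  }, error = function(e) conditionMessage(e))
  list(error = error_message,
       session = utils::capture.output(utils::sessionInfo()))
}

# Stand-in for a failing call; in practice this might be
# collect_report(GEOquery::getGEO("GSE20986"))
report <- collect_report(stop("cannot download all files"))
cat("Error:", report$error, "\n")
cat(report$session, sep = "\n")
```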
The error message shows the URL that GEOquery is accessing to download the matrix file.
For example, for GSE69967 it is 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE69nnn/GSE69967/matrix//geo/series/GSE69nnn/GSE69967/'
when it should be 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE69nnn/GSE69967/matrix/GSE69967_series_matrix.txt.gz'
download.file(url = 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE69nnn/GSE69967/matrix/GSE69967_series_matrix.txt.gz', destfile = "GSE69967_series_matrix.txt.gz")
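The two URLs above can be reproduced from the accession alone, which makes it easier to see where the malformed one comes from. The sketch below reuses the `gsub` and `sprintf` patterns from the function posted earlier in this thread. `series_matrix_url` is a hypothetical helper, and it assumes a single matrix file named `<GSE>_series_matrix.txt.gz`, which is true for the GSE69967 example but is not confirmed here for every series.

```r
# Hypothetical helper: build the series matrix URL for a GSE accession.
# Assumption: one matrix file named <GSE>_series_matrix.txt.gz.
series_matrix_url <- function(geo) {
  geo <- toupper(geo)
  # Directory stub, e.g. GSE69967 -> GSE69nnn
  stub <- gsub("\\d{1,3}$", "nnn", geo, perl = TRUE)
  sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s_series_matrix.txt.gz",
          stub, geo, geo)
}

url <- series_matrix_url("GSE69967")
print(url)

# How the malformed URL arises: the directory entry from the listing is
# pasted in where a file name is expected.
bad_entry <- "/geo/series/GSE69nnn/GSE69967/"
bad_url <- sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",
                   "GSE69nnn", "GSE69967", bad_entry)
print(bad_url)

# Not run here because it needs network access:
# download.file(url, destfile = basename(url), mode = "wb")
```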