Closed zh-zhang1984 closed 3 years ago
Hi, @zh-zhang1984. I'm not able to reproduce the error. Can you double-check that you are using the latest version of R and GEOquery? If so, can you provide the output of sessionInfo() after loading GEOquery?
Hi, I have the same problem.
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GEOquery_2.60.0 Biobase_2.52.0 BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] xml2_1.3.2 magrittr_2.0.1 hms_1.1.0
[4] bit_4.0.4 tidyselect_1.1.1 R6_2.5.0
[7] rlang_0.4.11 fansi_0.5.0 dplyr_1.0.7
[10] tools_4.1.0 vroom_1.5.4 utf8_1.2.2
[13] ellipsis_0.3.2 bit64_4.0.5 tibble_3.1.3
[16] lifecycle_1.0.0 crayon_1.4.1 BiocManager_1.30.16
[19] purrr_0.3.4 readr_2.0.0 tzdb_0.1.2
[22] tidyr_1.1.3 vctrs_0.3.8 curl_4.3.2
[25] glue_1.4.2 limma_3.48.1 compiler_4.1.0
[28] pillar_1.6.2 generics_0.1.0 pkgconfig_2.0.3
Do you know why? Thanks.
My suspicion is that the files that you have on your computer (notice that GEOquery is using a cached version of the file) are corrupted. Can you remove the files:
/Users/zhang/Documents/2021/singleCellRNA/Data/GSE131761_series_matrix.txt.gz
/Users/zhang/Documents/2020/GEOsepsis/Data/GSE139913_series_matrix.txt.gz
and try one more time? Sorry for the inconvenience.
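The cleanup suggested above can be sketched in R. This is only an illustration: `remove_cached_series_matrix` is a hypothetical helper, not part of GEOquery, and the file-name pattern is an assumption based on the two cached file names quoted in this thread.

```r
# Hypothetical helper (not a GEOquery function): delete cached series matrix
# files for one accession so the next getGEO() call must download a fresh copy.
remove_cached_series_matrix <- function(gse, destdir = tempdir()) {
  # Assumed naming: GSE131761_series_matrix.txt.gz, optionally with a
  # platform suffix such as GSE131761-GPL570_series_matrix.txt.gz.
  pattern <- paste0("^", gse, "(-GPL[0-9]+)?_series_matrix\\.txt\\.gz$")
  cached <- list.files(destdir, pattern = pattern, full.names = TRUE)
  removed <- file.remove(cached)
  # Return the paths that were actually deleted, without printing them.
  invisible(cached[removed])
}

# Example (needs GEOquery and network access, so it is left commented out):
# remove_cached_series_matrix("GSE131761", destdir = "path/to/your/Data")
# gset <- GEOquery::getGEO("GSE131761", destdir = "path/to/your/Data")
```

Pass the same `destdir` that was used for the original download; if none was given, `getGEO()` defaults to `tempdir()`, which is emptied when the R session ends.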
I googled this problem and found the same suspicion. For someone else, clearing the files and downloading again worked, but it did not work for me. I cleared the folder and downloaded the file again, but it still failed.
`Found 1 file(s)
GSE131761_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE131nnn/GSE131761/matrix/GSE131761_series_matrix.txt.gz'
Content type 'application/x-gzip' length 31912319 bytes (30.4 MB)
downloaded 30.4 MB
Error in parseGSEMatrix(destfile, destdir = destdir, AnnotGPL = AnnotGPL, : parsing failed--expected only one '!series_data_table_begin'`
Do you know why?
I tried to download another GSE dataset, but that failed as well. Thanks.
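One way to tell a corrupted download from a parsing problem is to count the table-begin marker lines in the downloaded file directly. The sketch below is a diagnostic only: `count_table_begin` is a hypothetical helper, and the default pattern is an assumption that accepts both the spelling quoted in the error message and `!series_matrix_table_begin`, which series matrix files appear to use.

```r
# Hypothetical diagnostic (not a GEOquery function): count the lines that
# open the data table in a series matrix file, gzipped or plain.
count_table_begin <- function(path,
                              pattern = "^!series_(matrix|data)_table_begin") {
  # gzfile() in read mode also reads uncompressed files.
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  sum(grepl(pattern, lines))
}

# Example (the path is a placeholder for your own download location):
# count_table_begin("path/to/GSE131761_series_matrix.txt.gz")
```

If this returns 1, the file itself contains exactly one marker, which would suggest the failure happens while the file is being read rather than in the download.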
I am having the same issue. I ran sessionInfo(), loaded the library, and ran the getGEO call (after previously receiving errors about VROOM_CONNECTION_SIZE):
library(GEOquery)
Sys.setenv(VROOM_CONNECTION_SIZE=100000)
gset <- getGEO('GSE2990', GSEMatrix = TRUE, getGPL = FALSE)
Error message:
Found 1 file(s)
GSE2990_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2990/matrix/GSE2990_series_matrix.txt.gz'
Content type 'application/x-gzip' length 16680570 bytes (15.9 MB)
downloaded 15.9 MB
Error in parseGSEMatrix(destfile, destdir = destdir, AnnotGPL = AnnotGPL, : parsing failed--expected only one '!series_data_table_begin'
Full output:
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] locfit_1.5-9.4 Rcpp_1.0.7 lattice_0.20-44 png_0.1-7
[5] Biostrings_2.60.2 assertthat_0.2.1 utf8_1.2.2 R6_2.5.1
[9] GenomeInfoDb_1.28.1 stats4_4.1.0 RSQLite_2.2.7 httr_1.4.2
[13] ggplot2_3.3.5 pillar_1.6.2 zlibbioc_1.38.0 rlang_0.4.11
[17] rstudioapi_0.13 annotate_1.70.0 blob_1.2.2 S4Vectors_0.30.0
[21] Matrix_1.3-4 splines_4.1.0 BiocParallel_1.26.1 geneplotter_1.70.0
[25] RCurl_1.98-1.4 bit_4.0.4 munsell_0.5.0 DelayedArray_0.18.0
[29] compiler_4.1.0 pkgconfig_2.0.3 BiocGenerics_0.38.0 tidyselect_1.1.1
[33] KEGGREST_1.32.0 SummarizedExperiment_1.22.0 tibble_3.1.3 GenomeInfoDbData_1.2.6
[37] IRanges_2.26.0 matrixStats_0.60.0 XML_3.99-0.7 fansi_0.5.0
[41] crayon_1.4.1 dplyr_1.0.7 bitops_1.0-7 grid_4.1.0
[45] xtable_1.8-4 gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1
[49] magrittr_2.0.1 scales_1.1.1 cachem_1.0.5 XVector_0.32.0
[53] genefilter_1.74.0 ellipsis_0.3.2 vctrs_0.3.8 generics_0.1.0
[57] RColorBrewer_1.1-2 tools_4.1.0 bit64_4.0.5 Biobase_2.52.0
[61] glue_1.4.2 DESeq2_1.32.0 purrr_0.3.4 MatrixGenerics_1.4.2
[65] parallel_4.1.0 fastmap_1.1.0 survival_3.2-11 AnnotationDbi_1.54.1
[69] colorspace_2.0-2 BiocManager_1.30.16 GenomicRanges_1.44.0 memoise_2.0.0
library(GEOquery)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval,
evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Sys.setenv(VROOM_CONNECTION_SIZE=100000)
gset <- getGEO('GSE2990', GSEMatrix = TRUE, getGPL = FALSE)
Found 1 file(s)
GSE2990_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2990/matrix/GSE2990_series_matrix.txt.gz'
Content type 'application/x-gzip' length 16680570 bytes (15.9 MB)
downloaded 15.9 MB
Error in parseGSEMatrix(destfile, destdir = destdir, AnnotGPL = AnnotGPL, : parsing failed--expected only one '!series_data_table_begin'
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GEOquery_2.60.0 Biobase_2.52.0 BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] locfit_1.5-9.4 Rcpp_1.0.7 lattice_0.20-44 tidyr_1.1.3
[5] png_0.1-7 Biostrings_2.60.2 assertthat_0.2.1 utf8_1.2.2
[9] R6_2.5.1 GenomeInfoDb_1.28.1 stats4_4.1.0 RSQLite_2.2.7
[13] httr_1.4.2 ggplot2_3.3.5 pillar_1.6.2 zlibbioc_1.38.0
[17] rlang_0.4.11 curl_4.3.2 rstudioapi_0.13 annotate_1.70.0
[21] blob_1.2.2 S4Vectors_0.30.0 Matrix_1.3-4 splines_4.1.0
[25] BiocParallel_1.26.1 readr_2.0.1 geneplotter_1.70.0 RCurl_1.98-1.4
[29] bit_4.0.4 munsell_0.5.0 DelayedArray_0.18.0 compiler_4.1.0
[33] pkgconfig_2.0.3 tidyselect_1.1.1 KEGGREST_1.32.0 SummarizedExperiment_1.22.0
[37] tibble_3.1.3 GenomeInfoDbData_1.2.6 IRanges_2.26.0 matrixStats_0.60.0
[41] XML_3.99-0.7 fansi_0.5.0 tzdb_0.1.2 crayon_1.4.1
[45] dplyr_1.0.7 bitops_1.0-7 grid_4.1.0 xtable_1.8-4
[49] gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1
[53] scales_1.1.1 vroom_1.5.4 cachem_1.0.5 XVector_0.32.0
[57] genefilter_1.74.0 limma_3.48.3 xml2_1.3.2 ellipsis_0.3.2
[61] vctrs_0.3.8 generics_0.1.0 RColorBrewer_1.1-2 tools_4.1.0
[65] bit64_4.0.5 glue_1.4.2 DESeq2_1.32.0 purrr_0.3.4
[69] hms_1.1.0 MatrixGenerics_1.4.2 fastmap_1.1.0 survival_3.2-11
[73] AnnotationDbi_1.54.1 colorspace_2.0-2 BiocManager_1.30.16 GenomicRanges_1.44.0
[77] memoise_2.0.0
I also got the same error as zh-zhang1984:
getGEO("GSE131761", AnnotGPL = T,GSEMatrix = T)
Found 1 file(s)
GSE131761_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE131nnn/GSE131761/matrix/GSE131761_series_matrix.txt.gz'
Content type 'application/x-gzip' length 31912319 bytes (30.4 MB)
downloaded 30.4 MB
Error in parseGSEMatrix(destfile, destdir = destdir, AnnotGPL = AnnotGPL, : parsing failed--expected only one '!series_data_table_begin'
I faced the same error after hitting the "Error: The size of the connection buffer (262144) was not large enough" issue. I found that doubling VROOM_CONNECTION_SIZE each time the error occurs, instead of setting it to a very large number, avoids the "Error in parseGSEMatrix" error. The exact command is the following:
Sys.setenv("VROOM_CONNECTION_SIZE" = 262144 * 2)
I hope this helps you if it is not too late. Best regards.
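The doubling approach described above could be automated roughly as follows. This is a sketch under assumptions: `retry_with_doubling_buffer` is a hypothetical helper, it retries on any error rather than only on the buffer-size error, and the starting value simply mirrors the `262144 * 2` used in the command above.

```r
# Hypothetical helper: call fetch() repeatedly, doubling VROOM_CONNECTION_SIZE
# after every failure, and return the first successful result.
retry_with_doubling_buffer <- function(fetch, start = 262144 * 2, max_tries = 6) {
  stopifnot(is.function(fetch), max_tries >= 1)
  size <- start
  for (i in seq_len(max_tries)) {
    # format(..., scientific = FALSE) writes plain digits; as.character()
    # would turn a value such as 100000 into "1e+05".
    Sys.setenv(VROOM_CONNECTION_SIZE = format(size, scientific = FALSE))
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", i, " failed with buffer ", size, ": ",
            conditionMessage(result))
    size <- size * 2
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}

# Example (needs GEOquery and network access, so it is left commented out):
# gset <- retry_with_doubling_buffer(function() {
#   GEOquery::getGEO("GSE2990", GSEMatrix = TRUE, getGPL = FALSE)
# })
```

Because the helper retries on every error, a failure unrelated to the buffer size is also retried until `max_tries` is exhausted, and the last error message is then reported.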
Should be fixed in b81fe0aa.
The following code worked well previously, but recently it consistently fails with an error and I can no longer download these files: