seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

getGEO invalid row.names error #114

Closed sionaris closed 3 years ago

sionaris commented 3 years ago

Is there an issue with the latest GEOquery version for Windows ( http://bioconductor.org/checkResults/release/bioc-LATEST/GEOquery/ )? I keep getting an error message about invalid row.names, which I had never encountered before. The same code worked in the previous Bioconductor version.

download = getGEO("GSE32603", GSEMatrix = TRUE)

Found 1 file(s)
GSE32603_series_matrix.txt.gz
Using locally cached version: C:\Users\s2071467\AppData\Local\Temp\Rtmpwr1vrd/GSE32603_series_matrix.txt.gz
Rows: 35069 Columns: 249
-- Column specification ------------------------------------------------------------------------
Delimiter: "\t"
chr (1): ID_REF
dbl (248): GSM808102, GSM808103, GSM808104, GSM808105, GSM808106, GSM808107, GSM808108, GSM8...

i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.
Using locally cached version of GPL14668 found here:
C:\Users\s2071467\AppData\Local\Temp\Rtmpwr1vrd/GPL14668.soft
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
In addition: Warning message:
One or more parsing issues, see problems() for details

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] readr_2.0.1 EnhancedVolcano_1.10.0 ggrepel_0.9.1 ggplot2_3.3.5
[5] preprocessCore_1.54.0 tidyr_1.1.3 stringr_1.4.0 genefu_2.24.2
[9] AIMS_1.24.0 e1071_1.7-8 iC10_1.5 iC10TrainingData_1.3.1
[13] impute_1.66.0 pamr_1.56.1 cluster_2.1.2 biomaRt_2.48.3
[17] survcomp_1.42.0 prodlim_2019.11.13 survival_3.2-13 GEOquery_2.60.0
[21] GO.db_3.13.0 openxlsx_4.2.4 sva_3.40.0 BiocParallel_1.26.2
[25] genefilter_1.74.0 mgcv_1.8-36 nlme_3.1-153 dplyr_1.0.7
[29] org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1 IRanges_2.26.0 S4Vectors_0.30.0
[33] Biobase_2.52.0 BiocGenerics_0.38.0 limma_3.48.3

loaded via a namespace (and not attached):
[1] class_7.3-19 crayon_1.4.1 MASS_7.3-54 rlang_0.4.11
[5] XVector_0.32.0 extrafontdb_1.0 filelock_1.0.2 extrafont_0.17
[9] bit64_4.0.5 glue_1.4.2 vipor_0.4.5 tidyselect_1.1.1
[13] XML_3.99-0.8 proj4_1.0-10.1 SuppDists_1.1-9.5 xtable_1.8-4
[17] magrittr_2.0.1 cli_3.0.1 zlibbioc_1.38.0 rstudioapi_0.13
[21] maps_3.3.0 KEGGREST_1.32.0 tibble_3.1.4 listenv_0.8.0
[25] Biostrings_2.60.2 png_0.1-7 future_1.22.1 withr_2.4.2
[29] bitops_1.0-7 pillar_1.6.2 cachem_1.0.6 vctrs_0.3.8
[33] ellipsis_0.3.2 generics_0.1.0 lava_1.6.10 tools_4.1.1
[37] beeswarm_0.4.0 munsell_0.5.0 proxy_0.4-26 fastmap_1.1.0
[41] compiler_4.1.1 GenomeInfoDbData_1.2.6 edgeR_3.34.1 lattice_0.20-44
[45] utf8_1.2.2 BiocFileCache_2.0.0 scales_1.1.1 ash_1.0-15
[49] survivalROC_1.0.3 memoise_2.0.0 locfit_1.5-9.4 digest_0.6.27
[53] assertthat_0.2.1 rappdirs_0.3.3 Rttf2pt1_1.3.9 RSQLite_2.2.8
[57] future.apply_1.8.1 blob_1.2.2 splines_4.1.1 RCurl_1.98-1.5
[61] hms_1.1.0 colorspace_2.0-2 BiocManager_1.30.16 ggbeeswarm_0.6.0
[65] ggrastr_0.2.3 Rcpp_1.0.7 mclust_5.4.7 fansi_0.5.0
[69] tzdb_0.1.2 parallelly_1.28.1 R6_2.5.1 grid_4.1.1
[73] lifecycle_1.0.0 zip_2.2.0 curl_4.3.2 Matrix_1.3-4
[77] RColorBrewer_1.1-2 purrr_0.3.4 globals_0.14.0 codetools_0.2-18
[81] matrixStats_0.61.0 prettyunits_1.1.1 dbplyr_2.1.1 GenomeInfoDb_1.28.4
[85] gtable_0.3.0 DBI_1.1.1 httr_1.4.2 KernSmooth_2.23-20
[89] stringi_1.7.4 vroom_1.5.5 progress_1.2.2 annotate_1.70.0
[93] xml2_1.3.2 rmeta_3.0 ggalt_0.4.0 bit_4.0.4
[97] pkgconfig_2.0.3 bootstrap_2019.6

Thanks very much in advance.

(P.S.: I also tried other GSEs (local and not local), and they failed in the same way.)

seandavi commented 3 years ago

Thanks for the report. I wasn't able to reproduce this one on my Mac, anyway. You mentioned that you tried another GSE, but it looks like the error is probably associated with the GPL, not the GSE. Could you try removing the locally cached GPL to see if that clears up the problem?
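Clearing the cache as suggested above could look like the sketch below. It assumes the files were cached in the session's temporary directory, which is the destdir default of getGEO and matches the paths in the log above. The helper name clear_geo_cache and its file-name pattern are hypothetical and not part of GEOquery.

```r
# Hypothetical helper (not part of GEOquery): delete cached GEO downloads
# such as GPL14668.soft or GSE32603_series_matrix.txt.gz from a directory.
clear_geo_cache <- function(dir = tempdir()) {
  cached <- list.files(
    dir,
    pattern = "^(GPL|GSE)[0-9]+.*\\.(soft|txt)(\\.gz)?$",
    full.names = TRUE
  )
  removed <- file.remove(cached)
  # Return only the paths that were actually deleted.
  cached[removed]
}

deleted <- clear_geo_cache()
print(deleted)
```

After this, the next getGEO call should download fresh copies instead of reporting "Using locally cached version". Starting a new R session has a similar effect, since each session gets its own temporary directory.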

sionaris commented 3 years ago

Thanks for the quick response Sean. I removed the locally cached files, but I still get the same error. I've used the exact same line of code for the exact same GSE in the past and it was working. Same error occurs with GPL10558 (GSE55374) and GPL96 (GSE20181), which means it happens with other platforms as well.

I don't know if it's important, but connection buffer size errors also occur. I have been dealing with these by increasing the vroom connection size with Sys.setenv("VROOM_CONNECTION_SIZE" = 1000 * previous_size).
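A minimal sketch of that buffer-size workaround, using only base R. The fallback of 131072 bytes (128 KiB) is an assumption about vroom's default when the variable is unset, and the factor of 1000 is simply the one mentioned above.

```r
# vroom reads its connection buffer size (in bytes) from this environment
# variable; 131072 (128 KiB) is assumed here as the default when it is unset.
previous_size <- as.numeric(Sys.getenv("VROOM_CONNECTION_SIZE", unset = "131072"))
new_size <- previous_size * 1000

# Environment variables are strings; format() avoids scientific notation
# such as "1e+05", which as.character() can produce for round numbers.
Sys.setenv(VROOM_CONNECTION_SIZE = format(new_size, scientific = FALSE))
Sys.getenv("VROOM_CONNECTION_SIZE")
```

The variable has to be set before the file is read, so this belongs ahead of the getGEO call.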

Yunuuuu commented 3 years ago

the same problem

Yunuuuu commented 3 years ago

It seems to occur only in the newest GEOquery. I found it mentioned on Stack Overflow: https://stackoverflow.com/questions/69247530/getgeofilename-leads-to-error-in-rownamesdf-x-value-value-inval

sionaris commented 3 years ago

Yes, that was me answering that question on Stack Overflow yesterday, after some extensive online research, without being 100% sure. Today I tried version 2.58.0 of the package, in Bioconductor 3.12, and it still didn't work. So when the issue is resolved here, I will update my Stack Overflow answer accordingly.

Yunuuuu commented 3 years ago

It's not the version of GEOquery, but the version of readr. I switched to an older version of GEOquery, but the problem remained. I finally solved it by running readr::local_edition(1) before using GEOquery, which makes readr use its first edition (the older parser) for reading the files.
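As a sketch of this workaround: the thread's version is simply to run readr::local_edition(1) once before calling getGEO. The variant below scopes the change to a single call with readr::with_edition, which is available in readr 2.0.0 and later. The wrapper name get_geo_first_edition is hypothetical and not part of GEOquery or readr.

```r
# Hypothetical wrapper (not part of GEOquery): run getGEO() with readr's
# first edition active for the duration of the call only.
get_geo_first_edition <- function(accession, ...) {
  for (pkg in c("readr", "GEOquery")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required but not installed.")
    }
  }
  # with_edition() restores the previous edition when the call finishes.
  readr::with_edition(1, GEOquery::getGEO(accession, ...))
}

# Needs network access, so the call is left commented out here:
# gse <- get_geo_first_edition("GSE32603", GSEMatrix = TRUE)
```

Scoping the edition to one call leaves other readr code in the same session on the second edition.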

sionaris commented 3 years ago

Thanks for the suggestion Yunuuuu, it worked for me too after running this!

seandavi commented 3 years ago

This issue is fixed by the conversion to data.table in commit b81fe.

geteff commented 2 years ago

It's not the version of GEOquery, but the version of readr. I switched to an older version of GEOquery, but the problem remained. I finally solved it by running readr::local_edition(1) before using GEOquery, which makes readr use its first edition (the older parser) for reading the files.

Thanks for the tips!

Fuli99 commented 1 year ago

It's not the version of GEOquery, but the version of readr. I switched to an older version of GEOquery, but the problem remained. I finally solved it by running readr::local_edition(1) before using GEOquery, which makes readr use its first edition (the older parser) for reading the files.

Thanks!