seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

Recently, I have been having issues with getGEO #88

Closed swiftiebirth13 closed 5 years ago

swiftiebirth13 commented 5 years ago

Recently, I have been getting a warning from getGEO. So far I have not noticed any ill effect on my results, but I would like to know why the warning appears, and I have not found a solution. Can you help? I also don't understand what this has to do with the “readr” package.

gset <- getGEO('GSE76275', destdir = ".", AnnotGPL = TRUE, getGPL = TRUE)

Found 1 file(s)
GSE76275_series_matrix.txt.gz
Using locally cached version: ./GSE76275_series_matrix.txt.gz
Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
Using locally cached version of GPL570 found here:
./GPL570.annot.gz
Warning: 62 parsing failures.
  row             col           expected    actual         file
54614 Platform_SPOTID 1/0/T/F/TRUE/FALSE --Control literal data
54615 Platform_SPOTID 1/0/T/F/TRUE/FALSE --Control literal data
54616 Platform_SPOTID 1/0/T/F/TRUE/FALSE --Control literal data
54617 Platform_SPOTID 1/0/T/F/TRUE/FALSE --Control literal data
54618 Platform_SPOTID 1/0/T/F/TRUE/FALSE --Control literal data
..... ............... .................. ......... ............
See problems(...) for more details.

lizhongliu1996 commented 5 years ago

I tried the code below, but it does not work: I get 65 parsing failures.

library(GEOquery)
test <- getGEO(GEO = "GSE43414", destdir = getwd())

65 parsing failures.
   row     col           expected     actual         file
485513 SPOT_ID 1/0/T/F/TRUE/FALSE rs10796216 literal data
485514 SPOT_ID 1/0/T/F/TRUE/FALSE rs715359   literal data
485515 SPOT_ID 1/0/T/F/TRUE/FALSE rs1040870  literal data
485516 SPOT_ID 1/0/T/F/TRUE/FALSE rs10936224 literal data
485517 SPOT_ID 1/0/T/F/TRUE/FALSE rs213028   literal data
...... ....... .................. .......... ............
See problems(...) for more details.

The GPL13534.soft file is 206.1 MB, but the test object is only 802.7 KB and has 0 features:

$GSE43414_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 696 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM1068821 GSM1068822 ... GSM1069516 (696 total)
  varLabels: title geo_accession ... tissue_code:ch1 (65 total)
  varMetadata: labelDescription
featureData
  featureNames:
  fvarLabels: ID Name ... SPOT_ID (37 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 23631413
Annotation: GPL13534

Here is my sessionInfo:

sessionInfo()

GEOquery 2.52.0
readr 1.3.1

seandavi commented 5 years ago

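@swiftiebirth13, the parsing problems are just warnings that you can likely ignore. The parsing failures have to do with how readr guesses column types: it infers each column's type from the first rows of the file, so a column that is empty near the top and holds text only further down (as Platform_SPOTID and SPOT_ID appear to here) is typically guessed as logical, and the later text values then fail to parse. I use readr because it is very fast.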
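To make the type-guessing behaviour concrete, here is a minimal sketch using readr directly on a made-up three-row table. It is not GEOquery's internal code; the toy data and the small `guess_max` value are assumptions chosen only to mimic an annotation column that is empty at the top and holds text further down.

```r
library(readr)

# Toy table (made up): SPOT_ID is empty in the first two rows and only
# holds text in the third, like control probes at the end of a platform file.
txt <- "ID\tSPOT_ID\n1\t\n2\t\n3\t--Control\n"

# With guess_max = 2, readr only inspects rows where SPOT_ID is empty, so it
# is expected to guess a logical column; the text in row 3 then cannot be
# parsed. suppressWarnings() hides the warning so problems() can be inspected.
guessed <- suppressWarnings(read_tsv(txt, guess_max = 2))
problems(guessed)

# Declaring the column types up front removes the guess entirely.
explicit <- read_tsv(
  txt,
  col_types = cols(ID = col_double(), SPOT_ID = col_character())
)
explicit$SPOT_ID
```

In the reports above the affected columns are annotation columns (Platform_SPOTID, SPOT_ID), not the expression values, which fits the advice that the warning can usually be ignored.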

@lizhongliu1996, unfortunately the lack of features has to do with the fact that the original submission did not include processed data (check the individual samples to see--there is no data table included). To get access to the data, you will need to determine which of the several supplemental files includes the data, parse those supplemental files yourself, and then construct the Bioconductor objects yourself. There really isn't anything that GEOquery can do with such supplemental files since they can be in any format and include any information. That said, basic R functions can often read these files easily since they are usually CSV, TSV, or the like.
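A rough sketch of that workflow follows. `getGEOSuppFiles()` is GEOquery's function for downloading supplemental files and `ExpressionSet()` comes from Biobase; the helper names (`read_expression_table`, `fetch_supp_files`, `build_eset`) are hypothetical, and the file layout (tab-delimited, probe IDs in the first column, one numeric column per sample) is an assumption that has to be checked against the actual files of each series.

```r
# Read a tab-delimited table, assuming probe IDs in the first column and one
# numeric column per sample in the remaining columns.
read_expression_table <- function(path) {
  tab <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  mat <- as.matrix(tab[, -1, drop = FALSE])
  storage.mode(mat) <- "double"
  rownames(mat) <- tab[[1]]
  mat
}

# Download the supplemental files of a series and return their local paths.
# Needs network access and the GEOquery package.
fetch_supp_files <- function(gse, dest = getwd()) {
  files <- GEOquery::getGEOSuppFiles(gse, makeDirectory = TRUE, baseDir = dest)
  rownames(files)
}

# Wrap a numeric matrix (features x samples) in a bare ExpressionSet.
# Needs the Biobase package.
build_eset <- function(mat) {
  Biobase::ExpressionSet(assayData = mat)
}

# Offline demonstration of the parsing step on a small made-up file.
demo_path <- tempfile(fileext = ".txt")
writeLines(
  c("ID_REF\tGSM1\tGSM2", "cg01\t0.1\t0.2", "cg02\t0.3\t0.4"),
  demo_path
)
demo_mat <- read_expression_table(demo_path)
dim(demo_mat)
```

For a real series, one would call `fetch_supp_files()` with the accession, look at the returned paths to decide which file holds the processed values, read that file (adjusting the reader if it is CSV or has a different layout), and then pass the matrix to `build_eset()`. Sample annotation from the `getGEO()` result can be attached afterwards once the column names have been matched to the GSM accessions.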

latifizadehhabib commented 3 years ago

Thank you for the suggestion. Could you please explain in more detail how to do this in RStudio, specifically for my data (GSE81540)?

Appreciate your help.