seandavi / GEOquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor
http://seandavi.github.io/GEOquery/

GSE2193 parsing failure #87

Closed seandavi closed 3 years ago

seandavi commented 5 years ago

gse = getGEO('GSE2193')
Found 5 file(s)
GSE2193-GPL1823_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2193/matrix/GSE2193-GPL1823_series_matrix.txt.gz'
Content type 'application/x-gzip' length 2401863 bytes (2.3 MB)

downloaded 2.3 MB

Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
File stored at:
/var/folders/hq/pzgtdx7j55j0g7r4647vqzrr2yvxz9/T//RtmpdX3Qw9/GPL1823.soft
GSE2193-GPL1824_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2193/matrix/GSE2193-GPL1824_series_matrix.txt.gz'
Content type 'application/x-gzip' length 2111644 bytes (2.0 MB)

downloaded 2.0 MB

Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
File stored at:
/var/folders/hq/pzgtdx7j55j0g7r4647vqzrr2yvxz9/T//RtmpdX3Qw9/GPL1824.soft
GSE2193-GPL1825_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2193/matrix/GSE2193-GPL1825_series_matrix.txt.gz'
Content type 'application/x-gzip' length 2424453 bytes (2.3 MB)

downloaded 2.3 MB

Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
File stored at:
/var/folders/hq/pzgtdx7j55j0g7r4647vqzrr2yvxz9/T//RtmpdX3Qw9/GPL1825.soft
GSE2193-GPL1826_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2193/matrix/GSE2193-GPL1826_series_matrix.txt.gz'
Content type 'application/x-gzip' length 768530 bytes (750 KB)

downloaded 750 KB

Parsed with column specification:
cols(
  1 = col_double(),
  -.954 = col_double(),
  .104 = col_double(),
  -1.08 = col_double(),
  X5 = col_double(),
  -1.6 = col_double(),
  X7 = col_double(),
  -.14 = col_double(),
  -.256 = col_double(),
  .929 = col_double(),
  .205 = col_double(),
  -.939 = col_double()
)
File stored at:
/var/folders/hq/pzgtdx7j55j0g7r4647vqzrr2yvxz9/T//RtmpdX3Qw9/GPL1826.soft
GSE2193-GPL1827_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2193/matrix/GSE2193-GPL1827_series_matrix.txt.gz'
Content type 'application/x-gzip' length 563241 bytes (550 KB)

downloaded 550 KB

Parsed with column specification:
cols(
  1 = col_double(),
  -.226 = col_double(),
  .85 = col_double(),
  .239 = col_double(),
  .239_1 = col_double(),
  .239_2 = col_double(),
  .239_3 = col_double(),
  .239_4 = col_double(),
  .597 = col_double()
)
File stored at:
/var/folders/hq/pzgtdx7j55j0g7r4647vqzrr2yvxz9/T//RtmpdX3Qw9/GPL1827.soft
Warning messages:
1: Missing column names filled in: 'X3' [3], 'X12' [12], 'X24' [24]
2: Missing column names filled in: 'X11' [11], 'X25' [25], 'X28' [28], 'X29' [29], 'X33' [33], 'X36' [36], 'X40' [40]
3: Missing column names filled in: 'X2' [2], 'X8' [8], 'X9' [9], 'X11' [11], 'X19' [19], 'X20' [20], 'X24' [24], 'X25' [25], 'X26' [26], 'X31' [31]
4: Missing column names filled in: 'X5' [5], 'X7' [7]
5: Duplicated column names deduplicated: '.239' => '.239_1' [5], '.239' => '.239_2' [6], '.239' => '.239_3' [7], '.239' => '.239_4' [8]

andyshaps commented 5 years ago

I reported this bug on Bioconductor a few days ago (https://support.bioconductor.org/p/120905/) and was going to report it here, but I see no need now!

I did, however, find another series that produces a similar parsing error: GSE9058.

I don't know if this will help you fix the bug or not.

andyshaps commented 5 years ago

Just to add: series GSE2503, GSE8525, and GSE6887, as well as platform GPL6602, also result in a parsing error.

I think I have more that aren't parsing correctly, but for now I have put a workaround in place. If these would help fix the bug, let me know and I will find the accession numbers.

Andy

assaron commented 4 years ago

This dataset also seems to be fixed by #101.