grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

Catching "line X did not have Y elements" #16

Closed grimbough closed 3 years ago

grimbough commented 5 years ago

Example: https://support.bioconductor.org/p/121356/

This error has come up several times when returned values contain an unescaped \n. The newline is interpreted as the start of a new row, which breaks the table reading.
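The failure can be shown without contacting the server. The snippet below is a minimal sketch using a made-up two-column table (the `id` and `description` columns and their values are invented for illustration, not real BioMart output) in which one field contains a raw newline:

```r
# Made-up TSV text: the description for record B contains a raw newline,
# so it is spread over two physical lines.
tsv <- "id\tdescription\nA\tfirst record\nB\tsecond record,\nsplit over two lines\n"

# Capture the error message instead of stopping the script.
res <- tryCatch(
  read.table(text = tsv, sep = "\t", header = TRUE, quote = "\"",
             comment.char = "", as.is = TRUE, check.names = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```

The trailing fragment has only one field where two are expected, so `read.table()` should stop with a "did not have 2 elements" message rather than return a data frame.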

host = "https://www.ensembl.org:443/biomart/martservice?"
query = "<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query>
<Query  virtualSchemaName = 'default' uniqueRows = '1' count = '0' datasetConfigVersion = '0.6' header='1' requestid= 'biomaRt' formatter = 'TSV'>
<Dataset name = 'hsapiens_gene_ensembl'><Attribute name = 'ensembl_gene_id'/>
<Attribute name = 'hgnc_symbol'/>
<Attribute name = 'go_id'/>
<Attribute name = 'name_1006'/>
<Attribute name = 'definition_1006'/>
<Filter name = \"ensembl_gene_id\" value = \"ENSG00000100036\" /></Dataset></Query>"
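# Submit the query; the cookie asks the server not to redirect to a mirror
res <- httr::POST(url = host, body = list(query = query),
                  httr::set_cookies(.cookies = c(redirect_mirror = "no")))
con = textConnection(httr::content(res))
# The response is tab-separated (formatter = 'TSV'), so split fields on tabs
result = read.table(con, sep="\t", header=TRUE, quote = "\"", comment.char = "", as.is=TRUE, check.names = TRUE, allowEscapes = TRUE)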

Solution: The HTML version of the output seems to handle this correctly. It is larger and more complex to parse, but could be used as a backup when reading the TSV fails.
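One way to wire up such a backup is to wrap the TSV read in `tryCatch()` and only fall back when this specific error appears. The following is a rough sketch under stated assumptions, not the package's implementation: `read_tsv_with_fallback()` and `fallback_stub()` are hypothetical names, and the stub simply returns a hand-built data frame in place of code that would re-request the query as HTML and parse the table.

```r
# Hypothetical helper: try the TSV parse first, and call `fallback` only when
# read.table() fails with the "did not have N elements" error.
read_tsv_with_fallback <- function(tsv_text, fallback) {
  tryCatch(
    read.table(text = tsv_text, sep = "\t", header = TRUE, quote = "\"",
               comment.char = "", as.is = TRUE, check.names = TRUE),
    error = function(e) {
      if (grepl("did not have [0-9]+ elements", conditionMessage(e))) {
        fallback()
      } else {
        stop(e)  # re-raise unrelated errors unchanged
      }
    }
  )
}

# Stand-in (assumption) for a function that would fetch and parse HTML output.
fallback_stub <- function() {
  data.frame(id = c("A", "B"),
             description = c("first record",
                             "second record,\nsplit over two lines"),
             stringsAsFactors = FALSE)
}

good <- "id\tdescription\nA\tfirst record\n"
bad  <- "id\tdescription\nA\tfirst record\nB\tsecond record,\nsplit over two lines\n"

ok <- read_tsv_with_fallback(good, fallback_stub)  # parsed directly from TSV
fb <- read_tsv_with_fallback(bad, fallback_stub)   # TSV fails, stub is used
```

Matching on the message text is fragile (it depends on the R locale and version), but it keeps the slower path out of the way for the common case where the TSV parses cleanly.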

grimbough commented 3 years ago

This is now caught in https://github.com/grimbough/biomaRt/blob/a34e372b2c9ab74095b9ae65f9ca00d8f8ad1a7c/R/utilityFunctions.R#L244 and handled by the internal function .fetchHTMLresults()