grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

biomaRt error in converting mouse to hg gene orthologs #61

Open pallavisurana1 opened 2 years ago

pallavisurana1 commented 2 years ago

Hi, I have been trying to convert mouse genes to human orthologs. Please take a look at the code and error messages.

```r
library(biomaRt)
musGenes <- c("Hmmr", "Tlx3", "Cpeb4")
mouse <- useEnsembl("ensembl", "mmusculus_gene_ensembl", mirror = "uswest")
human <- useEnsembl("ensembl", "hsapiens_gene_ensembl", mirror = "uswest")
genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol",
                  values = musGenes, mart = mouse,
                  attributesL = c("hgnc_symbol"), martL = human,
                  uniqueRows = TRUE)
```

Error message:

```
Error in .createErrorMessage(error_code = status_code(res), host = host) :
  object 'err_msg' not found
```

```
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

biomaRt_2.48.3

loaded via a namespace (and not attached):
 [1] httr_1.4.2             vroom_1.5.7            bit64_4.0.5            foreach_1.5.2          assertthat_0.2.1
 [6] BiocManager_1.30.16    stats4_4.1.0           BiocFileCache_2.0.0    blob_1.2.3             GenomeInfoDbData_1.2.6
[11] progress_1.2.2         pillar_1.7.0           RSQLite_2.2.12         glue_1.6.2             digest_0.6.29
[16] RColorBrewer_1.1-3     XVector_0.32.0         colorspace_2.0-3       plyr_1.8.7             XML_3.99-0.9
[21] pkgconfig_2.0.3        zlibbioc_1.38.0        purrr_0.3.4            xtable_1.8-4           scales_1.2.0
[26] tzdb_0.3.0             tibble_3.1.6           KEGGREST_1.32.0        generics_0.1.2         IRanges_2.26.0
[31] ggplot2_3.3.5          ellipsis_0.3.2         cachem_1.0.6           withr_2.5.0            cli_3.2.0
[36] magrittr_2.0.3         crayon_1.5.1           memoise_2.0.1          fansi_1.0.3            doParallel_1.0.17
[41] xml2_1.3.3             tools_4.1.0            prettyunits_1.1.1      hms_1.1.1              lifecycle_1.0.1
[46] gridBase_0.4-7         stringr_1.4.0          S4Vectors_0.30.2       munsell_0.5.0          AnnotationDbi_1.54.1
[51] Biostrings_2.60.2      compiler_4.1.0         GenomeInfoDb_1.28.4    tinytex_0.38           rlang_1.0.2
[56] grid_4.1.0             RCurl_1.98-1.6         iterators_1.0.14       rstudioapi_0.13        rappdirs_0.3.3
[61] bitops_1.0-7           gtable_0.3.0           codetools_0.2-18       DBI_1.1.2              curl_4.3.2
[66] reshape2_1.4.4         R6_2.5.1               fastmap_1.1.0          bit_4.0.4              utf8_1.2.2
[71] filelock_1.0.2         stringi_1.7.6          Rcpp_1.0.8.3           vctrs_0.4.1            png_0.1-7
[76] dbplyr_2.1.1           tidyselect_1.1.2       xfun_0.30
```
whywhowhat commented 2 years ago

does this help?

tt342400 commented 2 years ago

Others have solved this problem; please use this code:

```r
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
```

grimbough commented 2 years ago

There seems to be an issue with linked-dataset (`getLDS()`) queries since the latest update to Ensembl version 106. I've reported this to the Ensembl help desk; hopefully there will be a fix soon.

For now, using the Ensembl 105 archive, as suggested by @tt342400, is the best workaround.

I'll report back here if I hear anything more from the Ensembl team.
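Putting the archive workaround together with the query from the original post, a minimal sketch might look like the following. It assumes the Ensembl 105 archive host quoted above is still reachable; the function name `convert_mouse_to_human()` is hypothetical and not part of biomaRt.

```r
# Sketch only: run the original getLDS() query against the Ensembl 105
# (December 2021) archive. convert_mouse_to_human() is a hypothetical name.
convert_mouse_to_human <- function(mouse_symbols,
                                   host = "https://dec2021.archive.ensembl.org") {
  # Both marts point at the same archive host, so both datasets come from
  # the same Ensembl release.
  mouse <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = host)
  human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = host)
  biomaRt::getLDS(attributes = "mgi_symbol",
                  filters = "mgi_symbol",
                  values = mouse_symbols,
                  mart = mouse,
                  attributesL = "hgnc_symbol",
                  martL = human,
                  uniqueRows = TRUE)
}

# Usage (needs network access and the biomaRt package):
# orthologs <- convert_mouse_to_human(c("Hmmr", "Tlx3", "Cpeb4"))
```

Keeping the host as an argument means switching to another archive (or back to the current release once the issue is fixed) is a one-argument change.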

ens-ds23 commented 2 years ago

Thank you for your reports. We are actively looking into this issue and can confirm we are seeing the same as you. The error message doesn't make much sense to us, though. I assume this is caused by an error in the error-handling code in biomaRt, so that the original cause is masked?

grimbough commented 2 years ago


Thanks for looking into it. You're correct that the reported error message was a bug in the error-handling code. Essentially, there was a case statement giving advice for known error messages, but I'd missed a default for anything unexpected. This has now been patched, and in this situation the error message now reports the HTTP response code:

```
Error: biomaRt has encountered an unknown server error. HTTP error code: 502
```
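To illustrate the kind of bug described above, here is a hypothetical reconstruction. It is NOT biomaRt's actual `.createErrorMessage()` source, and the advice strings are placeholders. When `err_msg` is assigned only inside `switch()` branches, an unlisted code such as 502 matches nothing, so the variable is never created. Giving `switch()` a final unnamed argument supplies the missing default.

```r
# Hypothetical reconstruction for illustration; not biomaRt's real code.

# Buggy pattern: err_msg only exists if one of the listed codes matches.
error_message_buggy <- function(error_code) {
  switch(as.character(error_code),
         "500" = { err_msg <- "Advice for HTTP 500 (placeholder)" },
         "503" = { err_msg <- "Advice for HTTP 503 (placeholder)" })
  err_msg  # for code 502: "object 'err_msg' not found"
}

# Fixed pattern: the last unnamed argument of switch() is the default branch.
error_message_fixed <- function(error_code) {
  switch(as.character(error_code),
         "500" = "Advice for HTTP 500 (placeholder)",
         "503" = "Advice for HTTP 503 (placeholder)",
         paste0("biomaRt has encountered an unknown server error. ",
                "HTTP error code: ", error_code))
}

error_message_fixed(502)
# returns "biomaRt has encountered an unknown server error. HTTP error code: 502"
```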
ens-ds23 commented 2 years ago

Thanks @grimbough. Creating a similar query in the BioMart UI leads to an unexpected but clearer error, presumably a configuration problem, which I'm following up with our BioMart production people. I'm not 100% sure this is the same issue, but it is a genuine one and may be the cause. When we get to the bottom of it, we can see whether it helps with this issue as well. I'll also do a binary search on our archives to see whether these issues affect the same set of BioMart instances. Unfortunately, triaging based on logs is difficult (though possible) given the sheer load and the very high baseline error rate as people write and debug scripts. I'm hoping there will be an easier way through, but that's the fallback.

grimbough commented 2 years ago

Thanks @ens-ds23 for the update. If it's helpful, I can strip away all the R package machinery and provide the underlying XML query, which can be submitted with cURL at the command line. Let me know if that's of interest to you.

sofiapuvogelvittini commented 2 years ago

Hello, thanks for providing solutions. I tried @tt342400's solution:

```r
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
```

However, I received this error message:

```
Error in .getArchiveList(https, httr_config) :
  argument "httr_config" is missing, with no default
```

Do you know how I can fix it? Thanks a lot, Sofía

ens-ds23 commented 2 years ago

@grimbough That XML query would be really useful if you could, thank you.

grimbough commented 2 years ago


Sure. Here's an example of XML that should do an LDS query for the mouse MGI symbol mt-Co1 and return the matching human HGNC symbol:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" uniqueRows = "1" count = "0" datasetConfigVersion = "0.6" header="1" formatter = "TSV" requestid= "biomaRt">
  <Dataset name = "mmusculus_gene_ensembl">
    <Attribute name = "mgi_symbol"/>
    <Filter name = "mgi_symbol" value = "mt-Co1" />
  </Dataset>
  <Dataset name = "hsapiens_gene_ensembl" >
    <Attribute name = "hgnc_symbol"/>
  </Dataset>
</Query>
```
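To reproduce this query for more than one gene, a small helper could generate the same XML from a vector of MGI symbols. This is a sketch that assumes BioMart accepts several filter values as a comma-separated list in the `Filter` element's `value` attribute; `build_lds_query()` is a hypothetical name, and symbols are not XML-escaped.

```r
# Sketch: generate the LDS query XML above for any set of mouse MGI symbols.
# Assumption: multiple filter values are passed comma-separated.
# Note: symbols are inserted as-is (no escaping of &, <, or quotes).
build_lds_query <- function(mgi_symbols) {
  paste0(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE Query>',
    '<Query virtualSchemaName="default" uniqueRows="1" count="0" ',
    'datasetConfigVersion="0.6" header="1" formatter="TSV" requestid="biomaRt">',
    '<Dataset name="mmusculus_gene_ensembl">',
    '<Attribute name="mgi_symbol"/>',
    '<Filter name="mgi_symbol" value="', paste(mgi_symbols, collapse = ","), '"/>',
    '</Dataset>',
    '<Dataset name="hsapiens_gene_ensembl">',
    '<Attribute name="hgnc_symbol"/>',
    '</Dataset>',
    '</Query>'
  )
}

xml <- build_lds_query(c("Hmmr", "Tlx3", "Cpeb4"))
```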

Submitting with wget gives:

```sh
wget -O /dev/stdout 'https://www.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default" uniqueRows = "1" count = "0" datasetConfigVersion = "0.6" header="1" formatter = "TSV" requestid= "biomaRt"> <Dataset name = "mmusculus_gene_ensembl"><Attribute name = "mgi_symbol"/><Filter name = "mgi_symbol" value = "mt-Co1" /></Dataset><Dataset name = "hsapiens_gene_ensembl" ><Attribute name = "hgnc_symbol"/></Dataset></Query>'
```

```
--2022-05-09 20:41:42--  https://www.ensembl.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20virtualSchemaName%20=%20%22default%22%20uniqueRows%20=%20%221%22%20count%20=%20%220%22%20datasetConfigVersion%20=%20%220.6%22%20header=%221%22%20formatter%20=%20%22TSV%22%20requestid=%20%22biomaRt%22%3E%20%3CDataset%20name%20=%20%22mmusculus_gene_ensembl%22%3E%3CAttribute%20name%20=%20%22mgi_symbol%22/%3E%3CFilter%20name%20=%20%22mgi_symbol%22%20value%20=%20%22mt-Co1%22%20/%3E%3C/Dataset%3E%3CDataset%20name%20=%20%22hsapiens_gene_ensembl%22%20%3E%3CAttribute%20name%20=%20%22hgnc_symbol%22/%3E%3C/Dataset%3E%3C/Query%3E
Resolving www.ensembl.org (www.ensembl.org)... 193.62.193.83
Connecting to www.ensembl.org (www.ensembl.org)|193.62.193.83|:443... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
2022-05-09 20:41:42 ERROR 500: Internal Server Error.
```

Just to check that the XML isn't the culprit, here's the same query submitted to the 105 archive:

```sh
wget -q -O /dev/stdout 'https://dec2021.archive.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default" uniqueRows = "1" count = "0" datasetConfigVersion = "0.6" header="1" formatter = "TSV" requestid= "biomaRt"> <Dataset name = "mmusculus_gene_ensembl"><Attribute name = "mgi_symbol"/><Filter name = "mgi_symbol" value = "mt-Co1" /></Dataset><Dataset name = "hsapiens_gene_ensembl" ><Attribute name = "hgnc_symbol"/></Dataset></Query>'
```

```
MGI symbol  HGNC symbol
mt-Co1  MT-CO1
```
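The same martservice request can also be made from R without the biomaRt machinery, which can help separate server problems from client-side ones. A sketch under these assumptions: the archive host above still serves `/biomart/martservice`, the whole XML document can be percent-encoded into the `query` GET parameter, and the reply is tab-separated with a header row (because the query sets `header="1"` and `formatter="TSV"`). `fetch_martservice()` and `parse_martservice_tsv()` are hypothetical helper names.

```r
# Sketch: submit a martservice XML query from base R and parse the TSV reply.
fetch_martservice <- function(xml,
                              host = "https://dec2021.archive.ensembl.org") {
  # Percent-encode the whole XML document so it can travel as a GET parameter.
  url <- paste0(host, "/biomart/martservice?query=",
                utils::URLencode(xml, reserved = TRUE))
  # readLines() opens a URL connection when given an http(s) address.
  paste(readLines(url, warn = FALSE), collapse = "\n")
}

parse_martservice_tsv <- function(text) {
  # First line is the header row; keep column names such as "MGI symbol" as-is.
  utils::read.delim(text = text, check.names = FALSE,
                    stringsAsFactors = FALSE)
}

# Offline example using the response shown above (tab-separated):
reply <- "MGI symbol\tHGNC symbol\nmt-Co1\tMT-CO1"
orthologs <- parse_martservice_tsv(reply)
```

Online, the two helpers would be chained as `parse_martservice_tsv(fetch_martservice(xml))`, where `xml` holds the query document. Unlike wget, R will not report the HTTP status line here. A failing server shows up as a connection error instead.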
pallavisurana1 commented 2 years ago

Thanks, I can use biomart now. I'll close this issue.

sofiapuvogelvittini commented 2 years ago

Hello @PallaviSurana, how did you solve it? I still can't use it.

MertDemirdizen commented 2 years ago

I couldn't fix it either.

yujhao commented 2 years ago

If you get this error message:

```
Error in .createErrorMessage(error_code = status_code(res), host = host) :
  object 'err_msg' not found
```

please reload the marts from the 105 archive:

```r
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
```

I can use biomaRt now. @sofiapuvogelvittini @grimbough

andrewyatz commented 8 months ago

Hi all. Just dropping a line, as we at Ensembl have had this issue re-raised. I am sorry we haven't been able to fix this; from the thread above, it's clear this is an issue at our end and not in biomaRt. Unfortunately, the engineer who commented on this ticket previously was unable to resolve the issue and left EMBL last year. Ensembl is collectively looking into whether there is an obvious regression in our build or configuration of BioMart, but initial tests have not yielded much success.

If we do manage to make progress we will report it back here.

As for temporary solutions, using the 105 archive is the best course of action, but it is obviously now quite an old archive.
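Because the 105 archive is ageing, one option is to try the current release first and fall back to the archive only when the linked-dataset query fails. The sketch below assumes `https://www.ensembl.org` and `https://dec2021.archive.ensembl.org` as the two hosts. Whether the current release answers is not something this thread establishes. `run_lds_query()` and `with_host_fallback()` are hypothetical names. The query function is a parameter so the fallback logic can be checked offline with a stub.

```r
# Hypothetical helper: one linked-dataset query against a given host.
run_lds_query <- function(mouse_symbols, host) {
  mouse <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = host)
  human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = host)
  biomaRt::getLDS(attributes = "mgi_symbol", filters = "mgi_symbol",
                  values = mouse_symbols, mart = mouse,
                  attributesL = "hgnc_symbol", martL = human,
                  uniqueRows = TRUE)
}

# Try each host in order and return the first successful result, tagged
# with the host that answered (results can differ between releases).
with_host_fallback <- function(mouse_symbols,
                               hosts = c("https://www.ensembl.org",
                                         "https://dec2021.archive.ensembl.org"),
                               query_fun = run_lds_query) {
  for (host in hosts) {
    result <- tryCatch(
      query_fun(mouse_symbols, host),
      error = function(e) {
        message("Query against ", host, " failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) {
      attr(result, "ensembl_host") <- host
      return(result)
    }
  }
  stop("The query failed against every host: ", paste(hosts, collapse = ", "))
}

# Usage (needs network access and the biomaRt package):
# orthologs <- with_host_fallback(c("Hmmr", "Tlx3", "Cpeb4"))
# attr(orthologs, "ensembl_host")  # which release actually answered
```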