grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

error with getSequence #18

Open SalvatoreRa opened 5 years ago

SalvatoreRa commented 5 years ago

Hi,

I am using getSequence to retrieve the upstream sequence of a gene. I followed the example in the vignette, but it did not work:

library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
entrez=c("673","7157","837")
getSequence(id = entrez, 
            type="entrezgene",
            seqType="coding_gene_flank",
            upstream=100, 
            mart=ensembl) 

I tried the following code, and it worked the first time:

library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
x <- getSequence(id = "BRCA1", 
                type = "hgnc_symbol",
                seqType="gene_flank",
                upstream=100, 
                mart=ensembl) 

The problem is that it is not working anymore. If I clear the global environment and test again, sometimes it works and sometimes it doesn't. Most of the time I just get the following error:

Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), : Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND
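For reference, the error trace shows that getSequence() turns a flanking-sequence request into a getBM() call in which the flank length is sent as an extra filter (`upstream_flank` here, `downstream_flank` in a later report), so the message means the server did not recognise that filter. The sketch below uses a hypothetical helper, `flank_query_args()` (not part of biomaRt), to rebuild those arguments so the query can be inspected or sent directly. The shape of `values` is an assumption, because the trace is truncated.

```r
# Hypothetical helper (not a biomaRt function): rebuild the getBM() arguments
# that the error trace shows getSequence() using for a flank query.
flank_query_args <- function(id, type, seqType, upstream = NULL, downstream = NULL) {
  if (is.null(upstream) && is.null(downstream)) {
    stop("supply upstream or downstream")
  }
  # Assumption for this sketch: one flank direction per query.
  if (!is.null(upstream) && !is.null(downstream)) {
    stop("supply only one of upstream or downstream")
  }
  flank_filter <- if (!is.null(upstream)) "upstream_flank" else "downstream_flank"
  flank_length <- if (!is.null(upstream)) upstream else downstream
  list(
    attributes = c(seqType, type),        # sequence column plus the ID column
    filters    = c(type, flank_filter),   # the flank length travels as a filter
    values     = list(id, flank_length)   # assumed order: IDs, then flank length
  )
}

args <- flank_query_args(id = "BRCA1", type = "hgnc_symbol",
                         seqType = "gene_flank", upstream = 100)
str(args)
```

With a live connection, the list could be sent as `do.call(biomaRt::getBM, c(args, list(mart = ensembl, checkFilters = FALSE)))`. My understanding is that `checkFilters = FALSE` is the getBM() option documented for attributes such as `upstream_flank` that BioMart treats as filters.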

grimbough commented 5 years ago

I think this is the same issue reported in https://support.bioconductor.org/p/120429/

Ensembl have closed my support ticket, so perhaps it is fixed?

SalvatoreRa commented 5 years ago

I am not so sure. I found an alternative solution:


mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "useast") #changed mirror site, because otherwise it did not work
# gene_list must already hold the Affymetrix probe IDs; the result overwrites it
gene_list <- getBM(attributes=c('affy_hugene_1_0_st_v1', 'hgnc_symbol'), 
                   filters = 'affy_hugene_1_0_st_v1', 
                   values = gene_list, 
                   mart = mart)
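Since the failure is intermittent, another option besides switching mirror by hand is to retry the same query a few times. A minimal sketch, assuming a hypothetical helper `with_retries()` (not a biomaRt function) that wraps any query in tryCatch() and pauses between attempts:

```r
# Hypothetical helper: call query() up to `attempts` times, pausing `wait`
# seconds after each failure, and return the first successful result.
with_retries <- function(query, attempts = 3, wait = 2) {
  stopifnot(attempts >= 1)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(query(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait)
  }
  stop("All ", attempts, " attempts failed; last error: ",
       conditionMessage(last_error), call. = FALSE)
}

# Offline demonstration: a stand-in query that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Filter upstream_flank NOT FOUND")
  "ok"
}
with_retries(flaky, attempts = 3, wait = 0)

# Usage with biomaRt (requires network access):
# x <- with_retries(function() getSequence(id = "BRCA1", type = "hgnc_symbol",
#                                           seqType = "gene_flank",
#                                           upstream = 100, mart = mart))
```

Retrying only helps if the failure really is transient; if every attempt reports the same missing filter, switching mirror is still needed.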
junli1988 commented 4 years ago

I'm having a similar problem with both the getSequence and getGene functions of this package.

SalvatoreRa commented 4 years ago

I don't know about your case, but my solution (changing the mirror site) is working for me.

junli1988 commented 4 years ago

Is there a list of mirrors that I could try switching to?

SalvatoreRa commented 4 years ago

https://m.ensembl.org/info/about/mirrors.html

The valid options are "www", "uswest", "useast" and "asia".
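Those options can also be tried in turn automatically. A sketch, assuming a hypothetical `first_working_mirror()` helper; `connect` is any function that takes a mirror name, for example one wrapping `useEnsembl(mirror = ...)`. Because a mirror that connects may still reject the flank filter, `connect` could also run a small test query before returning.

```r
# Hypothetical helper: try each mirror name in order and return the first
# one for which connect(mirror) completes without an error.
first_working_mirror <- function(mirrors, connect) {
  for (m in mirrors) {
    mart <- tryCatch(connect(m), error = function(e) {
      message("Mirror '", m, "' failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(mart)) {
      return(list(mirror = m, mart = mart))
    }
  }
  stop("None of the mirrors worked: ", paste(mirrors, collapse = ", "),
       call. = FALSE)
}

# Usage with biomaRt (requires network access):
# res <- first_working_mirror(
#   c("www", "uswest", "useast", "asia"),
#   function(m) biomaRt::useEnsembl(biomart = "ensembl",
#                                   dataset = "hsapiens_gene_ensembl",
#                                   mirror = m)
# )
# mart <- res$mart
```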

yichao-cai commented 2 months ago

Hi, I am having the same issue getting the sequence. Your help is highly appreciated, @grimbough. Thanks a lot!

  1. Loading the mart (I tried changing the mirror and host options, but had no luck):
    mart <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl",
                   # version="112",
                   # host = "https://asia.ensembl.org",
                   # host = "https://www.ensembl.org",
                   mirror = "asia",
                   verbose=TRUE)

mart information:

Formal class 'Mart' [package "biomaRt"] with 8 slots
  ..@ biomart    : chr "ENSEMBL_MART_ENSEMBL"
  ..@ host       : chr "https://www.ensembl.org:443/biomart/martservice?redirect=no"
  ..@ vschema    : chr "default"
  ..@ version    : chr ""
  ..@ dataset    : chr "hsapiens_gene_ensembl"
  ..@ filters    :'data.frame': 435 obs. of  9 variables:
  2. The query succeeds without the flanking option:

    biomaRt::getSequence(id=c("ENSG00000205571"), type="ensembl_gene_id",
                     seqType="gene_exon",
                     mart=mart) %>% tibble

    Output:

    # A tibble: 36 × 2
    gene_exon                                                                                                                          ensembl_gene_id
    <chr>                                                                                                                              <chr>          
    1 TCTGTGAAGTAGCTAATAATATAGAACAAAATGCTCAAGAG                                                                                          ENSG00000205571
    2 ATAATTCCCCCACCACCTCCCATATGTCCAGATTCTCTTGATGATGCTGATGCTTTGGGAAGTATGTTAATTTCATGGTACATGAGTGGCTATCATACTGGCTATTATATGGTAAGTAATCACTCAGCA… ENSG00000205571
    3 AGTCTCGCTCTGCTGCCCACGCTGGAGTGCAGTGGTGCAATCTCAGCTCACTGCAACCTCTGCTATCCGGGTTCAAGCAGTTCTCGTGCCTCACCCACGTGAGTAGTTGGGATTACAGGCATGTGGCAC… ENSG00000205571
    4 CATGCTCTAAAGAATGGTGACATTTGTGAAACTTCGGGTAAACCAAAAACCACACCTAAAAGAAAACCTGCTAAGAAGAATAAAAGCCAAAAGAAGAATACTGCAGCTTCCTTACAACAG           ENSG00000205571
    5 ATGGCGATGAGCAGCGGCGGCAGTGGTGGCGGCGTCCCGGAGCAGGAGGATTCCGTGCTGTTCCGGCGCGGCACAGGCCAG                                                  ENSG00000205571
    6 GAAATGCTGGCATAGAGCAGCACTAAATGACACCACTAAAGAAACGATCAGACAGATCTGGAATGTGAAGCGTTATAGAAGATAACTGGCCTCATTTCTTCAAAATATCAAGTGTTGGGAAAGAAAAAA… ENSG00000205571
    7 AGCCAGGTCTAAAATTCAATGGCCCACCACCGCCACCGCCACCACCACCACCCCACTTACTATCATGCTGGCTGCCTCCATTTCCTTCTGGACCACCA                                 ENSG00000205571
    8 ATAATTCCCCCACCACCTCCCATATGTCCAGATTCTCTTGATGATGCTGATGCTTTGGGAAGTATGTTAATTTCATGGTACATGAGTGGCTATCATACTGGCTATTATATGGTAA                ENSG00000205571
    9 CCAGGTCTAAAATTCAATGGCCCACCACCGCCACCGCCACCACCACCACCCCACTTACTATCATGCTGGCTGCCTCCATTTCCTTCTGGACCACCA                                   ENSG00000205571
    10 GGTTTTAGACAAAATCAAAAAGAAGGAAGGTGCTCACATTCCTTAAATTAAGGA                                                                             ENSG00000205571
    # ℹ 26 more rows
    # ℹ Use `print(n = ...)` to see more rows
  3. The query fails with the flanking option:

    biomaRt::getSequence(id=c("ENSG00000205571"), type="ensembl_gene_id",
                     seqType="gene_exon",
                     downstream=5,
                     mart=mart) %>% tibble

    Error:

    Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery,  : 
    Query ERROR: caught BioMart::Exception::Usage: Filter downstream_flank NOT FOUND
  4. sessionInfo:

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0
[10] biomaRt_2.61.0  purrr_1.0.2     furrr_0.3.1     future_1.33.2   UpSetR_1.4.0    here_1.0.1     

loaded via a namespace (and not attached):
 [1] KEGGREST_1.42.0         gtable_0.3.5            httr2_1.0.1             Biobase_2.62.0          tzdb_0.4.0              vctrs_0.6.5            
 [7] tools_4.3.1             bitops_1.0-7            generics_0.1.3          parallel_4.3.1          stats4_4.3.1            curl_5.2.1             
[13] fansi_1.0.6             AnnotationDbi_1.64.1    RSQLite_2.3.7           blob_1.2.4              pkgconfig_2.0.3         dbplyr_2.5.0           
[19] S4Vectors_0.40.2        lifecycle_1.0.4         GenomeInfoDbData_1.2.11 compiler_4.3.1          Biostrings_2.70.3       progress_1.2.3         
[25] munsell_0.5.1           codetools_0.2-19        GenomeInfoDb_1.38.8     RCurl_1.98-1.14         pillar_1.9.0            crayon_1.5.2           
[31] cachem_1.1.0            parallelly_1.37.1       tidyselect_1.2.1        digest_0.6.35           stringi_1.8.4           listenv_0.9.1          
[37] rprojroot_2.0.4         fastmap_1.2.0           grid_4.3.1              colorspace_2.1-0        cli_3.6.2               magrittr_2.0.3         
[43] utf8_1.2.4              withr_3.0.0             prettyunits_1.2.0       filelock_1.0.3          scales_1.3.0            rappdirs_0.3.3         
[49] bit64_4.0.5             timechange_0.3.0        XVector_0.42.0          httr_1.4.7              globals_0.16.3          bit_4.0.5              
[55] gridExtra_2.3           png_0.1-8               hms_1.1.3               memoise_2.0.1           IRanges_2.36.0          BiocFileCache_2.10.2   
[61] rlang_1.1.4             Rcpp_1.0.12             glue_1.7.0              DBI_1.2.3               xml2_1.3.6              BiocGenerics_0.48.1    
[67] rstudioapi_0.16.0       R6_2.5.1                plyr_1.8.9              zlibbioc_1.48.2
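One detail in the mart information above: `mirror = "asia"` was requested, but the `host` slot points at https://www.ensembl.org. A quick way to see which server a Mart object will actually query is to inspect that slot. The sketch below assumes the mirror name appears as the subdomain of the host URL; `uses_mirror()` is a hypothetical helper, not a biomaRt function.

```r
# Hypothetical check: does a Mart host URL use the requested mirror as its
# subdomain (e.g. "https://asia.ensembl.org...")?
uses_mirror <- function(host, mirror) {
  grepl(paste0("://", mirror, "."), host, fixed = TRUE)
}

# Host string copied from the mart information above.
host <- "https://www.ensembl.org:443/biomart/martservice?redirect=no"
uses_mirror(host, "asia")  # FALSE
uses_mirror(host, "www")   # TRUE

# With a live Mart object (network required): uses_mirror(mart@host, "asia")
```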