grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

Error with getBM() - no such table: metadata #24

Open Manaswwm opened 3 years ago

Manaswwm commented 3 years ago

Hi, I have been using biomaRt for a while and it has been running fine. However, since yesterday I have been getting an error that I have not encountered before. First, here is the code I use to retrieve the data:

library(biomaRt)

thaliana_mart <- useMart(biomart = "plants_mart", host = "plants.ensembl.org",
                         dataset = "athaliana_eg_gene")
chr1_geneids <- getBM(attributes = "ensembl_gene_id", filters = "chromosome_name",
                      values = "1", mart = thaliana_mart)

When I run the code above, I get the following error:

Error in result_create(conn@ptr, statement) : no such table: metadata

This happens for every retrieval with getBM(). Can you please help me?

For reference, here is my sessionInfo():


R version 3.6.3 (2020-02-29)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /home/mjoshi/.conda/envs/myLib/lib/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocManager_1.30.10 biomaRt_2.42.0     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3           compiler_3.6.3       pillar_1.4.3         dbplyr_1.4.2         prettyunits_1.1.1   
 [6] tools_3.6.3          progress_1.2.2       zeallot_0.1.0        digest_0.6.23        bit_1.1-14          
[11] RSQLite_2.1.1        memoise_1.1.0        BiocFileCache_1.10.0 tibble_2.1.3         pkgconfig_2.0.3     
[16] rlang_0.4.2          DBI_1.1.0            rstudioapi_0.10      curl_4.3             parallel_3.6.3      
[21] stringr_1.4.0        httr_1.4.1           dplyr_0.8.3          rappdirs_0.3.1       S4Vectors_0.24.1    
[26] vctrs_0.2.1          askpass_1.1          IRanges_2.20.1       hms_0.5.3            tidyselect_0.2.5    
[31] stats4_3.6.3         bit64_0.9-7          glue_1.3.1           Biobase_2.46.0       R6_2.4.1            
[36] AnnotationDbi_1.48.0 XML_3.99-0.3         purrr_0.3.3          blob_1.1.1           magrittr_1.5        
[41] backports_1.1.5      BiocGenerics_0.32.0  assertthat_0.2.1     stringi_1.4.5        openssl_1.4.1       
[46] crayon_1.3.4

xinyixinyijiang commented 3 years ago

Did you figure it out? I have the same problem now.

grimbough commented 3 years ago

@VivianJiangxinyi can you provide the code you're running that produces the error, as well as the output of sessionInfo() ?

xinyixinyijiang commented 3 years ago

@grimbough Sure. Here is the sessionInfo():

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /gpfs/igmmfs01/eddie/QTLgroup/PERSONAL/Xinyi/PhDproject/anaconda/envs/phdproj/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.46.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           pillar_1.4.6         compiler_4.0.3
 [4] dbplyr_2.0.0         prettyunits_1.1.1    tools_4.0.3
 [7] progress_1.2.2       digest_0.6.27        bit_4.0.4
[10] tibble_3.0.4         RSQLite_2.2.1        memoise_1.1.0
[13] BiocFileCache_1.14.0 lifecycle_0.2.0      pkgconfig_2.0.3
[16] rlang_0.4.8          DBI_1.1.0            curl_4.3
[19] parallel_4.0.3       stringr_1.4.0        httr_1.4.2
[22] dplyr_1.0.2          xml2_1.3.2           rappdirs_0.3.1
[25] generics_0.1.0       S4Vectors_0.28.0     vctrs_0.3.4
[28] askpass_1.1          IRanges_2.24.0       hms_0.5.3
[31] tidyselect_1.1.0     stats4_4.0.3         bit64_4.0.5
[34] glue_1.4.2           Biobase_2.50.0       R6_2.5.0
[37] AnnotationDbi_1.52.0 XML_3.99-0.5         purrr_0.3.4
[40] blob_1.2.1           magrittr_1.5         ellipsis_0.3.1
[43] BiocGenerics_0.36.0  assertthat_0.2.1     stringi_1.5.3
[46] openssl_1.4.3        crayon_1.3.4

Here is my code:

library("biomaRt")
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band"),
      filters = "affy_hg_u95av2",
      values = c("1939_at", "1503_at", "1454_at"),
      mart = mart)
Error: no such table: metadata

grimbough commented 3 years ago

Thanks. Your code works for me, which suggests it's either an intermittent problem with the Ensembl server or something specific to your R setup. I don't recognise the error message from biomaRt, and it doesn't look like any server-side BioMart message I've seen, so I wonder if it comes from one of the packages biomaRt depends on.

Can you try running the getBM() command with the argument useCache = FALSE, e.g.:

getBM(attributes = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band"), 
      filters = "affy_hg_u95av2",
      values = c("1939_at","1503_at","1454_at"), 
      mart = mart, 
      useCache = FALSE)
Manaswwm commented 3 years ago

This was still a problem for me, so I worked around it by not using biomaRt and retrieving the information I needed through the REST APIs of Ensembl (and UniProt).

However, I just tried @grimbough's suggestion of adding useCache = FALSE to the getBM() call, and this seems to fix the problem. Thanks! So is the cache causing the error during retrieval? If so, clearing it with biomartCacheClear() does not seem to work: it throws the same error again.
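Since adding useCache = FALSE avoids the error, one option is a small wrapper that tries the cached query first and retries without the cache only when this particular error appears. This is a hedged sketch: query_with_cache_fallback is a hypothetical helper, not a biomaRt function, and it assumes the query function (for example getBM()) accepts a useCache argument, as getBM() does in the versions discussed in this thread.

```r
# Hypothetical helper (not part of biomaRt): run a query with the cache and,
# if the cache layer fails with "no such table: metadata", rerun it with
# useCache = FALSE. Any other error is re-raised unchanged.
# Assumption: `query_fun` accepts a `useCache` argument, like biomaRt::getBM().
query_with_cache_fallback <- function(query_fun, ...) {
  tryCatch(
    query_fun(..., useCache = TRUE),
    error = function(e) {
      if (grepl("no such table: metadata", conditionMessage(e), fixed = TRUE)) {
        message("Cache error detected; retrying with useCache = FALSE")
        query_fun(..., useCache = FALSE)
      } else {
        stop(e)
      }
    }
  )
}

# Usage sketch (needs network access to Ensembl, so left commented out):
# mart <- biomaRt::useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# res <- query_with_cache_fallback(biomaRt::getBM,
#                                  attributes = c("hgnc_symbol", "chromosome_name"),
#                                  filters = "affy_hg_u95av2",
#                                  values = c("1939_at", "1503_at", "1454_at"),
#                                  mart = mart)
```

Matching on the exact message keeps unrelated failures (network errors, invalid attributes) visible instead of silently retrying them.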

grimbough commented 3 years ago

Good to know this has some impact. biomaRt uses BiocFileCache to store the results of every query, and if it detects that you're running a query it already has results for, it loads them directly from the cache. Hopefully that saves time, bandwidth, and server load.

If you're getting the error the first time you run a query (or straight after running biomartCacheClear()), then the problem is actually with adding the result to the cache, rather than retrieving it.
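The difference between reading from and writing to the cache can be illustrated with a toy in-memory cache in base R. This is only a sketch of the general check-then-store pattern, not biomaRt's implementation (which uses BiocFileCache and an on-disk SQLite database); make_cached_query and the deparse()-based key are hypothetical stand-ins, and a locked environment plays the role of a cache whose database is broken.

```r
# Illustrative sketch only: none of these names come from biomaRt.
# On a cache hit the stored result is returned; on a miss the query runs
# and its result is then written to the store.
make_cached_query <- function(run_query, store = new.env()) {
  function(query) {
    key <- paste(deparse(query), collapse = "")  # stand-in for a real hash
    if (exists(key, envir = store, inherits = FALSE)) {
      return(get(key, envir = store, inherits = FALSE))  # cache hit
    }
    result <- run_query(query)          # cache miss: ask the "server"
    assign(key, result, envir = store)  # then write the result to the cache
    result
  }
}

# Demo: a locked environment stands in for a cache that cannot be written to.
server <- function(q) toupper(q)
broken <- new.env()
lockEnvironment(broken)
cached_broken <- make_cached_query(server, store = broken)
# The server call succeeds, but storing the result fails, so even a
# brand-new query raises an error.
try(cached_broken("abc"))
```

Because the store step runs after every cache miss, a broken store makes even a first-time query fail, which matches the symptom described above.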

I presume this error is coming from somewhere inside BiocFileCache, as biomaRt never explicitly references 'metadata' anywhere in its code. I've not experienced it on any of my test platforms, so I'm a bit lost as to what could be causing it. I notice both reports here are from systems where R is installed via conda, but I can't think of a reason why that would be a problem.

If you don't mind helping me try to understand the issue, the code below will create a new cache in a temporary location, and then try to add an entry to it. It'd be great to know if this also throws the error.

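# Create a fresh cache in a temporary directory
cache <- file.path(tempdir(), "biomart_cache_test")
bfc <- BiocFileCache::BiocFileCache(cache, ask = FALSE)
# Try to store a small dummy result under a random hash, as biomaRt does after a query
biomaRt:::.addToCache(bfc, result = 1:10, hash = sample(1:10000, 1))
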
Manaswwm commented 3 years ago

I tried the above; however, in biomaRt::: I only have .checkCache() and not .addToCache(). Do you think that could be because I have version 2.42.0 of biomaRt?
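On biomaRt 2.42.0, where .addToCache() is not available, a roughly equivalent check is to write an entry into a fresh cache through BiocFileCache's exported functions. This is a hedged sketch: check_bfc_write is a hypothetical helper, it assumes BiocFileCache is installed, and it may not follow exactly the same code path biomaRt uses internally, although it does exercise the cache's SQLite database.

```r
# Hypothetical diagnostic helper (not part of biomaRt or BiocFileCache):
# create a new cache in a fresh temporary directory and register one entry.
# Returns "ok", the error message, or a note that BiocFileCache is missing.
check_bfc_write <- function(cache_dir = tempfile("biomart_cache_test")) {
  if (!requireNamespace("BiocFileCache", quietly = TRUE)) {
    return("BiocFileCache not installed")
  }
  tryCatch({
    bfc <- BiocFileCache::BiocFileCache(cache_dir, ask = FALSE)
    # bfcnew() records a new resource in the cache database and returns
    # the file path where its contents should be written
    path <- BiocFileCache::bfcnew(bfc, rname = "biomart-write-test")
    saveRDS(1:10, file = path)
    "ok"
  }, error = function(e) conditionMessage(e))
}

check_bfc_write()
```

If this returns "no such table: metadata", the problem lies in the BiocFileCache/RSQLite layer rather than in biomaRt itself.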

aeaswar81 commented 3 years ago

I faced a similar issue. In my case none of the biomaRt functions worked, and all threw the same error, "Error: no such table: metadata". I solved it by deleting the hidden .cache folder in the working directory, which contained a biomaRt folder. So I believe deleting the biomaRt folder, or its contents, would do the job.
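The manual fix described above can be scripted in base R. This sketch assumes the cache directory is the one named by the BIOMART_CACHE environment variable if it is set, otherwise ~/.cache/biomaRt (the usual per-user cache location on Linux); both are assumptions, so print and check the path before deleting anything. remove_biomart_cache and default_biomart_cache are hypothetical helpers. Deleting the directory only discards cached query results, which biomaRt fetches from the server again on the next query.

```r
# Hypothetical helpers (not part of biomaRt).
# Assumption: the cache lives in $BIOMART_CACHE if set, otherwise in
# ~/.cache/biomaRt. Verify the path on your system before deleting it.
default_biomart_cache <- function() {
  Sys.getenv("BIOMART_CACHE",
             unset = file.path(path.expand("~"), ".cache", "biomaRt"))
}

# Delete the whole cache directory; returns TRUE if it is gone afterwards,
# FALSE if there was nothing to delete.
remove_biomart_cache <- function(cache_dir = default_biomart_cache()) {
  if (!dir.exists(cache_dir)) {
    message("No cache directory found at ", cache_dir)
    return(invisible(FALSE))
  }
  unlink(cache_dir, recursive = TRUE)
  invisible(!dir.exists(cache_dir))
}

# Usage: inspect the path first, then delete
# default_biomart_cache()
# remove_biomart_cache()
```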