thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

addExons throws error: in bfcrpath(bfc, txdbName) : not all 'rnames' found or unique. #55

Closed: matmu closed this issue 3 years ago

matmu commented 3 years ago

For the past few days, I have been getting an error when adding the exons. I haven't been able to find out where it comes from yet.

se = tximeta(df, type = "salmon")
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10
found matching transcriptome:
[ GENCODE - Homo sapiens - release 35 ]
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz'
Content type 'unknown' length 43803765 bytes (41.8 MB)
==================================================
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
loading existing transcript ranges created: 2021-02-04 12:28:21
fetching genome info for GENCODE
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
se.exons <- addExons(se)
Error in (function (x)  : attempt to apply non-function
Error in bfcrpath(bfc, txdbName) : not all 'rnames' found or unique.
Calls: addExons -> getTxDb -> bfcrpath -> bfcrpath
In addition: Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In FUN(X[[i]], ...) : 'rnames' exact pattern
    'gencode.v35.annotation.gtf.gz'
  is not unique; use 'bfcquery()' to see matches.
Execution halted
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /opt/conda/envs/rnaseq/lib/libopenblasp-r0.3.10.so

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GenomicFeatures_1.40.0      AnnotationDbi_1.50.0
 [3] SummarizedExperiment_1.18.1 DelayedArray_0.14.0
 [5] matrixStats_0.57.0          Biobase_2.48.0
 [7] GenomicRanges_1.40.0        GenomeInfoDb_1.24.0
 [9] IRanges_2.22.1              S4Vectors_0.26.0
[11] BiocGenerics_0.34.0         tximeta_1.6.3
[13] dplyr_1.0.2

loaded via a namespace (and not attached):
 [1] httr_1.4.2                    bit64_4.0.5
 [3] jsonlite_1.7.1                AnnotationHub_2.20.0
 [5] shiny_1.5.0                   assertthat_0.2.1
 [7] interactiveDisplayBase_1.26.0 askpass_1.1
 [9] BiocManager_1.30.10           BiocFileCache_1.12.0
[11] blob_1.2.1                    GenomeInfoDbData_1.2.4
[13] Rsamtools_2.4.0               yaml_2.2.1
[15] progress_1.2.2                BiocVersion_3.11.1
[17] pillar_1.4.7                  RSQLite_2.2.1
[19] lattice_0.20-41               glue_1.4.2
[21] digest_0.6.27                 promises_1.1.1
[23] XVector_0.28.0                htmltools_0.5.0
[25] httpuv_1.5.4                  Matrix_1.2-18
[27] XML_3.99-0.3                  pkgconfig_2.0.3
[29] biomaRt_2.44.0                zlibbioc_1.34.0
[31] purrr_0.3.4                   xtable_1.8-4
[33] later_1.1.0.1                 BiocParallel_1.22.0
[35] tibble_3.0.4                  openssl_1.4.3
[37] generics_0.1.0                AnnotationFilter_1.12.0
[39] ellipsis_0.3.1                withr_2.3.0
[41] lazyeval_0.2.2                magrittr_2.0.1
[43] crayon_1.3.4                  mime_0.9
[45] memoise_1.1.0                 tools_4.0.2
[47] prettyunits_1.1.1             hms_0.5.3
[49] lifecycle_0.2.0               stringr_1.4.0
[51] ensembldb_2.12.1              Biostrings_2.56.0
[53] compiler_4.0.2                rlang_0.4.9
[55] grid_4.0.2                    RCurl_1.98-1.2
[57] tximport_1.16.0               rappdirs_0.3.1
[59] bitops_1.0-6                  DBI_1.1.0
[61] curl_4.3                      R6_2.5.0
[63] GenomicAlignments_1.24.0      rtracklayer_1.48.0
[65] fastmap_1.0.1                 bit_4.0.4
[67] ProtGenerics_1.20.0           readr_1.4.0
[69] stringi_1.4.6                 Rcpp_1.0.5
[71] vctrs_0.3.5                   dbplyr_2.0.0
[73] tidyselect_1.1.0
mikelove commented 3 years ago

This isn't an issue with addExons specifically. Somehow two or more entries with the same identifier were written to the cache, so the lookup by name is no longer unique. The fix is easy: remove the duplicate entries with BiocFileCache.

You can do:

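library(BiocFileCache)
bfc <- BiocFileCache()
# list the cached resources and look at their names
x <- bfcinfo(bfc)
x$rname
# find the resource IDs that share the duplicated name
x$rid[x$rname == "gencode.v35.annotation.gtf.gz"]
# then remove the duplicates (or all of them, which will trigger a re-download)
bfcremove(bfc, rids=...)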
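
The lookup-and-remove steps above can be sketched as a small helper. This is only an illustration, not part of tximeta or BiocFileCache: `duplicate_rids` is a hypothetical name, the toy table and its rids are made up, and keeping the first entry for each name is an arbitrary choice (removing all of them and letting the resource be rebuilt is equally valid). It also assumes the cache in question is the one opened by `BiocFileCache()` with default arguments.

```r
# Hypothetical helper: given a table with 'rid' and 'rname' columns (the shape
# returned by BiocFileCache::bfcinfo()), return the rids of every entry after
# the first one for each repeated rname.
duplicate_rids <- function(info, rname = NULL) {
  info <- as.data.frame(info)
  if (!is.null(rname)) {
    # Restrict the search to a single resource name
    info <- info[info$rname == rname, , drop = FALSE]
  }
  info$rid[duplicated(info$rname)]
}

# Toy table standing in for bfcinfo(bfc); the rids here are invented.
toy <- data.frame(
  rid   = c("BFC1", "BFC2", "BFC3"),
  rname = c("gencode.v35.annotation.gtf.gz",
            "other_resource",
            "gencode.v35.annotation.gtf.gz"),
  stringsAsFactors = FALSE
)
extra <- duplicate_rids(toy, "gencode.v35.annotation.gtf.gz")
print(extra)

# Against a real cache (not run here):
# library(BiocFileCache)
# bfc <- BiocFileCache()
# bfcremove(bfc, rids = duplicate_rids(bfcinfo(bfc)))
```

Inspecting the returned rids before passing them to `bfcremove()` is a sensible precaution, since removal deletes the cached files.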
matmu commented 3 years ago

Thanks @mikelove

Removing the cache with rm -rf ~/.cache/tximeta/* also worked.
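
For anyone who prefers to stay inside R, a rough equivalent of that shell command could look like the sketch below. `clear_dir_contents` is a hypothetical helper name, and the demonstration runs on a throwaway temporary directory rather than a real cache; the `~/.cache/tximeta` path in the final comment is simply the one quoted above and may differ on other systems.

```r
# Hypothetical helper: delete everything inside 'dir' but keep the directory
# itself, mirroring `rm -rf dir/*` (hidden dotfiles are left alone, as with
# the shell glob).
clear_dir_contents <- function(dir) {
  entries <- list.files(path.expand(dir), full.names = TRUE)
  unlink(entries, recursive = TRUE)
  invisible(entries)
}

# Demonstration on a throwaway directory, not on a real cache.
demo_dir <- file.path(tempdir(), "cache-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "example.txt"))
dir.create(file.path(demo_dir, "subdir"), showWarnings = FALSE)

clear_dir_contents(demo_dir)
remaining <- list.files(demo_dir)

# On the real cache this would be (not run here):
# clear_dir_contents("~/.cache/tximeta")
```

Clearing everything is blunter than removing only the duplicate entries: anything stored there has to be recreated on the next run.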