Issue with cache - GitHub Issues

Hi, this is my first attempt to use tximeta, but I cannot get it to work because of an issue with the cache. I tried to change/set the location of the AnnotationHub cache folder, but that didn't do the trick. Since I don't fully understand the instructions given in the tximeta vignette or at the link given in the error, I would appreciate getting some hints.

TIA, Guido

> library(tximeta)
> library(AnnotationHub)
>
> ah <- AnnotationHub(cache = "/home/guidoh/AHcache/")
snapshotDate(): 2023-04-24

> se <- tximeta(
  coldata = coldata,
  type = "salmon",
  txOut = TRUE,
  skipMeta = FALSE,
  skipSeqinfo = FALSE,
  useHub = TRUE,
  markDuplicateTxps = FALSE,
  cleanDuplicateTxps = FALSE,
  customMetaInfo = NULL)

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
found matching transcriptome:
[ GENCODE - Mus musculus - release M32 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
Error in AnnotationHub() : 
  DEFUNCT: As of AnnotationHub (>2.23.2), default caching location has changed.
  Problematic cache: /home/guidoh/.cache/AnnotationHub
  See https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/TroubleshootingTheCache.html#default-caching-location-update
> 
> 
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 38 (Thirty Eight)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.5.0       AnnotationHub_3.8.0 BiocFileCache_2.8.0
[4] dbplyr_2.3.3        BiocGenerics_0.46.0 tximeta_1.18.1     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0              dplyr_1.1.2                  
 [3] blob_1.2.4                    filelock_1.0.2               
 [5] Biostrings_2.68.1             bitops_1.0-7                 
 [7] lazyeval_0.2.2                fastmap_1.1.1                
 [9] RCurl_1.98-1.12               GenomicAlignments_1.36.0     
[11] promises_1.2.1                XML_3.99-0.14                
[13] digest_0.6.33                 mime_0.12                    
[15] lifecycle_1.0.3               ProtGenerics_1.32.0          
[17] ellipsis_0.3.2                KEGGREST_1.40.0              
[19] interactiveDisplayBase_1.38.0 RSQLite_2.3.1                
[21] magrittr_2.0.3                compiler_4.3.1               
[23] rlang_1.1.1                   progress_1.2.2               
[25] tools_4.3.1                   utf8_1.2.3                   
[27] yaml_2.3.7                    rtracklayer_1.60.0           
[29] prettyunits_1.1.1             S4Arrays_1.0.5               
[31] bit_4.0.5                     curl_5.0.1                   
[33] DelayedArray_0.26.7           xml2_1.3.5                   
[35] abind_1.4-5                   BiocParallel_1.34.2          
[37] withr_2.5.0                   purrr_1.0.2                  
[39] grid_4.3.1                    stats4_4.3.1                 
[41] fansi_1.0.4                   xtable_1.8-4                 
[43] biomaRt_2.56.1                SummarizedExperiment_1.30.2  
[45] cli_3.6.1                     crayon_1.5.2                 
[47] generics_0.1.3                tzdb_0.4.0                   
[49] httr_1.4.6                    rjson_0.2.21                 
[51] DBI_1.1.3                     cachem_1.0.8                 
[53] zlibbioc_1.46.0               parallel_4.3.1               
[55] AnnotationDbi_1.62.2          AnnotationFilter_1.24.0      
[57] BiocManager_1.30.22           XVector_0.40.0               
[59] restfulr_0.0.15               matrixStats_1.0.0            
[61] vctrs_0.6.3                   Matrix_1.6-0                 
[63] jsonlite_1.8.7                IRanges_2.34.1               
[65] hms_1.1.3                     S4Vectors_0.38.1             
[67] bit64_4.0.5                   ensembldb_2.24.0             
[69] GenomicFeatures_1.52.1        glue_1.6.2                   
[71] codetools_0.2-19              stringi_1.7.12               
[73] BiocVersion_3.17.1            later_1.3.1                  
[75] GenomeInfoDb_1.36.1           GenomicRanges_1.52.0         
[77] BiocIO_1.10.0                 tibble_3.2.1                 
[79] pillar_1.9.0                  rappdirs_0.3.3               
[81] htmltools_0.5.6               GenomeInfoDbData_1.2.10      
[83] R6_2.5.1                      tximport_1.28.0              
[85] vroom_1.6.3                   lattice_0.21-8               
[87] shiny_1.7.4.1                 Biobase_2.60.0               
[89] readr_2.1.4                   png_0.1-8                    
[91] Rsamtools_2.16.0              memoise_2.0.1                
[93] httpuv_1.6.11                 Rcpp_1.0.11                  
[95] MatrixGenerics_1.12.3         pkgconfig_2.0.3              
> 
>

Thank you for your prompt reply!

Not knowing the inner workings of AnnotationHub, I didn't realize a persistent cache remained after closing the R session. Indeed, copy/pasting the code under 5.1 - point 3 moved files I didn't know were there, and also correctly changed the location of the cache.
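For anyone else hitting this: a minimal sketch for checking whether a leftover cache exists before moving anything. It only reads the two locations that the move code uses. The fallback to `~/.cache/AnnotationHub` when rappdirs is not installed is my assumption, based on the path shown in the error on my Linux machine, and `tools::R_user_dir()` needs R >= 4.0.

```r
pkg <- "AnnotationHub"

# Old default location (what rappdirs reports); fall back to the Linux path
# seen in the error message if rappdirs is unavailable (assumption).
old_dir <- if (requireNamespace("rappdirs", quietly = TRUE)) {
  path.expand(rappdirs::user_cache_dir(appname = pkg))
} else {
  path.expand(file.path("~", ".cache", pkg))
}

# New default location used by recent AnnotationHub versions.
new_dir <- tools::R_user_dir(pkg, which = "cache")

cat("old cache:", old_dir, "| exists:", dir.exists(old_dir), "\n")
cat("new cache:", new_dir, "| exists:", dir.exists(new_dir), "\n")

# List whatever is still sitting in the old location, if anything.
if (dir.exists(old_dir)) print(list.files(old_dir))
```

If the old directory exists and is non-empty, that is presumably the cache the DEFUNCT error is complaining about.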

Importing data now worked without any issue!

Thanks, G

For completeness:

First:

> moveFiles <- function(package) {
    # Old default cache location (rappdirs) and new one (tools::R_user_dir)
    olddir <- path.expand(rappdirs::user_cache_dir(appname = package))
    newdir <- tools::R_user_dir(package, which = "cache")
    dir.create(path = newdir, recursive = TRUE)
    files <- list.files(olddir, full.names = TRUE)
    # Move each file; TRUE per file on success
    moveres <- vapply(files,
        FUN = function(fl) {
            filename <- basename(fl)
            newname <- file.path(newdir, filename)
            file.rename(fl, newname)
        },
        FUN.VALUE = logical(1))
    # Remove the old cache directory only if every file was moved
    if (all(moveres)) unlink(olddir, recursive = TRUE)
}

> package="AnnotationHub"
> moveFiles(package)
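
In case it helps to see what `moveFiles()` does before running it on a real cache, here is a sketch of the same move pattern applied to throwaway directories under `tempdir()`. The helper name `move_dir_contents()`, the directory names and the dummy file names are all made up for illustration; nothing here touches the AnnotationHub cache.

```r
# Hypothetical helper: move every entry of `olddir` into `newdir`,
# mirroring the logic of moveFiles() but with explicit paths.
move_dir_contents <- function(olddir, newdir) {
  dir.create(newdir, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(olddir, full.names = TRUE)
  moved <- vapply(
    files,
    function(fl) file.rename(fl, file.path(newdir, basename(fl))),
    FUN.VALUE = logical(1)
  )
  # Delete the old directory only if every move succeeded.
  if (all(moved)) unlink(olddir, recursive = TRUE)
  invisible(moved)
}

# Dry run on temporary directories with two dummy files.
olddir <- file.path(tempdir(), "old_cache_demo")
newdir <- file.path(tempdir(), "new_cache_demo")
dir.create(olddir)
file.create(file.path(olddir, c("a.sqlite", "b.rds")))

moved <- move_dir_contents(olddir, newdir)
print(list.files(newdir))   # files now live in the new directory
print(dir.exists(olddir))   # old directory is gone if all moves succeeded
```

Note that `file.rename()` can fail when the old and new locations are on different filesystems; in that case the old directory is left in place because `all(moved)` is FALSE.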

Then:

> se <- tximeta(
  coldata = coldata,
  type = "salmon",
  txOut = TRUE,
  skipMeta = FALSE,
  skipSeqinfo = FALSE,
  useHub = TRUE,
  markDuplicateTxps = FALSE,
  cleanDuplicateTxps = FALSE,
  customMetaInfo = NULL)

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
found matching transcriptome:
[ GENCODE - Mus musculus - release M32 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
  |======================================================================| 100%

snapshotDate(): 2023-04-24
did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.annotation.gtf.gz'
Content type 'unknown' length 29299972 bytes (27.9 MB)
==================================================
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
generating transcript ranges
fetching genome info for GENCODE
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:utils':

    findMatches

The following objects are masked from 'package:base':

    expand.grid, I, unname

Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 132 out-of-bound ranges located on sequences
  chr4, chr8, chr13, chr14, and chr17. Note that ranges located on a
  sequence whose length is unknown (NA) or on a circular sequence are not
  considered out-of-bound (use seqlengths() and isCircular() to get the
  lengths and circularity flags of the underlying sequences). You can use
  trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.
>

thelovelab / tximeta

Issue with cache #76