thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

Issue with cache #76

Closed: guidohooiveld closed this issue 11 months ago

guidohooiveld commented 11 months ago

Hi, this is my first attempt at using tximeta, but I cannot get it to work because of an issue with the cache. I tried to change the location of the AnnotationHub cache folder, but that didn't do the trick. Since I don't fully understand the instructions given in the tximeta vignette, nor those at the link given in the error, I would appreciate some hints.

TIA, Guido

> library(tximeta)
> library(AnnotationHub)
>
> ah <- AnnotationHub(cache = "/home/guidoh/AHcache/")
snapshotDate(): 2023-04-24

> se <- tximeta(
  coldata = coldata,
  type = "salmon",
  txOut = TRUE,
  skipMeta = FALSE,
  skipSeqinfo = FALSE,
  useHub = TRUE,
  markDuplicateTxps = FALSE,
  cleanDuplicateTxps = FALSE,
  customMetaInfo = NULL)

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
found matching transcriptome:
[ GENCODE - Mus musculus - release M32 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
Error in AnnotationHub() : 
  DEFUNCT: As of AnnotationHub (>2.23.2), default caching location has changed.
  Problematic cache: /home/guidoh/.cache/AnnotationHub
  See https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/TroubleshootingTheCache.html#default-caching-location-update
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 38 (Thirty Eight)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.5.0       AnnotationHub_3.8.0 BiocFileCache_2.8.0
[4] dbplyr_2.3.3        BiocGenerics_0.46.0 tximeta_1.18.1     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0              dplyr_1.1.2                  
 [3] blob_1.2.4                    filelock_1.0.2               
 [5] Biostrings_2.68.1             bitops_1.0-7                 
 [7] lazyeval_0.2.2                fastmap_1.1.1                
 [9] RCurl_1.98-1.12               GenomicAlignments_1.36.0     
[11] promises_1.2.1                XML_3.99-0.14                
[13] digest_0.6.33                 mime_0.12                    
[15] lifecycle_1.0.3               ProtGenerics_1.32.0          
[17] ellipsis_0.3.2                KEGGREST_1.40.0              
[19] interactiveDisplayBase_1.38.0 RSQLite_2.3.1                
[21] magrittr_2.0.3                compiler_4.3.1               
[23] rlang_1.1.1                   progress_1.2.2               
[25] tools_4.3.1                   utf8_1.2.3                   
[27] yaml_2.3.7                    rtracklayer_1.60.0           
[29] prettyunits_1.1.1             S4Arrays_1.0.5               
[31] bit_4.0.5                     curl_5.0.1                   
[33] DelayedArray_0.26.7           xml2_1.3.5                   
[35] abind_1.4-5                   BiocParallel_1.34.2          
[37] withr_2.5.0                   purrr_1.0.2                  
[39] grid_4.3.1                    stats4_4.3.1                 
[41] fansi_1.0.4                   xtable_1.8-4                 
[43] biomaRt_2.56.1                SummarizedExperiment_1.30.2  
[45] cli_3.6.1                     crayon_1.5.2                 
[47] generics_0.1.3                tzdb_0.4.0                   
[49] httr_1.4.6                    rjson_0.2.21                 
[51] DBI_1.1.3                     cachem_1.0.8                 
[53] zlibbioc_1.46.0               parallel_4.3.1               
[55] AnnotationDbi_1.62.2          AnnotationFilter_1.24.0      
[57] BiocManager_1.30.22           XVector_0.40.0               
[59] restfulr_0.0.15               matrixStats_1.0.0            
[61] vctrs_0.6.3                   Matrix_1.6-0                 
[63] jsonlite_1.8.7                IRanges_2.34.1               
[65] hms_1.1.3                     S4Vectors_0.38.1             
[67] bit64_4.0.5                   ensembldb_2.24.0             
[69] GenomicFeatures_1.52.1        glue_1.6.2                   
[71] codetools_0.2-19              stringi_1.7.12               
[73] BiocVersion_3.17.1            later_1.3.1                  
[75] GenomeInfoDb_1.36.1           GenomicRanges_1.52.0         
[77] BiocIO_1.10.0                 tibble_3.2.1                 
[79] pillar_1.9.0                  rappdirs_0.3.3               
[81] htmltools_0.5.6               GenomeInfoDbData_1.2.10      
[83] R6_2.5.1                      tximport_1.28.0              
[85] vroom_1.6.3                   lattice_0.21-8               
[87] shiny_1.7.4.1                 Biobase_2.60.0               
[89] readr_2.1.4                   png_0.1-8                    
[91] Rsamtools_2.16.0              memoise_2.0.1                
[93] httpuv_1.6.11                 Rcpp_1.0.11                  
[95] MatrixGenerics_1.12.3         pkgconfig_2.0.3              
mikelove commented 11 months ago

Because tximeta uses AnnotationHub when possible, you need to resolve this AHub issue first.

If you go to that URL, you can copy and paste the code under section 5.1, point 3, and it will fix the issue.
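To see which directory the error refers to before moving anything, the two default cache locations can be compared. This is a minimal sketch, assuming the rappdirs package is installed (it is listed in the session info above); `cacheLocations()` is a hypothetical helper, not part of AnnotationHub or tximeta. The legacy path should match the "Problematic cache" in the error message (on Linux, `~/.cache/AnnotationHub`), while the current one is the new default returned by `tools::R_user_dir()`.

```r
# Hypothetical helper (not part of AnnotationHub or tximeta): list the
# legacy and current default cache directories for a package.
cacheLocations <- function(package = "AnnotationHub") {
    # Legacy default (pre-2.23.2), e.g. ~/.cache/AnnotationHub on Linux
    legacy <- path.expand(rappdirs::user_cache_dir(appname = package))
    # Current default, e.g. ~/.cache/R/AnnotationHub on Linux
    current <- tools::R_user_dir(package, which = "cache")
    data.frame(
        location = c("legacy", "current"),
        path     = c(legacy, current),
        exists   = dir.exists(c(legacy, current)),
        # list.files() returns character(0) for a missing directory
        n_files  = c(length(list.files(legacy)), length(list.files(current))),
        stringsAsFactors = FALSE
    )
}

cacheLocations("AnnotationHub")
```

Presumably the error keeps appearing as long as the legacy directory still holds files, which is what the vignette code addresses by moving them.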

guidohooiveld commented 11 months ago

Thank you for your prompt reply!

Not knowing the inner workings of AnnotationHub, I didn't realize that a persistent cache remained after closing the R session. Indeed, copying and pasting the code under 5.1, point 3 moved files I didn't know were there, and also correctly changed the location of the cache.

Importing the data now works without any issue!

Thanks, G

For completeness:

First:

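# Move the files from the legacy cache (rappdirs location) into the new
# default location (tools::R_user_dir), then delete the legacy directory
# if every file was moved successfully.
moveFiles <- function(package) {
    olddir <- path.expand(rappdirs::user_cache_dir(appname = package))
    newdir <- tools::R_user_dir(package, which = "cache")
    dir.create(path = newdir, recursive = TRUE)
    files <- list.files(olddir, full.names = TRUE)
    # file.rename() returns TRUE for each file that was moved
    moveres <- vapply(files,
        FUN = function(fl) {
            filename <- basename(fl)
            newname <- file.path(newdir, filename)
            file.rename(fl, newname)
        },
        FUN.VALUE = logical(1))
    # Remove the legacy directory only if all files were moved
    if (all(moveres)) unlink(olddir, recursive = TRUE)
}

package <- "AnnotationHub"
moveFiles(package)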
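The vignette code only deletes the legacy directory when every `file.rename()` succeeds, so a quick check afterwards can confirm that the move went through. This is a minimal sketch; `cacheMoved()` is a hypothetical helper (not from the vignette or either package) and again assumes rappdirs is installed.

```r
# Hypothetical post-move check: TRUE when the legacy cache directory is
# gone (or empty) and the new default cache directory exists.
cacheMoved <- function(package = "AnnotationHub") {
    legacy  <- path.expand(rappdirs::user_cache_dir(appname = package))
    current <- tools::R_user_dir(package, which = "cache")
    legacyClear <- !dir.exists(legacy) || length(list.files(legacy)) == 0
    legacyClear && dir.exists(current)
}

cacheMoved("AnnotationHub")
```

A `FALSE` result would suggest that some files were left behind in the legacy directory, for example because a rename failed.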

Then:

> se <- tximeta(
  coldata = coldata,
  type = "salmon",
  txOut = TRUE,
  skipMeta = FALSE,
  skipSeqinfo = FALSE,
  useHub = TRUE,
  markDuplicateTxps = FALSE,
  cleanDuplicateTxps = FALSE,
  customMetaInfo = NULL)

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
found matching transcriptome:
[ GENCODE - Mus musculus - release M32 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
  |======================================================================| 100%

snapshotDate(): 2023-04-24
did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.annotation.gtf.gz'
Content type 'unknown' length 29299972 bytes (27.9 MB)
==================================================
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
generating transcript ranges
fetching genome info for GENCODE
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:utils':

    findMatches

The following objects are masked from 'package:base':

    expand.grid, I, unname

Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 132 out-of-bound ranges located on sequences
  chr4, chr8, chr13, chr14, and chr17. Note that ranges located on a
  sequence whose length is unknown (NA) or on a circular sequence are not
  considered out-of-bound (use seqlengths() and isCircular() to get the
  lengths and circularity flags of the underlying sequences). You can use
  trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.
>
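The second warning concerns ranges that extend past the recorded length of their chromosome; as the message suggests, `trim()` from GenomicRanges clips such ranges to the sequence boundaries. This is a minimal sketch on a made-up toy range (not the GENCODE data above), assuming GenomicRanges is installed.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Toy example with made-up coordinates: the second range (95-104) runs
# past the end of a 100-bp sequence. Constructing it emits the same kind
# of "out-of-bound ranges" warning, which is suppressed here.
gr <- suppressWarnings(GRanges(
    seqnames   = "chrA",
    ranges     = IRanges(start = c(10, 95), width = 10),
    seqlengths = c(chrA = 100)
))

# Flag ranges whose end exceeds the length of their sequence
outOfBound <- end(gr) > seqlengths(gr)[as.character(seqnames(gr))]
print(outOfBound)

# trim() clips out-of-bound ranges to the sequence boundaries
trimmed <- trim(gr)
print(end(trimmed))
```

Whether trimming is appropriate depends on the downstream analysis; the warning itself did not stop the import here.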