thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

GENCODE - Mus musculus - release M28 not loading ok #65

Closed AMChalkie closed 1 year ago

AMChalkie commented 2 years ago

Hi,

Thanks for the very useful tool. I'm failing to get M28 to behave as expected.

Best wishes Alistair

se <- tximeta::tximeta(sample_information.df)

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
found matching transcriptome:
[ GENCODE - Mus musculus - release M28 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
snapshotDate(): 2022-04-21
did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'download.file' for signature '"character"'

mikelove commented 2 years ago

hi, I can't reproduce -- I just quantified against M28 and imported:

> devtools::load_all("tximeta")
ℹ Loading tximeta
> coldata <- data.frame(files="sample/quant.sf",names="sample")
> se <- tximeta(coldata)
importing quantifications
reading in files with read_tsv
1 
found matching transcriptome:
[ GENCODE - Mus musculus - release M28 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
  |======================================================================| 100%

snapshotDate(): 2021-10-20
did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz'
Content type 'unknown' length 28349951 bytes (27.0 MB)
==================================================
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
generating transcript ranges
fetching genome info for GENCODE

Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
Calls: tximeta ... makeTxDbFromGFF -> makeTxDbFromGRanges -> .get_cds_IDX
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 87 out-of-bound ranges located on sequences
  chr4, chr8, chr13, chr14, and chr17. Note that ranges located on a
  sequence whose length is unknown (NA) or on a circular sequence are not
  considered out-of-bound (use seqlengths() and isCircular() to get the
  lengths and circularity flags of the underlying sequences). You can use
  trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.
Calls: tximeta ... seqinfo<- -> seqinfo<- -> valid.GenomicRanges.seqinfo
> rowRanges(se)
GRanges object with 140790 ranges and 3 metadata columns:
                       seqnames          ranges strand |     tx_id
                          <Rle>       <IRanges>  <Rle> | <integer>
  ENSMUST00000193812.2     chr1 3143476-3144545      + |         1
  ENSMUST00000082908.3     chr1 3172239-3172348      + |         2
  ENSMUST00000162897.2     chr1 3276124-3286567      - |      4218
  ENSMUST00000159265.2     chr1 3276746-3285855      - |      4219
  ENSMUST00000070533.5     chr1 3284705-3741721      - |      4220
                   ...      ...             ...    ... .       ...
  ENSMUST00000082419.1     chrM     13552-14070      - |    142374
  ENSMUST00000082420.1     chrM     14071-14139      - |    142375
  ENSMUST00000082421.1     chrM     14145-15288      + |    142366
  ENSMUST00000082422.1     chrM     15289-15355      + |    142367
  ENSMUST00000082423.1     chrM     15356-15422      - |    142376
                                    gene_id              tx_name
                            <CharacterList>          <character>
  ENSMUST00000193812.2 ENSMUSG00000102693.2 ENSMUST00000193812.2
  ENSMUST00000082908.3 ENSMUSG00000064842.3 ENSMUST00000082908.3
  ENSMUST00000162897.2 ENSMUSG00000051951.6 ENSMUST00000162897.2
  ENSMUST00000159265.2 ENSMUSG00000051951.6 ENSMUST00000159265.2
  ENSMUST00000070533.5 ENSMUSG00000051951.6 ENSMUST00000070533.5
                   ...                  ...                  ...
  ENSMUST00000082419.1 ENSMUSG00000064368.1 ENSMUST00000082419.1
  ENSMUST00000082420.1 ENSMUSG00000064369.1 ENSMUST00000082420.1
  ENSMUST00000082421.1 ENSMUSG00000064370.1 ENSMUST00000082421.1
  ENSMUST00000082422.1 ENSMUSG00000064371.1 ENSMUST00000082422.1
  ENSMUST00000082423.1 ENSMUSG00000064372.1 ENSMUST00000082423.1
  -------
  seqinfo: 22 sequences (1 circular) from mm10 genome

Can you try on a different machine, or maybe try with latest version of R/Bioconductor?

AMChalkie commented 2 years ago

I will check another machine. In the meantime, I get this warning when loading tximeta, which looks related to the download error. I have also included more debug info.

library(tximeta)
Warning message:
replacing previous import ‘utils::download.file’ by ‘restfulr::download.file’ when loading ‘rtracklayer’

coldata <- data.frame(files="JY1a/quant.sf",names="JY1a")

se <- tximeta(coldata)
importing quantifications
reading in files with read_tsv
1
found matching transcriptome:
[ GENCODE - Mus musculus - release M28 ]
useHub=TRUE: checking for TxDb via 'AnnotationHub'
snapshotDate(): 2022-04-21
did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'download.file' for signature '"character"'

Traceback gives this:

11: stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA)
10: (function (classes, fdef, mtable) { methods <- .findInheritedMethods(classes, fdef, mtable) if (length(methods) == 1L) return(methods[[1L]]) else if (length(methods) == 0L) { cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", collapse = ", ") stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA) } else stop("Internal error in finding inherited methods; didn't return a unique method", domain = NA) })(list("character"), new("standardGeneric", .Data = function (url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE, extra = getOption("download.file.extra")) standardGeneric("download.file"), generic = structure("download.file", package = "restfulr"), package = "restfulr", group = list(), valueClass = character(0), signature = "url", default = NULL, skeleton = (function (url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE, extra = getOption("download.file.extra")) stop(gettextf("invalid call in method dispatch to '%s' (no default method)", "download.file"), domain = NA))(url, destfile, method, quiet, mode, cacheOK, extra)), )
9: download.file(resource(con), destfile)
8: .local(con, format, text, ...)
7: import(FileForFormat(con, format), ...)
6: import(FileForFormat(con, format), ...)
5: import(file, format = format, colnames = colnames, feature.type = GFF_FEATURE_TYPES)
4: import(file, format = format, colnames = colnames, feature.type = GFF_FEATURE_TYPES)
3: makeTxDbFromGFF(txomeInfo$gtf)
2: getTxDb(txomeInfo, useHub = useHub)
1: tximeta(coldata)

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] GGally_2.1.2 ggrepel_0.9.1 plotly_4.10.0 tidyHeatmap_1.8.1 forcats_0.5.1
[6] stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[11] tibble_3.1.7 tidyverse_1.3.1 ggplot2_3.3.6 tidySummarizedExperiment_1.6.1 tidybulk_1.8.0
[16] tximport_1.23.4 data.table_1.14.2 SummarizedExperiment_1.26.1 Biobase_2.56.0 GenomicRanges_1.48.0
[21] GenomeInfoDb_1.32.2 IRanges_2.30.0 S4Vectors_0.34.0 BiocGenerics_0.42.0 MatrixGenerics_1.8.0
[26] matrixStats_0.62.0 tximeta_1.14.0

loaded via a namespace (and not attached):
[1] readxl_1.4.0 backports_1.4.1 circlize_0.4.15 AnnotationHub_3.4.0 BiocFileCache_2.4.0
[6] plyr_1.8.7 lazyeval_0.2.2 BiocParallel_1.30.3 usethis_2.1.6 digest_0.6.29
[11] foreach_1.5.2 ensembldb_2.20.1 htmltools_0.5.2 viridis_0.6.2 fansi_1.0.3
[16] magrittr_2.0.3 memoise_2.0.1 cluster_2.1.3 doParallel_1.0.17 remotes_2.4.2
[21] tzdb_0.3.0 ComplexHeatmap_2.12.0 Biostrings_2.64.0 modelr_0.1.8 vroom_1.5.7
[26] prettyunits_1.1.1 colorspace_2.0-3 rvest_1.0.2 blob_1.2.3 rappdirs_0.3.3
[31] xfun_0.31 haven_2.5.0 callr_3.7.0 crayon_1.5.1 RCurl_1.98-1.7
[36] jsonlite_1.8.0 iterators_1.0.14 glue_1.6.2 gtable_0.3.0 zlibbioc_1.42.0
[41] XVector_0.36.0 GetoptLong_1.0.5 DelayedArray_0.22.0 pkgbuild_1.3.1 shape_1.4.6
[46] scales_1.2.0 DBI_1.1.2 Rcpp_1.0.8.3 viridisLite_0.4.0 xtable_1.8-4
[51] progress_1.2.2 clue_0.3-61 bit_4.0.4 preprocessCore_1.58.0 htmlwidgets_1.5.4
[56] httr_1.4.3 RColorBrewer_1.1-3 ellipsis_0.3.2 reshape_0.8.9 pkgconfig_2.0.3
[61] XML_3.99-0.10 dbplyr_2.2.0 utf8_1.2.2 tidyselect_1.1.2 rlang_1.0.2
[66] later_1.3.0 AnnotationDbi_1.58.0 cellranger_1.1.0 munsell_0.5.0 BiocVersion_3.15.2
[71] tools_4.2.0 cachem_1.0.6 cli_3.3.0 generics_0.1.2 RSQLite_2.2.14
[76] devtools_2.4.3 broom_0.8.0 evaluate_0.15 fastmap_1.1.0 yaml_2.3.5
[81] processx_3.6.0 knitr_1.39 fs_1.5.2 bit64_4.0.5 KEGGREST_1.36.2
[86] AnnotationFilter_1.20.0 dendextend_1.15.2 mime_0.12 xml2_1.3.3 biomaRt_2.52.0
[91] brio_1.1.3 compiler_4.2.0 rstudioapi_0.13 filelock_1.0.2 curl_4.3.2
[96] png_0.1-7 interactiveDisplayBase_1.34.0 testthat_3.1.4 reprex_2.0.1 stringi_1.7.6
[101] ps_1.7.0 desc_1.4.1 GenomicFeatures_1.48.3 lattice_0.20-45 ProtGenerics_1.28.0
[106] Matrix_1.4-1 vctrs_0.4.1 pillar_1.7.0 lifecycle_1.0.1 BiocManager_1.30.18
[111] GlobalOptions_0.1.2 bitops_1.0-7 httpuv_1.6.5 patchwork_1.1.1 rtracklayer_1.56.0
[116] R6_2.5.1 BiocIO_1.6.0 promises_1.2.0.1 gridExtra_2.3 sessioninfo_1.2.2
[121] codetools_0.2-18 pkgload_1.2.4 assertthat_0.2.1 rprojroot_2.0.3 rjson_0.2.21
[126] withr_2.5.0 GenomicAlignments_1.32.0 Rsamtools_2.12.0 GenomeInfoDbData_1.2.8 parallel_4.2.0
[131] hms_1.1.1 grid_4.2.0 rmarkdown_2.14 shiny_1.7.1 lubridate_1.8.0
[136] restfulr_0.0.14

mikelove commented 2 years ago

I think we can debug just within GenomicFeatures. This is the line causing trouble:

txdb <- makeTxDbFromGFF(txomeInfo$gtf)

where that first argument is equal to:

ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz

See if that also gives an error, maybe in a clean R session.

AMChalkie commented 2 years ago

Now we're getting somewhere

GenomicFeatures gives the warning

library(GenomicFeatures)
Warning message:
replacing previous import ‘utils::download.file’ by ‘restfulr::download.file’ when loading ‘rtracklayer’

txdb <- makeTxDbFromGFF("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz")
Import genomic features from the file as a GRanges object ... Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘download.file’ for signature ‘"character"’

AMChalkie commented 2 years ago

The same holds for the devel version of Bioconductor and the latest GenomicFeatures.

library(GenomicFeatures)

txdb <- makeTxDbFromGFF("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz")
Import genomic features from the file as a GRanges object ... Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘download.file’ for signature ‘"character"’

restfulr looks like the problem.

restfulr::download.file("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz")
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘download.file’ for signature ‘"character"’
download.file("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz")
Error in download.file("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz") :
  argument "destfile" is missing, with no default
download.file("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz",destfile="tmp.gtf")
trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.annotation.gtf.gz'
Content type 'unknown' length 28349951 bytes (27.0 MB)

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] GenomicFeatures_1.49.5 AnnotationDbi_1.59.1 Biobase_2.57.1
[4] GenomicRanges_1.49.0 GenomeInfoDb_1.33.3 IRanges_2.31.0
[7] S4Vectors_0.35.1 BiocGenerics_0.43.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 lattice_0.20-45
[3] prettyunits_1.1.1 png_0.1-7
[5] Rsamtools_2.13.3 Biostrings_2.65.1
[7] assertthat_0.2.1 digest_0.6.29
[9] utf8_1.2.2 BiocFileCache_2.5.0
[11] R6_2.5.1 RSQLite_2.2.14
[13] httr_1.4.3 pillar_1.7.0
[15] zlibbioc_1.43.0 rlang_1.0.2
[17] progress_1.2.2 curl_4.3.2
[19] blob_1.2.3 Matrix_1.4-1
[21] BiocParallel_1.31.8 stringr_1.4.0
[23] RCurl_1.98-1.7 bit_4.0.4
[25] biomaRt_2.53.2 DelayedArray_0.23.0
[27] compiler_4.2.0 rtracklayer_1.57.0
[29] pkgconfig_2.0.3 SummarizedExperiment_1.27.1
[31] tidyselect_1.1.2 KEGGREST_1.37.2
[33] tibble_3.1.7 GenomeInfoDbData_1.2.8
[35] matrixStats_0.62.0 codetools_0.2-18
[37] XML_3.99-0.10 fansi_1.0.3
[39] crayon_1.5.1 dplyr_1.0.9
[41] dbplyr_2.2.0 GenomicAlignments_1.33.0
[43] bitops_1.0-7 rappdirs_0.3.3
[45] grid_4.2.0 lifecycle_1.0.1
[47] DBI_1.1.2 magrittr_2.0.3
[49] cli_3.3.0 stringi_1.7.6
[51] cachem_1.0.6 XVector_0.37.0
[53] xml2_1.3.3 ellipsis_0.3.2
[55] filelock_1.0.2 generics_0.1.2
[57] vctrs_0.4.1 rjson_0.2.21
[59] restfulr_0.0.14 tools_4.2.0
[61] bit64_4.0.5 glue_1.6.2
[63] purrr_0.3.4 MatrixGenerics_1.9.0
[65] hms_1.1.1 parallel_4.2.0
[67] fastmap_1.1.0 yaml_2.3.5
[69] BiocManager_1.30.18 memoise_2.0.1
[71] BiocIO_1.7.1
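
The traceback earlier in the thread shows that the download.file being called is an S4 generic owned by restfulr, with signature "url" and default = NULL, so a plain character URL has no method to dispatch to. The sketch below reproduces that kind of failure in isolation using only the methods package. The names fetch_resource and RemoteResource are hypothetical stand-ins, not part of restfulr or rtracklayer, and the example only illustrates the dispatch behaviour, not the actual package code.

```r
library(methods)

# Hypothetical class standing in for a package-specific resource type.
setClass("RemoteResource", representation(uri = "character"))

# An S4 generic that dispatches on `url` only. No ordinary function named
# fetch_resource exists beforehand, so the generic has no default method.
setGeneric("fetch_resource",
           function(url, destfile, ...) standardGeneric("fetch_resource"),
           signature = "url")

# The only method is defined for the hypothetical class, not for "character".
setMethod("fetch_resource", "RemoteResource",
          function(url, destfile, ...) {
            paste("would fetch", url@uri, "into", destfile)
          })

# Dispatch succeeds for the class that has a method.
ok <- fetch_resource(new("RemoteResource", uri = "ftp://example.org/a.gtf.gz"),
                     "a.gtf.gz")

# Dispatch fails for a plain character URL; capture the error message.
msg <- tryCatch(fetch_resource("ftp://example.org/a.gtf.gz", "a.gtf.gz"),
                error = function(e) conditionMessage(e))

print(ok)
print(msg)
```

If a generic like this replaces utils::download.file inside another package's namespace, as the "replacing previous import" warning suggests happened in rtracklayer, every call that passes a character URL would hit the same dispatch error.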

mikelove commented 2 years ago

Then if a core package won’t import a standard GTF file, you can post to the support site, but first you’d want to make sure you have a valid Bioc installation.

BiocManager::valid()

AMChalkie commented 2 years ago

BiocManager::valid()
[1] TRUE

I'll report that.

mikelove commented 2 years ago

Thanks for posting and following up on the bug BTW, hope we can squash it. When I tested earlier today, it was against a mixed installation of an older release with devel tximeta. So that may explain why it worked for me…?

AMChalkie commented 2 years ago

No worries, seems specific and detailed enough.

mikelove commented 2 years ago

In the meantime you can use skipMeta=TRUE and it just won’t attach GRanges.
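
A minimal sketch of that workaround, assuming a Salmon output file at the hypothetical path sample/quant.sf. The call is guarded so that it only runs when tximeta is installed and the file exists.

```r
# Hypothetical sample sheet; replace the path and name with your own.
coldata <- data.frame(files = "sample/quant.sf", names = "sample")

# Only attempt the import when the package and the quantification file exist.
can_run <- requireNamespace("tximeta", quietly = TRUE) &&
  all(file.exists(coldata$files))

if (can_run) {
  # skipMeta = TRUE skips the metadata step, so the TxDb is never built
  # and the GTF download that fails in this thread is not attempted.
  se <- tximeta::tximeta(coldata, skipMeta = TRUE)
  print(se)
} else {
  message("tximeta or the quant.sf file is missing; skipping the import")
}
```

The resulting object carries the quantifications but no transcript ranges, so anything that relies on rowRanges would have to wait until the import with metadata works again.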

maximilian-heeg commented 2 years ago

I had exactly the same issue. Downgrading restfulr from version 0.0.14 to 0.0.13 resolved the problem for me.
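
One way to check for the affected version before downgrading is sketched below. Using remotes::install_version for the downgrade is an assumption about tooling; the thread does not say how the downgrade was done. The sketch only prints the suggested command rather than installing anything.

```r
# Check whether restfulr is installed and whether it is the 0.0.14 release
# reported as problematic in this thread.
installed <- requireNamespace("restfulr", quietly = TRUE)
affected <- installed && utils::packageVersion("restfulr") == "0.0.14"

if (affected) {
  message("restfulr 0.0.14 detected; a downgrade could be tried with:")
  # Assumed tooling: remotes::install_version(); run it manually if desired.
  message('  remotes::install_version("restfulr", version = "0.0.13")')
} else {
  message("restfulr 0.0.14 is not installed; nothing to do")
}
```

Restart R after any downgrade so that rtracklayer and GenomicFeatures are loaded against the replaced version.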

mikelove commented 2 years ago

Thanks for reporting. Could you also post to the GenomicFeatures thread, as this is related to core functionality?

mikelove commented 1 year ago

I think this has been resolved upstream, in GenomicFeatures.