jorainer / ensembldb

This is the ensembldb development repository.
https://jorainer.github.io/ensembldb

proteinToTranscript not working for leaderless transcripts #119

Closed elstondsouza closed 3 years ago

elstondsouza commented 3 years ago

I have been looking into mapping protein Pfam domains to transcripts.

I keep running into an issue where proteinToTranscript fails if the transcript has no 5' UTR (I tried it on a couple of different proteins that have leaderless mRNAs).

> proteinToTranscript(IRanges(start=24, end=68, names="ENSP00000165524"), db)
Fetching CDS for 1 proteins ... 1 found
Checking CDS and protein sequence lengths ... 1/1 OK
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'width' for signature '"NULL"'
In addition: Warning message:
In getUTRsByTranscript(x = x, what = "five", columns = columns,  :
  No fiveUTR found!
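
For context, the coordinate arithmetic behind this kind of mapping can be sketched in base R. This is only an illustration under stated assumptions, not ensembldb's actual implementation: `protein_to_tx_coords` and `five_utr_widths` are hypothetical names, and the sketch assumes that a missing 5' UTR should simply contribute an offset of 0. The warning and error above suggest that the UTR lookup returned NULL and that width() was then called on it.

```r
# Hypothetical helper, NOT part of ensembldb: maps a protein-relative residue
# range to transcript-relative nucleotide coordinates.
# Assumption: `five_utr_widths` holds the exon-wise widths of the 5' UTR,
# or NULL when the transcript has none (a leaderless transcript).
protein_to_tx_coords <- function(prot_start, prot_end, five_utr_widths = NULL) {
  # Treat a missing 5' UTR as a zero offset instead of failing on NULL.
  utr_len <- if (is.null(five_utr_widths)) 0L else sum(five_utr_widths)
  # Each residue spans three nucleotides of the CDS.
  c(start = utr_len + (prot_start - 1L) * 3L + 1L,
    end   = utr_len + prot_end * 3L)
}

# Residues 24-68 on a leaderless transcript: the CDS starts at position 1.
leaderless <- protein_to_tx_coords(24L, 68L)

# The same residues when a 150 nt 5' UTR (split over two exons) precedes the CDS.
with_utr <- protein_to_tx_coords(24L, 68L, five_utr_widths = c(100L, 50L))
```

With these inputs the leaderless case gives nucleotides 70 to 204, and the 150 nt UTR shifts both ends by 150. The point is that the no-UTR case is an ordinary offset of zero rather than an error condition.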
jorainer commented 3 years ago

Thanks for the report, @elston-n-dsouza. Could you please provide the output of sessionInfo() called after your code above, and also tell me which EnsDb you are using as db (i.e. which Ensembl version it is)? That would allow me to reproduce and then fix the error.

elstondsouza commented 3 years ago

Hi!

Thanks for the super fast reply!

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_sandybridgep-r0.3.16.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3           stringr_1.4.0           stringi_1.7.3          
 [4] biomaRt_2.48.2          ensembldb_2.16.3        AnnotationFilter_1.16.0
 [7] GenomicFeatures_1.44.0  AnnotationDbi_1.54.1    Biobase_2.52.0         
[10] AnnotationHub_3.0.1     BiocFileCache_2.0.0     dbplyr_2.1.1           
[13] rtracklayer_1.52.0      GenomicRanges_1.44.0    GenomeInfoDb_1.28.1    
[16] IRanges_2.26.0          S4Vectors_0.30.0        BiocGenerics_0.38.0    
[19] magrittr_2.0.1          data.table_1.14.0      

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.24.0           bitops_1.0-7                 
 [3] matrixStats_0.60.0            bit64_4.0.5                  
 [5] filelock_1.0.2                progress_1.2.2               
 [7] httr_1.4.2                    tools_4.1.0                  
 [9] utf8_1.2.2                    R6_2.5.0                     
[11] DBI_1.1.1                     lazyeval_0.2.2               
[13] withr_2.4.2                   tidyselect_1.1.1             
[15] prettyunits_1.1.1             bit_4.0.4                    
[17] curl_4.3.2                    compiler_4.1.0               
[19] xml2_1.3.2                    DelayedArray_0.18.0          
[21] rappdirs_0.3.3                digest_0.6.27                
[23] Rsamtools_2.8.0               R.utils_2.10.1               
[25] XVector_0.32.0                pkgconfig_2.0.3              
[27] htmltools_0.5.1.1             MatrixGenerics_1.4.0         
[29] fastmap_1.1.0                 rlang_0.4.11                 
[31] rstudioapi_0.13               RSQLite_2.2.7                
[33] shiny_1.6.0                   BiocIO_1.2.0                 
[35] generics_0.1.0                jsonlite_1.7.2               
[37] BiocParallel_1.26.1           dplyr_1.0.7                  
[39] R.oo_1.24.0                   RCurl_1.98-1.3               
[41] GenomeInfoDbData_1.2.6        Matrix_1.3-3                 
[43] Rcpp_1.0.7                    fansi_0.5.0                  
[45] lifecycle_1.0.0               R.methodsS3_1.8.1            
[47] yaml_2.2.1                    SummarizedExperiment_1.22.0  
[49] zlibbioc_1.38.0               blob_1.2.2                   
[51] promises_1.2.0.1              crayon_1.4.1                 
[53] lattice_0.20-44               Biostrings_2.60.1            
[55] hms_1.1.0                     KEGGREST_1.32.0              
[57] pillar_1.6.1                  rjson_0.2.20                 
[59] XML_3.99-0.6                  glue_1.4.2                   
[61] BiocVersion_3.13.1            BiocManager_1.30.16          
[63] png_0.1-7                     vctrs_0.3.8                  
[65] httpuv_1.6.1                  gtable_0.3.0                 
[67] purrr_0.3.4                   assertthat_0.2.1             
[69] cachem_1.0.5                  mime_0.11                    
[71] xtable_1.8-4                  restfulr_0.0.13              
[73] later_1.2.0                   tibble_3.1.3                 
[75] GenomicAlignments_1.28.0      memoise_2.0.0                
[77] ellipsis_0.3.2                interactiveDisplayBase_1.30.0

The db used is from AnnotationHub, currently set to the latest Ensembl version, 104.

> hub = AnnotationHub()
> Ens_query <- query(hub, c("EnsDb", "sapiens", 104))
> names(Ens_query)
[1] "AH95744"
> db <- Ens_query[[names(Ens_query)]]
> db
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.6
|Creation time: Tue Jul 20 19:44:24 2021
|ensembl_version: 104
|ensembl_host: localhost
|Organism: Homo sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 2.1
| No. of genes: 67990.
| No. of transcripts: 259749.
|Protein data available.
jorainer commented 3 years ago

This should now be fixed in the updated version(s) of ensembldb. You can either wait a couple of days until the package update is available on Bioconductor (i.e. with BiocManager::install("ensembldb")), or install it directly with BiocManager::install("jorainer/ensembldb", ref = "RELEASE_3_13") (for Bioconductor release 3.13).

Please try that @elston-n-dsouza and close this issue if it works for you.

elstondsouza commented 3 years ago

Thanks @jorainer!

It works now!