jorainer / ensembldb

This is the ensembldb development repository.
https://jorainer.github.io/ensembldb
33 stars 10 forks source link

getGeneRegionTrackForGviz -- syntax error #132

Closed plger closed 2 years ago

plger commented 2 years ago

I'm experiencing a weird error in very specific settings:

library(AnnotationHub)
library(ensembldb)
ah <- AnnotationHub()
ensdb <- ah[["AH95713"]]   # Drosophila EnsDb
test <- getGeneRegionTrackForGviz(
   ensdb, chromosome="2R", start=15000, end=200000)

I get:

Error: near ")": syntax error

The traceback shows:

21: stop(structure(list(message = "near \")\": syntax error", call = NULL, 
        cppstack = NULL), class = c("Rcpp::exception", "C++Error", 
    "error", "condition")))
20: result_create(conn@ptr, statement)
19: initialize(value, ...)
18: initialize(value, ...)
17: new("SQLiteResult", sql = statement, ptr = result_create(conn@ptr, 
        statement), conn = conn, bigint = conn@bigint)
16: .local(conn, statement, ...)
15: dbSendQuery(conn, statement, ...)
14: dbSendQuery(conn, statement, ...)
13: .local(conn, statement, ...)
12: dbGetQuery(dbconn(x), Q)
11: dbGetQuery(dbconn(x), Q)
10: .getWhat(x = x, columns = columns, filter = filter, order.by = order.by, 
        order.type = order.type, group.by = group.by, skip.order.check = skip.order.check, 
        startWith = startWith, join = join)
9: .local(x, ...)
8: getWhat(x, columns = columns, filter = filter, order.by = order.by, 
       order.type = order.type, startWith = "tx", join = "suggested")
7: getWhat(x, columns = columns, filter = filter, order.by = order.by, 
       order.type = order.type, startWith = "tx", join = "suggested")
6: .local(x, ...)
5: transcripts(x, filter = filter, columns = needCols, return.type = "data.frame")
4: transcripts(x, filter = filter, columns = needCols, return.type = "data.frame")
3: .local(x, ...)
2: getGeneRegionTrackForGviz(ensdb, chromosome = "2R", start = 15000, 
       end = 2e+05)
1: getGeneRegionTrackForGviz(ensdb, chromosome = "2R", start = 15000, 
       end = 2e+05)

However if I extend a bit the region, I don't have the error:

tmp <- getGeneRegionTrackForGviz(ensdb, chromosome="2R",
                                   start=15000, end=300000)

but only the warning that no CDS is found...

I'm admittedly not on the latest Bioc, but I've updated ensembldb to 2.19.8 and haven't seen a difference...

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 21.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] AnnotationHub_3.0.1     BiocFileCache_2.0.0     dbplyr_2.1.1           
 [4] ensembldb_2.19.8        AnnotationFilter_1.16.0 GenomicFeatures_1.44.2 
 [7] AnnotationDbi_1.54.1    Biobase_2.52.0          GenomicRanges_1.44.0   
[10] GenomeInfoDb_1.28.2     IRanges_2.26.0          S4Vectors_0.30.0       
[13] BiocGenerics_0.38.0    

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.4.3          httr_1.4.2                   
 [3] bit64_4.0.5                   shiny_1.6.0                  
 [5] assertthat_0.2.1              interactiveDisplayBase_1.30.0
 [7] BiocManager_1.30.16           blob_1.2.2                   
 [9] GenomeInfoDbData_1.2.6        Rsamtools_2.8.0              
[11] yaml_2.2.1                    progress_1.2.2               
[13] BiocVersion_3.13.1            pillar_1.6.2                 
[15] RSQLite_2.2.10                lattice_0.20-45              
[17] glue_1.4.2                    digest_0.6.27                
[19] promises_1.2.0.1              XVector_0.32.0               
[21] httpuv_1.6.3                  htmltools_0.5.2              
[23] Matrix_1.4-0                  XML_3.99-0.8                 
[25] pkgconfig_2.0.3               biomaRt_2.48.3               
[27] zlibbioc_1.38.0               xtable_1.8-4                 
[29] purrr_0.3.4                   later_1.3.0                  
[31] BiocParallel_1.26.2           tibble_3.1.4                 
[33] KEGGREST_1.32.0               generics_0.1.0               
[35] ellipsis_0.3.2                cachem_1.0.6                 
[37] SummarizedExperiment_1.22.0   lazyeval_0.2.2               
[39] mime_0.11                     magrittr_2.0.1               
[41] crayon_1.4.1                  memoise_2.0.0                
[43] fansi_0.5.0                   xml2_1.3.2                   
[45] tools_4.1.2                   prettyunits_1.1.1            
[47] hms_1.1.0                     BiocIO_1.2.0                 
[49] lifecycle_1.0.0               matrixStats_0.60.1           
[51] stringr_1.4.0                 DelayedArray_0.18.0          
[53] Biostrings_2.60.2             compiler_4.1.2               
[55] rlang_0.4.11                  grid_4.1.2                   
[57] RCurl_1.98-1.4                rstudioapi_0.13              
[59] rjson_0.2.20                  rappdirs_0.3.3               
[61] bitops_1.0-7                  restfulr_0.0.13              
[63] DBI_1.1.1                     curl_4.3.2                   
[65] R6_2.5.1                      GenomicAlignments_1.28.0     
[67] dplyr_1.0.7                   rtracklayer_1.52.1           
[69] fastmap_1.1.0                 bit_4.0.4                    
[71] utf8_1.2.2                    filelock_1.0.2               
[73] ProtGenerics_1.24.0           stringi_1.7.4                
[75] Rcpp_1.0.7                    vctrs_0.3.8                  
[77] png_0.1-7                     tidyselect_1.1.1
jorainer commented 2 years ago

Thanks for reporting! I'll have a look into it later today (sorry for the delay)!

jorainer commented 2 years ago

There seem to be no transcripts within the provided range. This was not handled properly within the function causing thus an error. I will fix that by returning in such cases a GRanges with the input coords, but no additional features.

jorainer commented 2 years ago

With the fix in the devel and BioC3.14 release version you'll get the following (instead of an error):

> library(AnnotationHub)
> library(ensembldb)
> ah <- AnnotationHub()
snapshotDate(): 2022-02-22
> ensdb <- ah[["AH95713"]]   # Drosophila EnsDb
loading from cache
> test <- getGeneRegionTrackForGviz(
+    ensdb, chromosome="2R", start=15000, end=200000)
> test
GRanges object with 1 range and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]       2R 15000-200000      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Hope that's acceptable @plger

plger commented 2 years ago

Hi! Thanks for the fix. I'm not sure I understand the choice to return this instead of an empty GRanges, is the idea to ensure that the right region is nevertheless plotted in GViz even if it's the only specification of the region? But either way is fine by me, so long as it doesn't break ;)

jorainer commented 2 years ago

yes, exactly - I thought it might be better to return the input range instead of an empty one - but if you think it might be better to return an empty one I could return that instead.

plger commented 2 years ago

I might not be representative of all use cases, but I would personally return an empty one. I normally just include the gene track systematically when plotting whatever in Gviz, whether there's something or not, for instance to know what's around a peak. If it returns the region it will plot the bar and give the misleading impression that there's a gene in the region... Instead, the only argument against an empty GRanges is that if there's no other element specifying the region, it will give the error "Unable to automatically determine plotting ranges from the supplied track(s)." But that seems to me a decent behavior...

jorainer commented 2 years ago

OK, then I'll revert to return an empty GRanges instead.

plger commented 2 years ago

thanks again for developing (and maintaining!) this awesome package, and was great to see you in Heidelberg ;)

jorainer commented 2 years ago

Yes, was fun meeting you in person! And thanks for reporting bugs - makes it easier to maintain :)