lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser

track query (track = "GC Percent", table = "gc5Base") problem #80

Closed DanielEWeeks closed 1 year ago

DanielEWeeks commented 1 year ago

The call below to the Gviz function UcscTrack fails with the error message:

Error fetching data from UCSC

UcscTrack sets up a query and then calls the track function from the rtracklayer package at this line:

tmp <- try(track(query), silent = TRUE)

This fails with the error message:

Error in do.call(rbind.data.frame, results) : 
  second argument must be a list

as well as with this warning:

Warning: Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x28260ece0): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!

Because similar queries for other tracks/tables work despite the CURLOPT_SSL_VERIFYHOST warning, I suspect the relevant error here is the one from rbind.data.frame.
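The `try(..., silent = TRUE)` wrapper explains why only the generic "Error fetching data from UCSC" message reaches the user. Here is a minimal sketch of how the underlying message can be recovered from a `try-error` object. It needs no network access: `failing_query` is a hypothetical stand-in for `track(query)`, not part of Gviz or rtracklayer.

```r
# Hypothetical stand-in for track(query): it fails the same way the
# do.call() line fails when handed NULL.
failing_query <- function() do.call(rbind.data.frame, NULL)

# Same pattern as in UcscTrack: the error is swallowed, not printed.
tmp <- try(failing_query(), silent = TRUE)

# The failure is still detectable, and the original message is kept
# in the "condition" attribute of the try-error object.
failed <- inherits(tmp, "try-error")
msg <- conditionMessage(attr(tmp, "condition"))
print(failed)
print(msg)
```

Checking `attr(tmp, "condition")` this way is only a debugging aid. It does not change what UcscTrack reports.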

Minimal Working Example

This is based on example code from the Gviz User Guide, so it appears to have worked in the past. See Section 6 of "The Gviz User Guide":

https://bioconductor.org/packages/devel/bioc/vignettes/Gviz/inst/doc/Gviz.html#6_Track_highlighting_and_overlays

suppressMessages(library(Gviz))
from <- 65921878
to <- 65980988

gcContent <- UcscTrack(genome = "mm9", chromosome = "chrX",
                       track = "GC Percent", table = "gc5Base",
                       from = from, to = to, trackType = "DataTrack",
                       start = "start", end = "end", data = "score",
                       type = "hist", window = -1, windowSize = 1500,
                       fill.histogram = "black", col.histogram = "black",
                       ylim = c(30, 70), name = "GC Percent")
gcContent

sessionInfo()

Output from the MWE above

> suppressMessages(library(Gviz))
> from <- 65921878
> to <- 65980988
> 
> gcContent <- UcscTrack(genome = "mm9", chromosome = "chrX",
+                        track = "GC Percent", table = "gc5Base",
+                        from = from, to = to, trackType = "DataTrack",
+                        start = "start", end = "end", data = "score",
+                        type = "hist", window = -1, windowSize = 1500,
+                        fill.histogram = "black", col.histogram = "black",
+                        ylim = c(30, 70), name = "GC Percent")
Error in UcscTrack(genome = "mm9", chromosome = "chrX", track = "GC Percent",  : 
  Error fetching data from UCSC
In addition: Warning messages:
1: In curlSetOpt(..., .opts = .opts, curl = h, .encoding = .encoding) :
  Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x1166ebc20): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!
2: In .local(x, ...) :
  'track' parameter is deprecated now you go by the 'table' instead
                Use ucscTables(genome, track) to retrieve the list of tables for a track
3: In .local(x, ...) :
  'track' parameter is deprecated now you go by the 'table' instead
                Use ucscTables(genome, track) to retrieve the list of tables for a track
4: In curlSetOpt(..., .opts = .opts, curl = h, .encoding = .encoding) :
  Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x28ca04b50): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!
5: In UcscTrack(genome = "mm9", chromosome = "chrX", track = "GC Percent",  :
  Error in do.call(rbind.data.frame, results) : 
  second argument must be a list

> gcContent
Error: object 'gcContent' not found
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.6.7

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Gviz_1.42.0          GenomicRanges_1.50.1 GenomeInfoDb_1.34.4 
[4] IRanges_2.32.0       S4Vectors_0.36.1     BiocGenerics_0.44.0 

loaded via a namespace (and not attached):
  [1] ProtGenerics_1.30.0         bitops_1.0-7               
  [3] matrixStats_0.63.0          bit64_4.0.5                
  [5] filelock_1.0.2              RColorBrewer_1.1-3         
  [7] progress_1.2.2              httr_1.4.4                 
  [9] backports_1.4.1             tools_4.2.2                
 [11] utf8_1.2.2                  R6_2.5.1                   
 [13] rpart_4.1.19                lazyeval_0.2.2             
 [15] Hmisc_4.7-2                 DBI_1.1.3                  
 [17] colorspace_2.0-3            nnet_7.3-18                
 [19] tidyselect_1.2.0            gridExtra_2.3              
 [21] prettyunits_1.1.1           bit_4.0.5                  
 [23] curl_4.3.3                  compiler_4.2.2             
 [25] cli_3.4.1                   Biobase_2.58.0             
 [27] htmlTable_2.4.1             xml2_1.3.3                 
 [29] DelayedArray_0.24.0         rtracklayer_1.58.0         
 [31] checkmate_2.1.0             scales_1.2.1               
 [33] rappdirs_0.3.3              stringr_1.5.0              
 [35] digest_0.6.30               Rsamtools_2.14.0           
 [37] foreign_0.8-84              XVector_0.38.0             
 [39] dichromat_2.0-0.1           htmltools_0.5.4            
 [41] base64enc_0.1-3             jpeg_0.1-10                
 [43] pkgconfig_2.0.3             MatrixGenerics_1.10.0      
 [45] ensembldb_2.22.0            dbplyr_2.2.1               
 [47] fastmap_1.1.0               BSgenome_1.66.1            
 [49] htmlwidgets_1.5.4           rlang_1.0.6                
 [51] rstudioapi_0.14             RSQLite_2.2.19             
 [53] BiocIO_1.8.0                generics_0.1.3             
 [55] BiocParallel_1.32.4         dplyr_1.0.10               
 [57] VariantAnnotation_1.44.0    RCurl_1.98-1.9             
 [59] magrittr_2.0.3              GenomeInfoDbData_1.2.9     
 [61] Formula_1.2-4               interp_1.1-3               
 [63] Matrix_1.5-3                Rcpp_1.0.9                 
 [65] munsell_0.5.0               fansi_1.0.3                
 [67] lifecycle_1.0.3             stringi_1.7.8              
 [69] yaml_2.3.6                  SummarizedExperiment_1.28.0
 [71] zlibbioc_1.44.0             BiocFileCache_2.6.0        
 [73] blob_1.2.3                  parallel_4.2.2             
 [75] crayon_1.5.2                deldir_1.0-6               
 [77] lattice_0.20-45             Biostrings_2.66.0          
 [79] splines_4.2.2               GenomicFeatures_1.50.2     
 [81] hms_1.1.2                   KEGGREST_1.38.0            
 [83] knitr_1.41                  pillar_1.8.1               
 [85] rjson_0.2.21                codetools_0.2-18           
 [87] biomaRt_2.54.0              XML_3.99-0.13              
 [89] glue_1.6.2                  biovizBase_1.46.0          
 [91] latticeExtra_0.6-30         data.table_1.14.6          
 [93] png_0.1-8                   vctrs_0.5.1                
 [95] gtable_0.3.1                assertthat_0.2.1           
 [97] cachem_1.0.6                ggplot2_3.4.0              
 [99] xfun_0.35                   AnnotationFilter_1.22.0    
[101] restfulr_0.0.15             survival_3.4-0             
[103] tibble_3.1.8                GenomicAlignments_1.34.0   
[105] AnnotationDbi_1.60.0        memoise_2.0.1              
[107] cluster_2.1.4               ellipsis_0.3.2 

DanielEWeeks commented 1 year ago

For more details, see also

https://support.bioconductor.org/p/9148324/


It looks like the error is happening in rtracklayer's parseResponse function, whose first few lines are:

  results <- response[[tableName]]
  if (is.null(names(results))) {
    df <- do.call(rbind.data.frame, results)
  }

Stepping through this in the debugger, on entry to parseResponse we have:

Browse[2]> tableName
[1] "gc5Base"
Browse[2]> names(response)
 [1] "downloadTime"      "downloadTimeStamp" "genome"           
 [4] "dataTime"          "dataTimeStamp"     "trackType"        
 [7] "track"             "start"             "end"              
[10] "chrom"             "chrX"              "itemsReturned"

Here the Wiggle track data is in the chrX element of the response, but the code tries to pull out the non-existent gc5Base element instead.

So the line

 results <- response[[tableName]]

sets results to NULL, generating the subsequent error message:

Error in do.call(rbind.data.frame, results) : 
  second argument must be a list
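
The failure can be reproduced offline with a mock list that has the same element names as the response seen in the debugger. This is only a sketch: the interval values are invented, and `extract_results` is a hypothetical helper showing one possible fallback (looking up the element named by `response$chrom`), not the actual rtracklayer fix.

```r
# Mock response with the element names seen in the debugger.
# The interval values are invented for illustration.
response <- list(
  genome = "mm9",
  trackType = "wig",
  track = "gc5Base",
  chrom = "chrX",
  chrX = list(
    list(start = 65921875L, end = 65921880L, value = 40),
    list(start = 65921880L, end = 65921885L, value = 60)
  ),
  itemsReturned = 2L
)
tableName <- "gc5Base"

# There is no element named "gc5Base", so the lookup yields NULL ...
results <- response[[tableName]]
print(is.null(results))

# ... and do.call() rejects NULL as its second argument.
err <- tryCatch(do.call(rbind.data.frame, results),
                error = function(e) conditionMessage(e))
print(err)

# Hypothetical fallback: if the table name is absent, use the element
# named after the queried chromosome instead.
extract_results <- function(response, tableName) {
  results <- response[[tableName]]
  if (is.null(results) && !is.null(response[["chrom"]])) {
    results <- response[[response[["chrom"]]]]
  }
  results
}

df <- do.call(rbind.data.frame, extract_results(response, tableName))
print(df)
```

Whether keying on `chrom` is right for every Wiggle response, for example a query that spans several chromosomes, is an open question this sketch does not settle.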