lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser

`import.bw`: Unable to read remote files #73

Open bschilder opened 2 years ago

bschilder commented 2 years ago

I'm afraid this same error (#63) is still happening with remote bigWig files, even with the dev version of rtracklayer (1.57.0, installed from GitHub), @sanchit-saini:

Reprex

viewpoint <- 166169213 
gr.span <- GenomicRanges::GRanges(
    seqnames = "chr6",
    ranges = IRanges::IRanges(
        start = viewpoint - 1000000,
        end = viewpoint + 1000000
    )
)

gr <- rtracklayer::import("https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig") 

Error

Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(ranges) :
  Response is missing required header Content-Length: for url https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig
2: In seqinfo(ranges) :

I confirmed that downloading the file manually and importing from the local copy works, so the file itself is fine. It's only importing directly from the remote server that `import` has trouble with.
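For anyone hitting the same problem, the manual workaround described above could be scripted roughly as follows. This is only a sketch: `import_bigwig_via_download` is a hypothetical helper name, not part of rtracklayer, and it assumes the URL ends in a bigWig extension so that `rtracklayer::import` can detect the format from the temporary file name.

```r
# Hypothetical helper (not an rtracklayer function): download a remote
# bigWig file to a local temporary file, then import the local copy.
import_bigwig_via_download <- function(url, which = NULL,
                                       destfile = tempfile(fileext = ".bigWig")) {
  # mode = "wb" keeps the binary file intact on every platform
  utils::download.file(url, destfile = destfile, mode = "wb")
  if (is.null(which)) {
    # Import the whole file
    rtracklayer::import(destfile)
  } else {
    # Import only the ranges overlapping `which`
    rtracklayer::import(destfile, which = which)
  }
}
```

With the objects from the reprex, a call might look like `import_bigwig_via_download(url, which = gr.span)`, where `url` holds the remote file address. Note that this downloads the entire file, which can be large.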

Session info

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3            rjson_0.2.21                deldir_1.0-6               
  [4] ellipsis_0.3.2              rprojroot_2.0.3             biovizBase_1.44.0          
  [7] htmlTable_2.4.1             XVector_0.36.0              GenomicRanges_1.48.0       
 [10] base64enc_0.1-3             dichromat_2.0-0.1           rstudioapi_0.13            
 [13] remotes_2.4.2               bit64_4.0.5                 AnnotationDbi_1.58.0       
 [16] fansi_1.0.3                 xml2_1.3.3                  codetools_0.2-18           
 [19] splines_4.2.0               ggbio_1.44.1                cachem_1.0.6               
 [22] knitr_1.39                  Formula_1.2-4               Rsamtools_2.12.0           
 [25] cluster_2.1.3               dbplyr_2.2.1                png_0.1-7                  
 [28] graph_1.74.0                BiocManager_1.30.18         compiler_4.2.0             
 [31] httr_1.4.3                  backports_1.4.1             assertthat_0.2.1           
 [34] Matrix_1.4-1                fastmap_1.1.0               lazyeval_0.2.2             
 [37] cli_3.3.0                   htmltools_0.5.3             prettyunits_1.1.1          
 [40] tools_4.2.0                 gtable_0.3.0                glue_1.6.2                 
 [43] GenomeInfoDbData_1.2.8      reshape2_1.4.4              dplyr_1.0.9                
 [46] rappdirs_0.3.3              Rcpp_1.0.9                  Biobase_2.56.0             
 [49] vctrs_0.4.1                 Biostrings_2.64.0           rtracklayer_1.57.0         
 [52] xfun_0.31                   stringr_1.4.0               lifecycle_1.0.1            
 [55] restfulr_0.0.15             ensembldb_2.20.2            XML_3.99-0.10              
 [58] zlibbioc_1.42.0             scales_1.2.0                BSgenome_1.64.0            
 [61] VariantAnnotation_1.42.1    hms_1.1.1                   MatrixGenerics_1.8.1       
 [64] ProtGenerics_1.28.0         RBGL_1.72.0                 parallel_4.2.0             
 [67] SummarizedExperiment_1.26.1 AnnotationFilter_1.20.0     RColorBrewer_1.1-3         
 [70] yaml_2.3.5                  curl_4.3.2                  memoise_2.0.1              
 [73] gridExtra_2.3               ggplot2_3.3.6               biomaRt_2.52.0             
 [76] rpart_4.1.16                reshape_0.8.9               latticeExtra_0.6-30        
 [79] stringi_1.7.8               RSQLite_2.2.15              S4Vectors_0.34.0           
 [82] BiocIO_1.6.0                checkmate_2.1.0             GenomicFeatures_1.48.3     
 [85] BiocGenerics_0.42.0         filelock_1.0.2              BiocParallel_1.30.3        
 [88] GenomeInfoDb_1.32.2         rlang_1.0.4                 pkgconfig_2.0.3            
 [91] matrixStats_0.62.0          bitops_1.0-7                lattice_0.20-45            
 [94] purrr_0.3.4                 GenomicAlignments_1.32.1    htmlwidgets_1.5.4          
 [97] bit_4.0.4                   tidyselect_1.1.2            here_1.0.1                 
[100] GGally_2.1.2                plyr_1.8.7                  magrittr_2.0.3             
[103] R6_2.5.1                    IRanges_2.30.0              generics_0.1.3             
[106] Hmisc_4.7-0                 DelayedArray_0.22.0         DBI_1.1.3                  
[109] pillar_1.8.0                foreign_0.8-82              survival_3.3-1             
[112] KEGGREST_1.36.3             RCurl_1.98-1.8              nnet_7.3-17                
[115] tibble_3.1.8                crayon_1.5.1                interp_1.1-3               
[118] utf8_1.2.2                  OrganismDbi_1.38.1          BiocFileCache_2.4.0        
[121] jpeg_0.1-9                  progress_1.2.2              grid_4.2.0                 
[124] data.table_1.14.2           blob_1.2.3                  digest_0.6.29              
[127] stats4_4.2.0                munsell_0.5.0

sanchit-saini commented 2 years ago

Hi @bschilder, this is not related to #63. I know it looks a bit confusing at first, but if you look at the error message closely, you will notice it is different from the previously reported issue.

rtracklayer follows a convention of displaying the error message `UCSC library operation failed` whenever any UCSC kent library operation fails.
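Because the error text is generic, the accompanying warnings are where the specific cause shows up. One way to collect them programmatically is sketched below in base R; `capture_warnings` is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper: evaluate an expression, collect every warning
# message it raises, and return the error (if any) instead of stopping.
capture_warnings <- function(expr) {
  warnings_seen <- character()
  result <- withCallingHandlers(
    # `expr` is evaluated lazily here, inside both handlers
    tryCatch(expr, error = function(e) e),
    warning = function(w) {
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(result = result, warnings = warnings_seen)
}
```

Wrapping the failing call, for example `capture_warnings(rtracklayer::import(url))`, would return the generic error in `result` and the more specific messages in `warnings`.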

The actual error message is `Response is missing required header Content-Length: for url https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig`

Having said that, I looked at the codebase, and this issue is present in the upstream UCSC kent library source. So I think it should be reported to the UCSC folks, as they would be able to solve it quickly.
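To check independently of rtracklayer whether a server sends the header named in the warning, something like the following could be used. It is a sketch that assumes the httr package is installed; `has_content_length` and `check_remote_headers` are hypothetical helper names, and a HEAD response is not guaranteed to match the response to whatever request the kent library actually makes.

```r
# Hypothetical helper: TRUE if a named list of HTTP headers contains
# Content-Length. Header names are case-insensitive, so compare in lower case.
has_content_length <- function(headers) {
  "content-length" %in% tolower(names(headers))
}

# Hypothetical helper: send a HEAD request and inspect the response headers.
check_remote_headers <- function(url) {
  resp <- httr::HEAD(url)
  has_content_length(httr::headers(resp))
}
```

A result of `FALSE` from `check_remote_headers(url)` would be consistent with the warning above, though it does not by itself show where the fix belongs.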

Once it is fixed upstream, we will pull those changes and merge them.

Thanks,
Sanchit

bschilder commented 2 years ago

Thanks for the quick reply, @sanchit-saini

Just alerted them here: https://github.com/ucscGenomeBrowser/kent/issues/69