lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser
Other
29 stars 17 forks source link

export.bw only write the last chromosome #132

Open lldelisle opened 1 month ago

lldelisle commented 1 month ago

Dear rtracklayer developers, I found an issue when I use export.bw. Here is a way to reproduce the bug into my installation:

library(rtracklayer)
test.input <- RleList(
    "chr1" = c(rep(0, 100), rep(1, 10), rep(3, 10)),
    "chr2" = c(rep(0, 200))
)
export.bw(test.input, "test.bw")
temp <- import.bw("test.bw")
print("BIGWIG from Rle")
print(temp)
test.input.gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr1", "chr2"),
    ranges = IRanges(
        start = c(1, 101, 111, 1),
        end = c(100, 110, 120, 200)
    )
)
test.input.gr$score <- c(0, 1, 3, 0)
seqlengths(test.input.gr) <- c("chr1" = 120, "chr2" = 200)
export.bw(test.input.gr, "test.gr.bw")
temp.gr <- import.bw("test.gr.bw")
print("BIGWIG from GRanges")
print(temp.gr)
print(sessionInfo())

In both cases, the result of import.bw has a single chromosome:

[1] "BIGWIG from Rle"
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr2     1-200      * |         0
  -------
  seqinfo: 1 sequence from an unspecified genome
[1] "BIGWIG from GRanges"
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr2     1-200      * |         0
  -------
  seqinfo: 1 sequence from an unspecified genome

Here is my session info:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0 
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.64.0   GenomicRanges_1.56.1 GenomeInfoDb_1.40.1 
[4] IRanges_2.38.1       S4Vectors_0.42.1     BiocGenerics_0.50.0 

loaded via a namespace (and not attached):
 [1] Matrix_1.7-0                jsonlite_1.8.9             
 [3] compiler_4.4.1              rjson_0.2.23               
 [5] crayon_1.5.3                SummarizedExperiment_1.34.0
 [7] Biobase_2.64.0              Rsamtools_2.20.0           
 [9] bitops_1.0-8                Biostrings_2.72.1          
[11] GenomicAlignments_1.40.0    parallel_4.4.1             
[13] BiocParallel_1.38.0         yaml_2.3.10                
[15] lattice_0.22-6              R6_2.5.1                   
[17] XVector_0.44.0              S4Arrays_1.4.1             
[19] curl_5.2.3                  XML_3.99-0.17              
[21] DelayedArray_0.30.1         MatrixGenerics_1.16.0      
[23] GenomeInfoDbData_1.2.12     SparseArray_1.4.8          
[25] zlibbioc_1.50.0             grid_4.4.1                 
[27] codetools_0.2-20            abind_1.4-8                
[29] RCurl_1.98-1.16             restfulr_0.0.15            
[31] httr_1.4.7                  matrixStats_1.4.1          
[33] tools_4.4.1                 BiocIO_1.14.0              
[35] UCSC.utils_1.0.0           
sanchit-saini commented 1 month ago

Hi @lldelisle,

Thanks for reporting. I am going to investigate this and get back to you.

lldelisle commented 1 month ago

It is going to be difficult to debug... I have tested the same versions in a docker with ubuntu and I could not reproduce the bug:

[1] "BIGWIG from Rle"
GRanges object with 4 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1     1-100      * |         0
  [2]     chr1   101-110      * |         1
  [3]     chr1   111-120      * |         3
  [4]     chr2     1-200      * |         0
  -------
  seqinfo: 2 sequences from an unspecified genome
[1] "BIGWIG from GRanges"
GRanges object with 4 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1     1-100      * |         0
  [2]     chr1   101-110      * |         1
  [3]     chr1   111-120      * |         3
  [4]     chr2     1-200      * |         0
  -------
  seqinfo: 2 sequences from an unspecified genome
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux trixie/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.27.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.64.0   GenomicRanges_1.56.1 GenomeInfoDb_1.40.1 
[4] IRanges_2.38.1       S4Vectors_0.42.1     BiocGenerics_0.50.0 

loaded via a namespace (and not attached):
 [1] Matrix_1.7-0                jsonlite_1.8.9             
 [3] compiler_4.4.1              rjson_0.2.23               
 [5] crayon_1.5.3                SummarizedExperiment_1.34.0
 [7] Biobase_2.64.0              Rsamtools_2.20.0           
 [9] bitops_1.0-8                Biostrings_2.72.1          
[11] GenomicAlignments_1.40.0    parallel_4.4.1             
[13] BiocParallel_1.38.0         yaml_2.3.10                
[15] lattice_0.22-6              R6_2.5.1                   
[17] XVector_0.44.0              S4Arrays_1.4.1             
[19] curl_5.2.3                  XML_3.99-0.17              
[21] DelayedArray_0.30.1         MatrixGenerics_1.16.0      
[23] GenomeInfoDbData_1.2.12     SparseArray_1.4.8          
[25] zlibbioc_1.50.0             grid_4.4.1                 
[27] codetools_0.2-20            abind_1.4-8                
[29] RCurl_1.98-1.16             restfulr_0.0.15            
[31] httr_1.4.7                  matrixStats_1.4.1          
[33] tools_4.4.1                 BiocIO_1.14.0              
[35] UCSC.utils_1.0.0 

I am testing on a docker archlinux now. I will keep you updated.

lldelisle commented 1 month ago

I can reproduce with archlinux docker. Here is how to test it, if you want to debug:

docker run -it archlinux bash
pacman -Syu r-base
pacman -Syu base-devel
R
install.packages("BiocManager")
BiocManager::install("rtracklayer")
sanchit-saini commented 1 month ago

Thanks for giving out the steps to reproduce it. I just tested it and it is reproducible. This is really weird!!. I will investigate why it happened.