js2264 / HiCExperiment

Importing and manipulating Hi-C data in R
http://js2264.github.io/HiCExperiment/
Other
8 stars 1 forks source link

Pairs importing error for scalogram() #10

Open hackkr opened 2 months ago

hackkr commented 2 months ago

Hello!

I am trying to generate a scalogram from a hic experiment object containing a .mool and .pairs.gz file. The pairs file is the output of pairtools parse2 and includes the sam file header and many additional columns.

I am able to generate matrices from this hic object, and distance law calculations from the pairs file without issue. However, when I try to generate a scaleogram, I get the following error:

Error in .new_IRanges_from_start_width(start, width) : 'start' and 'width' must be numeric vectors
11. stop(wmsg("'start' and 'width' must be numeric vectors"))
10. .new_IRanges_from_start_width(start, width)
9. .new_IRanges(start = start, end = end, width = width)
8. IRanges::IRanges(anchors1[[2]], width = 1)
7. GenomicRanges::GRanges(anchors1[[1]], IRanges::IRanges(anchors1[[2]],width = 1))
6. .pairs2gi(con, ...)
5. import(FileForFormat(con, format), ...)
4. import(FileForFormat(con, format), ...)
3. BiocIO::import(pairsFile, format = "pairs")
2. BiocIO::import(pairsFile, format = "pairs")
1. scalogram(hic)

I get the same error when I try HiCExperiment::import on the pairs file alone. I can import the pairs.gz file using plyinteractions as suggested in other issues, but I am not clear how to add that to a hic object to generate a scaleogram plot.

Here's what the pairs.gz looks likes after importing with plyinteractions:

pairs_file <- 'test1_mapped_sorted_UU.pairs.gz' 
pairs_df <- read.delim(pairs_file, sep = "\t", header = FALSE, comment.char = "#")
pairs <- as_ginteractions(pairs_df, 
    seqnames1 = V2, start1 = V3, strand1 = V6, 
    seqnames2 = V4, start2 = V5, strand2 = V7, 
    width1 = 1, width2 = 1, 
    keep.extra.columns = FALSE
)

pairs
GInteractions object with 841489 interactions and 0 metadata columns:
            seqnames1   ranges1 strand1      seqnames2   ranges2 strand2
                <Rle> <IRanges>   <Rle>          <Rle> <IRanges>   <Rle>
       [1] CP002425.1         1       + --- CP002425.1       819       +
       [2] CP002425.1         1       + --- CP002425.1       823       +
       [3] CP002425.1         1       + --- CP002425.1      1034       +
       [4] CP002425.1         1       + --- CP002425.1    128368       +
       [5] CP002425.1         1       + --- CP002425.1    213988       -
       ...        ...       ...     ... ...        ...       ...     ...
  [841485] CP002425.1   2522992       - --- CP002425.1   2522413       -
  [841486] CP002425.1   2522992       - --- CP002425.1   2522506       -
  [841487] CP002425.1   2522992       - --- CP002425.1   2522506       -
  [841488] CP002425.1   2522992       - --- CP002425.1   2522686       -
  [841489] CP002425.1   2522992       - --- CP002425.1   2522686       -
  -------
  regions: 147875 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

I have tried extracting the files and importing .pairs, tried removing header rows, and tried removing extra metadata columns and keep encountering this error. I have a different hic object containing a .pairs.gz file from the same experiment made with pairtools parse that generates a scaleogram without error.