rnabioco / valr

Genome Interval Arithmetic in R
http://rnabioco.github.io/valr/
Other
88 stars 25 forks source link

bed_shift and bed_slop failed test "going beyond the start of the chrom" #324

Closed raysinensis closed 6 years ago

raysinensis commented 6 years ago

bedtools outputs 0, 1 for start and end in these edge cases

library(valr)
tiny.genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 1000
)
b <- tibble::tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 50, 60, "a1", 5, "-"
)
bed_shift(b, tiny.genome, size = -100, trim = TRUE)
#> # A tibble: 1 x 6
#>   chrom start   end  name score strand
#>   <chr> <dbl> <dbl> <chr> <dbl>  <chr>
#> 1  chr1     0   -40    a1     5      -
bed_slop(b, tiny.genome, left = -100, right = 100, strand = TRUE, trim = TRUE)
#> # A tibble: 1 x 6
#>   chrom start   end  name score strand
#>   <chr> <dbl> <dbl> <chr> <dbl>  <chr>
#> 1  chr1     0   -40    a1     5      -

Created on 2018-01-19 by the reprex package (v0.1.1.9000).

kriemo commented 6 years ago

Some minor modifications to bound_intervals() fixes this issue, #323, and #321

Here's a fix that bounds the starts and ends to 0 and the chromosome size. Also added a check for zero length intervals.

bound_intervals <- function(x, genome, trim = FALSE) {
  if (!is.tbl_interval(x)) x <- as.tbl_interval(x)

  res <- left_join(x, genome, by = "chrom")
  if (trim) {
    res <- mutate(
      res,
      start = ifelse(start < 0, 0, min(start, size - 1)),
      end = ifelse(end > size, size, max(1, end))
    )
    res <- select(res, -size)
  } else {
    res <- filter(res, start >= 0 & start < size & end <= size & end > 0)
    res <- select(res, -size)
  }

  if(any(res$start == res$end)){
    warning(paste0(sum(res$start == res$end),
                   " intervals generated with same start and end were discarded"))
  }

  res <- res[res$start != res$end, ]

  res
}