Closed loukesio closed 1 year ago
This reprex illustrates how to intersect multiple interval sets; the output could be manipulated to yield a format identical to bedtools multiinter
.
library(valr)
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 500,
"chr2", 200, 400,
"chr2", 300, 500,
"chr2", 800, 900
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 150, 400, 100,
"chr1", 500, 550, 100,
"chr2", 230, 430, 200,
"chr2", 350, 430, 300
)
z <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 150, 400, 100,
"chr1", 500, 550, 100,
"chr2", 230, 430, 200,
"chr2", 750, 900, 400
)
bed_intersect(x, y, z)
#> # A tibble: 11 × 8
#> chrom start.x end.x start.y end.y value.y .source .overlap
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int>
#> 1 chr1 100 500 150 400 100 y 250
#> 2 chr1 100 500 150 400 100 z 250
#> 3 chr1 100 500 500 550 100 y 0
#> 4 chr1 100 500 500 550 100 z 0
#> 5 chr2 200 400 230 430 200 y 170
#> 6 chr2 200 400 230 430 200 z 170
#> 7 chr2 200 400 350 430 300 y 50
#> 8 chr2 300 500 230 430 200 y 130
#> 9 chr2 300 500 230 430 200 z 130
#> 10 chr2 300 500 350 430 300 y 80
#> 11 chr2 800 900 750 900 400 z 100
bed_intersect(x, exons = y, introns = z)
#> # A tibble: 11 × 8
#> chrom start.x end.x start.y end.y value.y .source .overlap
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int>
#> 1 chr1 100 500 150 400 100 exons 250
#> 2 chr1 100 500 150 400 100 introns 250
#> 3 chr1 100 500 500 550 100 exons 0
#> 4 chr1 100 500 500 550 100 introns 0
#> 5 chr2 200 400 230 430 200 exons 170
#> 6 chr2 200 400 230 430 200 introns 170
#> 7 chr2 200 400 350 430 300 exons 50
#> 8 chr2 300 500 230 430 200 exons 130
#> 9 chr2 300 500 230 430 200 introns 130
#> 10 chr2 300 500 350 430 300 exons 80
#> 11 chr2 800 900 750 900 400 introns 100
Created on 2023-05-22 with reprex v2.0.2
I think this gets close to the output of bedtools multiinter
:
library(valr)
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 500,
"chr2", 200, 400,
"chr2", 300, 500,
"chr2", 800, 900
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 150, 400, 100,
"chr1", 500, 550, 100,
"chr2", 230, 430, 200,
"chr2", 350, 430, 300
)
z <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 150, 400, 100,
"chr1", 500, 550, 100,
"chr2", 230, 430, 200,
"chr2", 750, 900, 400
)
library(tidyverse)
out <- bed_intersect(x, exons = y, introns = z)
summarize(
out,
.sources = list(.source), .n = n(),
.by = c(chrom, start.x, end.x)
)
#> # A tibble: 4 × 5
#> chrom start.x end.x .sources .n
#> <chr> <dbl> <dbl> <list> <int>
#> 1 chr1 100 500 <chr [4]> 4
#> 2 chr2 200 400 <chr [3]> 3
#> 3 chr2 300 500 <chr [3]> 3
#> 4 chr2 800 900 <chr [1]> 1
Created on 2023-05-22 with reprex v2.0.2
I appreciate that you replied to me so quickly!
I love valr
and appreciate your time developing it.
Can valr
also consider the strandedness and find intersect considering the strand?
Yes, see https://rnabioco.github.io/valr/dev/reference/bed_intersect.html#details
You just need to group the tbls by strand prior to the intersect.
Valr is pretty neat and cool using it over R. I was wondering if there is a function or plan to write a function like the
multiIntersectBed
to find the intersect of multiple bed files