rnabioco / valr

Genome Interval Arithmetic in R
http://rnabioco.github.io/valr/
Other
88 stars 25 forks source link

multiIntersectBed like function #407

Closed loukesio closed 1 year ago

loukesio commented 1 year ago

Valr is pretty neat and cool using it over R. I was wondering if there is a function or plan to write a function like the

multiIntersectBed to find the intersect of multiple bed files

jayhesselberth commented 1 year ago

This reprex illustrates how to intersect multiple interval sets; the output could be manipulated to yield a format identical to bedtools multiinter.

library(valr)

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    500,
  "chr2", 200,    400,
  "chr2", 300,    500,
  "chr2", 800,    900
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 150,    400,  100,
  "chr1", 500,    550,  100,
  "chr2", 230,    430,  200,
  "chr2", 350,    430,  300
)

z <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 150,    400,  100,
  "chr1", 500,    550,  100,
  "chr2", 230,    430,  200,
  "chr2", 750,    900,  400
)

bed_intersect(x, y, z)
#> # A tibble: 11 × 8
#>    chrom start.x end.x start.y end.y value.y .source .overlap
#>    <chr>   <dbl> <dbl>   <dbl> <dbl>   <dbl> <chr>      <int>
#>  1 chr1      100   500     150   400     100 y            250
#>  2 chr1      100   500     150   400     100 z            250
#>  3 chr1      100   500     500   550     100 y              0
#>  4 chr1      100   500     500   550     100 z              0
#>  5 chr2      200   400     230   430     200 y            170
#>  6 chr2      200   400     230   430     200 z            170
#>  7 chr2      200   400     350   430     300 y             50
#>  8 chr2      300   500     230   430     200 y            130
#>  9 chr2      300   500     230   430     200 z            130
#> 10 chr2      300   500     350   430     300 y             80
#> 11 chr2      800   900     750   900     400 z            100

bed_intersect(x, exons = y, introns = z)
#> # A tibble: 11 × 8
#>    chrom start.x end.x start.y end.y value.y .source .overlap
#>    <chr>   <dbl> <dbl>   <dbl> <dbl>   <dbl> <chr>      <int>
#>  1 chr1      100   500     150   400     100 exons        250
#>  2 chr1      100   500     150   400     100 introns      250
#>  3 chr1      100   500     500   550     100 exons          0
#>  4 chr1      100   500     500   550     100 introns        0
#>  5 chr2      200   400     230   430     200 exons        170
#>  6 chr2      200   400     230   430     200 introns      170
#>  7 chr2      200   400     350   430     300 exons         50
#>  8 chr2      300   500     230   430     200 exons        130
#>  9 chr2      300   500     230   430     200 introns      130
#> 10 chr2      300   500     350   430     300 exons         80
#> 11 chr2      800   900     750   900     400 introns      100

Created on 2023-05-22 with reprex v2.0.2

jayhesselberth commented 1 year ago

I think this gets close to the output of bedtools multiinter:

library(valr)

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    500,
  "chr2", 200,    400,
  "chr2", 300,    500,
  "chr2", 800,    900
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 150,    400,  100,
  "chr1", 500,    550,  100,
  "chr2", 230,    430,  200,
  "chr2", 350,    430,  300
)

z <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 150,    400,  100,
  "chr1", 500,    550,  100,
  "chr2", 230,    430,  200,
  "chr2", 750,    900,  400
)

library(tidyverse)
out <- bed_intersect(x, exons = y, introns = z)

summarize(
  out, 
  .sources = list(.source), .n = n(),
  .by = c(chrom, start.x, end.x)
)
#> # A tibble: 4 × 5
#>   chrom start.x end.x .sources     .n
#>   <chr>   <dbl> <dbl> <list>    <int>
#> 1 chr1      100   500 <chr [4]>     4
#> 2 chr2      200   400 <chr [3]>     3
#> 3 chr2      300   500 <chr [3]>     3
#> 4 chr2      800   900 <chr [1]>     1

Created on 2023-05-22 with reprex v2.0.2

loukesio commented 1 year ago

I appreciate that you replied to me so quickly! I love valr and appreciate your time developing it.

Can valr also consider the strandedness and find intersect considering the strand?

jayhesselberth commented 1 year ago

Yes, see https://rnabioco.github.io/valr/dev/reference/bed_intersect.html#details

You just need to group the tbls by strand prior to the intersect.