Closed canholyavkin closed 4 years ago
Hi. Thanks for your interest in valr
. The default sorting is by chrom
, start
, and end
. The by_chrom
argument only affects the output when by_size = TRUE
or reverse = TRUE
, as by default the output is sorted by chromosome first.
The sorting of the chrom
column is alphanumeric. This is the default convention used by bedtools
, but is in contrast to the sort order of bam
files which sort by chromosome size (as defined in the bam header).
library(valr)
library(tibble)
library(dplyr)
rd <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 1, 100,
"chr11", 1, 100,
"chr4", 1, 100,
"chr2", 1, 100,
)
# base R sorting
sort(rd$chrom)
#> [1] "chr1" "chr11" "chr2" "chr4"
"chr11" < "chr2"
#> [1] TRUE
To define a custom sort (non-alphanumeric) order you could use a "genome"-like data.frame that defines the chromosome ordering and pipe output from valr
to dplyr::left_join()
# retain sorting by chrom order defined in other data.frame
chroms <- tribble(
~chrom,
"chr1",
"chr2",
"chr4",
"chr11"
)
bed_intersect(rd, rd, suffix = c("", "_b")) %>%
left_join(chroms, ., by = c("chrom"))
#> # A tibble: 4 x 6
#> chrom start end start_b end_b .overlap
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 chr1 1 100 1 100 99
#> 2 chr2 1 100 1 100 99
#> 3 chr4 1 100 1 100 99
#> 4 chr11 1 100 1 100 99
Created on 2020-02-19 by the reprex package (v0.3.0)
Here is the default sorting used in bedtools sort
echo -e "chr1\t1\t100\nchr11\t1\t100\nchr2\t1\t100" | bedtools sort
chr1 1 100
chr11 1 100
chr2 1 100
Thank you for clear and detailed explanation.
The
bed_sort
function doesn't seem to sort the bed file alphanumerical by chromosome column. In the example below, the chromosome 11 record located on the second row after sorting, while it should be at the end.by_chrom = T
parameter also doesn't change the order.