rnabioco / valr

Genome Interval Arithmetic in R
http://rnabioco.github.io/valr/

example to convert tbl_interval to GRanges #335

Closed: jayhesselberth closed this issue 6 years ago

jayhesselberth commented 6 years ago

An example of converting a tbl_interval to a GRanges object should be included somewhere other than the benchmarks vignette, perhaps in the tbl_interval examples.

library(valr)
library(GenomicRanges)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, cbind, colMeans,
#>     colnames, colSums, do.call, duplicated, eval, evalq, Filter,
#>     Find, get, grep, grepl, intersect, is.unsorted, lapply,
#>     lengths, Map, mapply, match, mget, order, paste, pmax,
#>     pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
#>     rowMeans, rownames, rowSums, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:valr':
#> 
#>     values
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb

genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))
x <- bed_random(genome)

GRanges(
  seqnames = Rle(x$chrom),
  ranges = IRanges(x$start, end = x$end)
)
#> GRanges object with 1000000 ranges and 0 metadata columns:
#>             seqnames               ranges strand
#>                <Rle>            <IRanges>  <Rle>
#>         [1]     chr1       [ 3621,  4621]      *
#>         [2]     chr1       [ 5349,  6349]      *
#>         [3]     chr1       [ 6861,  7861]      *
#>         [4]     chr1       [ 7354,  8354]      *
#>         [5]     chr1       [13725, 14725]      *
#>         ...      ...                  ...    ...
#>    [999996]     chrY [59360243, 59361243]      *
#>    [999997]     chrY [59364780, 59365780]      *
#>    [999998]     chrY [59366910, 59367910]      *
#>    [999999]     chrY [59367648, 59368648]      *
#>   [1000000]     chrY [59371072, 59372072]      *
#>   -------
#>   seqinfo: 25 sequences from an unspecified genome; no seqlengths

Created on 2018-01-27 by the reprex package (v0.1.1.9000).
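A note on coordinates: valr intervals are 0-based and half-open (BED convention), while IRanges/GRanges ranges are 1-based and closed. Passing start through unchanged, as in the reprex above, makes each range one base too wide: bed_random()'s 1,000 bp intervals print as, e.g., [3621, 4621], which IRanges counts as 1,001 bp. Below is a minimal sketch of the same manual conversion with the start shifted by one. It uses a small made-up table; the intervals and the strand column are illustrative assumptions, not valr output.

```r
library(GenomicRanges)
library(tibble)

# Hypothetical BED-style intervals (0-based, half-open), as valr stores them.
x <- tribble(
  ~chrom, ~start, ~end, ~strand,
  "chr1",      0,  100,     "+",
  "chr1",    500, 1000,     "-",
  "chr2",     20,   30,     "+"
)

# A 0-based start s becomes the 1-based start s + 1;
# the end coordinate is the same in both conventions.
gr <- GRanges(
  seqnames = x$chrom,
  ranges   = IRanges(start = x$start + 1, end = x$end),
  strand   = x$strand
)

gr
```

The end needs no adjustment because the half-open interval [s, e) in 0-based coordinates covers the same bases as the closed interval [s + 1, e] in 1-based coordinates, so width(gr) equals end - start from the original table.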

raysinensis commented 6 years ago

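Just realized there is a function in GenomicRanges, makeGRangesFromDataFrame(), that converts a data frame to a GRanges object by searching for commonly used column names (see its *.field arguments). valr's naming conventions fit in nicely, so a simple call of makeGRangesFromDataFrame(x) just works. Because valr starts are 0-based, setting starts.in.df.are.0based = TRUE gives the correct 1-based GRanges starts.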

makeGRangesFromDataFrame(df,
                         keep.extra.columns=FALSE,
                         ignore.strand=FALSE,
                         seqinfo=NULL,
                         seqnames.field=c("seqnames", "seqname",
                                          "chromosome", "chrom",
                                          "chr", "chromosome_name",
                                          "seqid"),
                         start.field="start",
                         end.field=c("end", "stop"),
                         strand.field="strand",
                         starts.in.df.are.0based=FALSE)
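
A minimal sketch of that call on a small, made-up BED-style table (the name and score columns are illustrative). The chrom, start, end, and strand columns are matched by the default *.field arguments, keep.extra.columns = TRUE carries the remaining columns over as metadata columns, and starts.in.df.are.0based = TRUE shifts valr's 0-based starts to 1-based:

```r
library(GenomicRanges)
library(tibble)

# Hypothetical BED-style intervals with extra annotation columns.
x <- tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1",      0,  100,   "a",      1,     "+",
  "chr1",    500, 1000,   "b",      2,     "-"
)

# chrom/start/end/strand are found via the default *.field arguments;
# name and score become metadata columns (mcols).
gr <- makeGRangesFromDataFrame(
  x,
  keep.extra.columns = TRUE,
  starts.in.df.are.0based = TRUE  # add 1 to each 0-based start
)

gr
```

Leaving starts.in.df.are.0based at its default of FALSE would keep the first start at 0 and produce a 101 bp range instead of a 100 bp one.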

jayhesselberth commented 6 years ago

closed by #334