Issues to resolve before implementation
GenomicRanges::tileGenome(seqlengths, ...) requires a value for seqlengths: "Either a named numeric vector of chromosome lengths or a Seqinfo object." This choice becomes a problem downstream with annotatr: unless we use an off-the-shelf Seqinfo object, e.g. hg19, we may get Seqinfo difference errors.
One solution is to build the tiles from the maximum locus position in bs plus win_size, and then wipe any Seqinfo that the returned BSseq-class object may inherit.
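A minimal sketch of that idea, not the final implementation. The helper name make_windows is hypothetical, it takes a GRanges of loci (for a BSseq object this would presumably be granges(bs)), and it assumes the maximum position is taken per chromosome, which the proposal above does not spell out. The Seqinfo is "wiped" here by rebuilding the GRanges from bare seqnames and ranges, which is only one possible way to do it.

```r
library(GenomicRanges)

# Hypothetical helper (not part of any package): tile only as far as the data
# extend instead of relying on a reference genome's Seqinfo.
make_windows <- function(loci, win_size = 200L) {
  # Assumption: largest locus position per chromosome, padded by win_size.
  chr <- as.character(seqnames(loci))
  max_pos <- vapply(split(end(loci), chr),
                    function(x) as.numeric(max(x)), numeric(1))
  tiles <- tileGenome(max_pos + win_size,
                      tilewidth = win_size,
                      cut.last.tile.in.chrom = TRUE)
  # Rebuild from bare seqnames and ranges so that no seqlengths or genome
  # tag is carried along from tileGenome().
  GRanges(seqnames = as.character(seqnames(tiles)), ranges = ranges(tiles))
}

# Toy loci, invented for illustration.
loci <- GRanges(c("chr1", "chr1", "chr2"), IRanges(c(10, 450, 120), width = 1))
windows <- make_windows(loci)
```

Note that padding by win_size can leave trailing tiles that contain no loci, so the aggregation step has to either drop or zero-fill them.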
Function call
tile_by_windows(bs, win_size = 200)
Description
An optional function to aggregate cytosine / CpG level data into regions by tiling the genome with windows of width win_size.
Arguments
bs a BSseq object.
win_size the integer width, in bp, of the windows used to tile the genome and into which CpGs are grouped.
Values
A BSseq-class object whose loci are tiles win_size bp wide over the genome. For each sample, the coverage and methylation read count matrices are aggregated by summing over the cytosines / CpGs in each tile.
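The per-tile sums can be sketched as follows. This is an illustration on plain matrices with invented values, not the tile_by_regions() implementation: each locus is assigned to the tile it overlaps with findOverlaps(), and base R's rowsum() adds up the counts within each tile.

```r
library(GenomicRanges)

# Toy inputs; every value here is invented for illustration.
loci  <- GRanges("chr1", IRanges(c(10, 150, 201, 390), width = 1))
tiles <- GRanges("chr1", IRanges(c(1, 201), width = 200))
M   <- matrix(c(1, 2, 3, 4,  0, 1, 1, 2), ncol = 2,
              dimnames = list(NULL, c("s1", "s2")))
Cov <- matrix(c(5, 5, 5, 5,  2, 2, 2, 2), ncol = 2,
              dimnames = list(NULL, c("s1", "s2")))

# Assign each locus to the tile it overlaps, then sum counts within each tile.
hits  <- findOverlaps(loci, tiles)
M_t   <- rowsum(M[queryHits(hits), , drop = FALSE], group = subjectHits(hits))
Cov_t <- rowsum(Cov[queryHits(hits), , drop = FALSE], group = subjectHits(hits))
```

With this approach, tiles that contain no loci do not appear in the result at all; whether they should be dropped or kept with zero counts is a design choice left open here.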
Tests
Test for the correct sums in tiles.
Test for inclusion/exclusion of cytosines / CpGs in regions (left or right closed or open), and include that in the documentation once you're sure of the behavior.
Is it possible for a non-destranded CpG pair to be split among neighboring regions?
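A small check suggests the answer to the last question is yes, at least when the windows come from tileGenome(). The sketch below uses an invented CpG whose C sits at position 200 on the + strand and whose paired C sits at 201 on the - strand. GRanges intervals are 1-based and closed at both ends, so with win_size = 200 the first two tiles are [1, 200] and [201, 400].

```r
library(GenomicRanges)

win_size <- 200L
tiles <- tileGenome(c(chr1 = 400), tilewidth = win_size,
                    cut.last.tile.in.chrom = TRUE)

# Invented example: one CpG straddling the boundary between tiles 1 and 2.
cpg_pair <- GRanges("chr1", IRanges(c(200, 201), width = 1),
                    strand = c("+", "-"))

# Index of the tile each cytosine falls into.
tile_idx <- findOverlaps(cpg_pair, tiles, select = "first",
                         ignore.strand = TRUE)
```

Here the two cytosines land in different tiles, so non-destranded data would need either prior destranding or a documented rule for such boundary pairs.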
Notes
Once the GRanges of windows is constructed, this function will use tile_by_regions() (#32).