hansenlab / bsseq

Devel repository for bsseq
35 stars 25 forks source link

Request to remove requirement for gr to have widths 1 in BSseq constructor #68

Open rcavalcante opened 6 years ago

rcavalcante commented 6 years ago

Hi,

I'm a big fan of bsseq because it is exactly the data structure I need for a package I maintain (methylSig) that tests for differential methylation (at CpGs and over regions).

As of the Bioc 3.7 release it appears that a check in the BSseq constructor was added (I don't see a reference to it in the NEWS file).

    if(any(width(gr) != 1))
        stop("'gr' needs to have widths of 1")

It was incredibly convenient to be able to create BSseq objects that aggregated CpGs over regions so that there was still a unified methylation object to access and manipulate.

I would like to request that this check be removed to enable BSseq objects with ranges greater than 1bp. Of course, I understand there may be intra-package consistency reasons for the change, but I figured it was worth a shot to ask because I think there is value in BSseq objects that represent methylation over regions.

This is cross-posted on Bioc support (https://support.bioconductor.org/p/109098/)

Thanks, Raymond

PeteHaitch commented 6 years ago

In principle, that check has been in principle there for several years (https://github.com/hansenlab/bsseq/blame/48bb748b881399aa618245c3934a325818309aa5/R/BSseq_class.R#L123). However, there was a bug in the code that meant it was never triggered! That was fixed in the most recent release (https://github.com/hansenlab/bsseq/commit/57b766fed94aed56f661cbf55510d594cf88810f). It was a slight oversight not to include it in the NEWS.

That said, I think what we might worth considering allowing the BSseq() constructor to have GRanges of arbitrary width (and the validity method not enforcing it). Conversely, it seems worth ensuring in the read.bismark() constructor that the loci have width = 1.

rcavalcante commented 6 years ago

Hmm, that's kind of funny. I guess that was a lucky coincidence that enabled me to transition methylSig to the BSseq class.

I agree that a check in read.bismark() still makes sense. To give a little context, I'm always reading the files in with read.bismark() and only downstream do I do an aggregation wherein I want to manually construct a BSseq.

PeteHaitch commented 6 years ago

FYI I'll need to wait to hear from @kasperdanielhansen before I make this change.

kasperdanielhansen commented 6 years ago

I am open to this in principle. But for sure, some current functions silently assume width is 1. We should identify those - BSmooth() is one of them.

PeteHaitch commented 6 years ago

some current functions silently assume width is 1

Good point. Will check these before altering anything

rcavalcante commented 6 years ago

Good morning,

Just wanted to check in to see if you're still landed in favor of removing the width = 1 requirement. I was planning to submit the package I mentioned above for the next Bioconductor release, but didn't want to start that process until I was sure the build system wouldn't give me problems.

I also don't want you to get the impression that I'm pestering you, just checking in.

Thanks, Raymond

PeteHaitch commented 6 years ago

Hi @rcavalcante,

I'm in favour of this and am adding explicit checks to functions in bsseq that currently implicitly assume all loci have width == 1. It's part of some larger changes to the internals of bsseq, so it isn't currently available in the devel branch. Let me know when you're ready to submit to BioC and I'll port just those changes to the devel branch so that your package builds cleanly.

Cheers, Pete