Closed mapostolides closed 3 years ago
Hi Michael,
I'm following along from the vignette on Bioconductor. I think the part that's missing from your code is here. We load in a different test dataset than what was used in the previous vignette sections.
# Load data for use in the rest of the vignette
data(BS.cancer.ex, package = 'bsseqData')
bs = BS.cancer.ex[1:10000]
bs = filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
In particular, now there is a Type
column in the colData
of the bs
object:
> pData(bs)
DataFrame with 6 rows and 2 columns
Type Pair
<character> <character>
C1 cancer pair1
C2 cancer pair2
C3 cancer pair3
N1 normal pair1
N2 normal pair2
N3 normal pair3
I apologize if that was unclear, and thanks for your interest in methylSig.
Raymond
Thanks Raymond !
I did indeed miss those lines. I've added them, and now I see the following:
pData(bs)
DataFrame with 6 rows and 2 columns Type Pair <character> <character> C1 cancer pair1 C2 cancer pair2 C3 cancer pair3 N1 normal pair1 N2 normal pair2 N3 normal pair3
It seems that the BSSeq object is being taken from a pre-loaded test dataset BS.cancer.ex
. It seems this meta-data of Type is required for methylSig to work.
files = c(
system.file('extdata', 'bis_cov1.cov', package='methylSig'),
system.file('extdata', 'bis_cov2.cov', package='methylSig')
)
When I check their contents, this is what I see:
pData(bs)
DataFrame with 2 rows and 0 columns
How might I add this metadata and get methylSig working when starting with files instead of a pre-loaded dataset?
Thanks!
Hi Michael,
Let's begin at the top. For the purposes of demonstrating the bsseq::read.bismark
function in the vignette, I use small data that doesn't have any associated phenotype data. It doesn't need any phenotype data because I'm not going to use it for any of the other functions in the package. To demonstrate the other functions I use the BS.cancer.ex
data that came prepackaged. So here is that small data:
files = c(
system.file('extdata', 'bis_cov1.cov', package='methylSig'),
system.file('extdata', 'bis_cov2.cov', package='methylSig')
)
bsseq_stranded = bsseq::read.bismark(
files = files,
colData = data.frame(row.names = c('test1','test2')),
rmZeroCov = FALSE,
strandCollapse = FALSE
)
If one were to do pData(bsseq_stranded)
, you would see:
DataFrame with 2 rows and 0 columns
This is because in the bsseq::read.bismark
call I assigned
colData = data.frame(row.names = c('test1','test2'))
This is an empty data.frame. To assign phenotype data to this example we could just as easily have said:
files = c(
system.file('extdata', 'bis_cov1.cov', package='methylSig'),
system.file('extdata', 'bis_cov2.cov', package='methylSig')
)
bsseq_stranded = bsseq::read.bismark(
files = files,
colData = data.frame(condition = c('case', 'control'), sex = c('Male', 'Female'), row.names = c('test1','test2')),
rmZeroCov = FALSE,
strandCollapse = FALSE
)
Then
> pData(bsseq_stranded)
DataFrame with 2 rows and 2 columns
condition sex
<character> <character>
test1 case Male
test2 control Female
How might I add this metadata and get methylSig working when starting with files instead of a pre-loaded dataset?
You provide a data.frame
to the colData
parameter of bsseq::read.bismark
that describes the files you're reading. Please read the documentation by doing ?bsseq::read.bismark
. The rows of colData
correspond to the columns of the BSseq
object and its matrices (down to the names). You can then use the column names in pData(object)
to tell the differential methylation test functions how to group the data. In the BS.cancer.ex
example, Type
was particular to that data, but every dataset may be different.
Hope that helps, Raymond
Thanks Raymond, that's helpful! I really appreciate the thorough reply.
Hi Raymond,
I now have my code set up as follows:
system.file('extdata', 'bis_cov1.cov', package='methylSig'),
system.file('extdata', 'bis_cov2.cov', package='methylSig')
)
bs = bsseq::read.bismark(
files = files,
colData = data.frame(condition = c('case', 'control'), sex = c('Male', 'Female'), row.names = c('test1','test2')),
rmZeroCov = FALSE,
strandCollapse = FALSE
)
bs = filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
windowed_bs = tile_by_windows(bs = bs, win_size = 10000)
pData(windowed_bs)
diff_gr = diff_methylsig(
bs = windowed_bs,
group_column = 'condition',
comparison_groups = c('case' = 'case', 'control' = 'control'),
disp_groups = c('case' = TRUE, 'control' = TRUE),
local_window_size = 0,
t_approx = TRUE,
n_cores = 1)
As expected, my metadata is now as follows:
DataFrame with 2 rows and 2 columns
condition sex
<character> <character>
test1 case Male
test2 control Female
However, now I am getting a different error when running the diff_methylsig function:
Error in base::rowSums(meth_mat[, case_idx]) : 'x' must be an array of at least two dimensions
Any idea why this might be happening?
Thanks!
Hi Michael,
Again, the data you're using is not intended to be used for differential methylation testing. I made up the conditions to illustrate to you how you'd add phenotype data to your own data. Moreover, that data doesn't have replicates per groups, which will cause the internal calculations of the test for differential methylation to fail.
Please use the BS.cancer.ex
if you want to try out the tests for differential methylation. That's what that dataset was intended for.
Raymond
Hello,
I have tried running the following code which is directly from the readme on this github
However, I am getting the following error upon running the
diff_methylsig
function:I am not sure what this "group_column" is or what "Type" is, as there seems to be no existing "Type" column in the data. Am I missing something?
Thanks, Michael