BSseq object contained 130 out of 600 celltype-specific regions.

Dear Stephanie,

Thank you so much for creating this package. I'm applying it to RRBS data from blood samples, with an average coverage of around 10-15x. My workflow is to load the data into a BSseq object to make it compatible with the package, and then follow the steps that Ben Lauger recommended in his previous issue (convert to hg18, and use one BSseq object per file to minimise issues with coverage across different samples). However, I only get 130 or so (depending on the sample) out of the 600 regions, and therefore my estimates don't seem to be accurate. See a couple of examples below. Do you have any other ideas or recommendations on how to improve this? The issue seems to be that the presence of granulocytes is underestimated in RRBS, as I get very low counts (similar to what you present in your paper when comparing to the Houseman algorithm). It could also just be a coverage issue, where RRBS doesn't give me enough coverage in the relevant regions.

My estimated counts look like:

          Gran        CD4T       CD8T       Bcell      Mono          NK
    test1 0.06322873  0.3599262  0.4635277  0.1133166  8.151731e-07  0

          Gran        CD4T       CD8T       Bcell      Mono          NK
    test1 0.1281388   0.3148228  0.347911   0.1606397  0.007710417   0.04077731

          Gran        CD4T       CD8T       Bcell      Mono          NK
    test1 0.167842    0.6489066  0          0.1832514  3.066225e-14  1.291246e-14
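As a quick sanity check on these numbers, here is a minimal Python sketch that is independent of methylCC. It copies the three outputs above into a list, confirms that each set of proportions sums to roughly 1, and reports the granulocyte estimate alongside the fraction of regions covered (130 of 600). The names `estimates`, `summarize`, and `coverage_fraction` are my own and are only for illustration.

```python
# Estimates copied from the three outputs above.
# Column order: Gran, CD4T, CD8T, Bcell, Mono, NK.
CELL_TYPES = ["Gran", "CD4T", "CD8T", "Bcell", "Mono", "NK"]

estimates = [
    [0.06322873, 0.3599262, 0.4635277, 0.1133166, 8.151731e-07, 0.0],
    [0.1281388, 0.3148228, 0.347911, 0.1606397, 0.007710417, 0.04077731],
    [0.167842, 0.6489066, 0.0, 0.1832514, 3.066225e-14, 1.291246e-14],
]


def summarize(row):
    """Return (sum of all proportions, granulocyte proportion) for one output."""
    total = sum(row)
    gran = row[CELL_TYPES.index("Gran")]
    return total, gran


# Regions reported in the issue title: 130 found out of 600 expected.
regions_found, regions_total = 130, 600
coverage_fraction = regions_found / regions_total

for i, row in enumerate(estimates, start=1):
    total, gran = summarize(row)
    print(f"output {i}: sum={total:.4f}, Gran={gran:.3f}")

print(f"regions covered: {coverage_fraction:.1%}")
```

All three outputs sum to about 1, so the proportions themselves are well formed. The granulocyte estimate stays below 0.2 in every case, with only about 22% of the regions covered.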

I have also noticed that if I run the same command on the same file more than once, I get slightly different results, and I'm not sure why. Is that to be expected, or does the fairly low detection of cell-type-specific regions contribute to it?

Thank you so much in advance for your help.

Best wishes,
Leo

stephaniehicks / methylCC
