I am not actively working in this area anymore. However, my guess is that you have very few CpG markers for the rare cell type, which can have a big impact on the estimates. Thanks!
Dear Stephanie,
Thank you so much for creating this package. I'm applying it to RRBS data from blood samples; coverage averages around 10-15x. I load my data into a bsseq object to make it compatible with the package and then follow the steps that Ben Lauger recommended in his previous issue (convert to hg18, use one bsseq object per file to minimise issues with coverage across different samples). However, I only get about 130 (depending on the sample) out of 600 regions, so my estimates don't seem to be accurate. See below a couple of examples. Do you have any other ideas or recommendations on how to improve this? The issue seems to be that granulocytes are underestimated in RRBS, as I get very low counts (similar to what you present in your paper when comparing to the Houseman algorithm). It could also simply be a coverage issue: with RRBS I may not get enough coverage in the relevant regions.
My estimated counts look like:
test1 0.06322873 0.3599262 0.4635277 0.1133166 8.151731e-07 0
test1 0.1281388 0.3148228 0.347911 0.1606397 0.007710417 0.04077731
test1 0.167842 0.6489066 0 0.1832514 3.066225e-14 1.291246e-14
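For anyone who wants to sanity-check numbers like these outside of R, here is a minimal Python sketch (not part of the package). It assumes each row above is one set of estimated cell-type proportions, so each row should sum to roughly 1. The column meanings are not stated in the output, so the columns are left unnamed. It also computes the share of regions recovered, using the approximate 130 out of 600 mentioned above.

```python
# Rows copied verbatim from the output above. The column meanings are not
# given there, so no cell-type labels are attached here.
rows = [
    [0.06322873, 0.3599262, 0.4635277, 0.1133166, 8.151731e-07, 0.0],
    [0.1281388, 0.3148228, 0.347911, 0.1606397, 0.007710417, 0.04077731],
    [0.167842, 0.6489066, 0.0, 0.1832514, 3.066225e-14, 1.291246e-14],
]

# Assumption: the values are proportions, so each row should sum to about 1.
row_sums = [sum(r) for r in rows]
for i, s in enumerate(row_sums, start=1):
    print(f"row {i}: sum = {s:.6f}")

# Share of regions recovered, using the approximate figures from the post.
regions_found = 130   # "130 (or so depending on the sample)"
regions_total = 600
covered_fraction = regions_found / regions_total
print(f"regions recovered: {covered_fraction:.1%}")
```

If the rows do sum to 1, the values are proportions rather than raw counts, and the near-zero entries may simply reflect components with too few covered regions to be estimated.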
I have also noticed that if I run the same command on the same file more than once, I get slightly different results, and I'm not sure why. Is that to be expected, or does the fairly low number of detected cell-type-specific regions contribute to it?
Thank you so much in advance for your help, Best wishes, Leo