romansch-hub closed this issue 3 days ago
AFAICT there are many more GpC than CpG in the genome. How many loci and chromosomes do you have?
BSmooth() was initially developed for CpG methylation, so the default parameters are designed for the spacing of CpGs in the (human/mouse) genome.
We have used it for other loci (e.g., methylation of CC), but it took a bit of work to find appropriate parameters, and getting it to run required something on the order of 100-500 GB of RAM.
An object of type 'BSseq' with
  114074141 methylation loci
  1 samples
has not been smoothed
I found a couple of papers that use BSmooth for GpC with the same parameters I am using. It seems to work for them, but unfortunately not for me, even when I smooth the data used in the paper.
I have no experience with GpC data. Have you contacted those people you know have run GpC data through bsseq for advice?
Is there a difference in the smoothing algorithm between the latest and older versions (like from 3 or 4 years ago)?
Not to my knowledge, but @kasperdanielhansen can confirm.
As an update: it seems to be a memory problem. After splitting the input file into smaller files (e.g., one file per chromosome), the smoothing can be performed.
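For anyone landing here with the same error, here is a minimal sketch of the per-chromosome split in base R. It assumes a tab-separated, header-less input file with the chromosome name in the first column (as in a Bismark coverage file). The helper name split_by_chromosome, the ".cov" output suffix, and the file names in the usage comments are made up for illustration and are not part of bsseq.

```r
# Hypothetical helper (not part of bsseq): split a tab-separated, header-less
# file into one file per chromosome, keyed on the first column.
# The input is read in chunks so the whole file never has to sit in memory.
split_by_chromosome <- function(infile, outdir, chunk_lines = 1000000L) {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  con <- file(infile, open = "r")
  on.exit(close(con))
  outfiles <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break
    # Chromosome name = everything before the first tab.
    chrom <- sub("\t.*$", "", lines)
    for (chr in unique(chrom)) {
      out <- file.path(outdir, paste0(chr, ".cov"))
      # Overwrite on first sight of a chromosome in this run, append afterwards.
      mode <- if (chr %in% names(outfiles)) "a" else "w"
      out_con <- file(out, open = mode)
      writeLines(lines[chrom == chr], out_con)
      close(out_con)
      outfiles[chr] <- out
    }
  }
  outfiles
}

# Usage sketch (needs bsseq installed; "sample.gpc.cov" is a placeholder):
# files <- split_by_chromosome("sample.gpc.cov", "by_chr")
# smoothed <- lapply(files, function(f) {
#   bs <- bsseq::read.bismark(f)
#   bsseq::BSmooth(bs, ns = 10, h = 100, verbose = TRUE)
# })
```

Smoothing one chromosome at a time keeps only that chromosome's loci in memory, at the cost of ending up with one smoothed object per chromosome rather than a single genome-wide one.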
Error in bsseq::BSmooth(freqgpc, ns = 10, h = 100, verbose = TRUE) :
  BSmooth() encountered errors: 148 of 148 smoothing tasks failed.
In addition: Warning message:
In parallel::mccollect(wait = TRUE) :
  3 parallel jobs did not deliver results
I loaded my input file from nanoNOMe sequencing (GpC accessibility) with read.bismark. I tried smoothing with the parameters ns=10 and h=100, with 500 GB of memory available. How can I run the smoothing with these parameters? Note: the smoothing works in general. With the default parameters of ns=70 and h=1000, I get results even with a small amount of memory.