hansenlab / bsseq

Devel repository for bsseq

BSmooth() #140

Closed romansch-hub closed 3 days ago

romansch-hub commented 2 weeks ago

Error in bsseq::BSmooth(freqgpc, ns = 10, h = 100, verbose = TRUE) :
  BSmooth() encountered errors: 148 of 148 smoothing tasks failed.
In addition: Warning message:
In parallel::mccollect(wait = TRUE) :
  3 parallel jobs did not deliver results


I loaded my input file from nanoNOMe sequencing (GpC accessibility) with read.bismark() and tried smoothing with the parameters ns = 10 and h = 100, with 500 GB of memory available. How can I run the smoothing with these parameters? Note that the smoothing works in general: with the default parameters (ns = 70, h = 1000) I get results even with a small amount of memory.

PeteHaitch commented 2 weeks ago

AFAICT there are many more GpC than CpG in the genome. How many loci and chromosomes do you have?

BSmooth() was initially developed for CpG methylation, so the default parameters are designed for the spacing of CpGs in the (human/mouse) genome. We have used it for other loci (e.g., methylation of CC), but it took a bit of work to find appropriate parameters, and getting it to run required something on the order of 100-500 GB of RAM.

romansch-hub commented 2 weeks ago

An object of type 'BSseq' with
  114074141 methylation loci
  1 samples
has not been smoothed

Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM

I found a couple of papers that use BSmooth for GpC with my parameters. It seems to work for them but unfortunately not for me, even with the data they used for smoothing in the paper.

PeteHaitch commented 2 weeks ago

I have no experience with GpC data. Have you contacted those people you know have run GpC data through bsseq for advice?

romansch-hub commented 2 weeks ago

Is there a difference in the smoothing algorithm between the latest version and older versions (e.g., from 3 or 4 years ago)?

PeteHaitch commented 2 weeks ago

Not to my knowledge, but @kasperdanielhansen can confirm.

romansch-hub commented 3 days ago

As an update: it seems to be a memory problem. After splitting the input file into smaller pieces, e.g. one file per chromosome, the smoothing can be performed.
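
For anyone hitting the same error, here is a minimal sketch of the per-chromosome split, done as a preprocessing step outside R. It is not part of bsseq. It assumes the input is an uncompressed, tab-separated Bismark-style file with the chromosome name in the first column; the function name `split_by_chromosome` and the `.cov` suffix are made up for illustration.

```python
import os
from contextlib import ExitStack


def split_by_chromosome(in_path, out_dir, suffix=".cov"):
    """Split a tab-separated file into one file per chromosome.

    Assumption: the chromosome name is the first tab-separated field
    of every line. Returns a dict mapping chromosome -> output path.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    handles = {}
    # ExitStack closes every per-chromosome output file on exit.
    with ExitStack() as stack, open(in_path) as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            chrom = line.split("\t", 1)[0]
            if chrom not in handles:
                # First time this chromosome is seen: open its output file.
                paths[chrom] = os.path.join(out_dir, chrom + suffix)
                handles[chrom] = stack.enter_context(open(paths[chrom], "w"))
            # Lines are copied unchanged, in their original order.
            handles[chrom].write(line)
    return paths
```

Calling `split_by_chromosome("all.cov", "by_chr")` (both paths are placeholders) writes files such as `by_chr/chr1.cov`, each of which can then be loaded with read.bismark() and smoothed on its own. The sketch keeps one output file open per chromosome, which is fine for the 25 sequence levels listed above but may hit the open-file limit on assemblies with thousands of contigs.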