haowulab / DSS

Error when performing statistical test for DML with smoothing #13

Open xiuru opened 4 years ago

xiuru commented 4 years ago

Hello, I am using DSS to detect DML for WGBS data and got an error when performing statistical test for DML with smoothing. My code:

dat1.1 = read.table("chr1-ZmBS-BS1-1-CpG.bismark.cov.tsv", header=TRUE)
dat1.2 = read.table("chr1-ZmBS-BS2-1-CpG.bismark.cov.tsv", header=TRUE)
dat2.1 = read.table("chr1-ZmMC-BS1-1-CpG.bismark.cov.tsv", header=TRUE)
dat2.2 = read.table("chr1-ZmMC-BS2-1-CpG.bismark.cov.tsv", header=TRUE)
BSobj = makeBSseqData(list(dat1.1, dat1.2, dat2.1, dat2.2), c("BS1", "BS2", "MC1", "MC2"))
dmlTest.sm = DMLtest(BSobj, group1=c("BS1", "BS2"), group2=c("MC1", "MC2"), smoothing=TRUE)

But I got an error like:

Smoothing ...
Estimating dispersion for each CpG site, this will take a while ...
  |======================================================================| 100%

  |                                                                      |   0%
Error in result[[njob]] <- value :
  attempt to select less than one element in OneIndex
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
  1 parallel job did not deliver a result

When I test only the first 20000 lines of those 4 files, DMLtest works fine with no error, so it seems something is wrong in my original files. Do you have any ideas on how to avoid this?

Thank you in advance!

haowulab commented 4 years ago

I can't tell from this. But there seems to be a warning message in the parallel computing part. Can you try using a single core? Do:

single = MulticoreParam(workers=1, progressbar=TRUE)
dmlTest.sm = DMLtest(BSobj, group1=c("BS1", "BS2"), group2=c("MC1", "MC2"), smoothing=TRUE, BPPARAM=single)

xiuru commented 4 years ago

@haowulab Thanks for your suggestion. Single core works well for my data, so maybe there is something wrong with my BiocParallel installation. I will reinstall BiocParallel and try multiple cores for DMLtest.

Thanks!

dlabuz commented 3 years ago

I have an issue running DMLtest with more than a single core. The progress bar just stays at 0% for an hour or more when using anything more than a single core. I'm working with the human genome, and a single core takes several hours just to compare 2 samples, when in reality I want to compare several more samples. I've tried re-installing BiocParallel to no avail. I am running R v4.1.0 on Ubuntu 20.04.2. Is this a problem specific to Ubuntu parallelization? I saw this issue: https://github.com/Bioconductor/BiocParallel/issues/106. Unfortunately, I cannot figure out how to troubleshoot this for DSS. Any thoughts?

haowulab commented 3 years ago

I don't know. Can you run other BiocParallel code on Ubuntu?

realzhang commented 3 years ago

It seems that I have the same problem. The progress bar stays at 0% for hours, even though 50 or 80 threads are running. By the way, I am using CentOS 7.

haowulab commented 3 years ago

I really can't tell. Are you using a desktop computer running Ubuntu? There might be problems running BiocParallel on an HPC cluster with a scheduler such as SGE. Can you run other code using BiocParallel?
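For anyone who wants to check BiocParallel on its own, outside DSS, a minimal sketch along the following lines might help. The toy function `slow_sqrt` and the worker count of 2 are assumptions made up for illustration; they are not part of DSS or of this thread. If the multicore call hangs or runs much slower than the serial one even on this tiny job, the problem is probably in the BiocParallel setup rather than in DSS.

```r
library(BiocParallel)

# Hypothetical toy workload, unrelated to DSS: a deliberately slow square root.
slow_sqrt <- function(x) {
  Sys.sleep(0.1)
  sqrt(x)
}

# Run the same job serially and with two forked workers, timing each.
t_serial <- system.time(
  serial <- bplapply(1:8, slow_sqrt, BPPARAM = SerialParam())
)
t_multi <- system.time(
  multi <- bplapply(1:8, slow_sqrt, BPPARAM = MulticoreParam(workers = 2))
)

# Both back-ends should return the same values; compare elapsed times by eye.
print(identical(serial, multi))
print(rbind(serial = t_serial["elapsed"], multicore = t_multi["elapsed"]))
```

If this works as expected, repeating it with the same `workers` value used in the DMLtest call is a reasonable next step before suspecting DSS itself.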

adRn-s commented 2 years ago

IDK if this is from upstream (BiocParallel) or not. Yet, I'm experiencing an issue that seems related to this. Using the example code from the DMLtest help page, I see that using multiple cores is much slower than using a single core on RStudio Server.

> mParam = MulticoreParam(workers=128, progressbar=TRUE)
> timestamp(); dmlTest1 <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"), BPPARAM=mParam); timestamp()
##------ Wed Mar 30 10:13:40 2022 ------##
Estimating dispersion for each CpG site, this will take a while ...
  |=======================================================| 100%

  |===========================================| 100%

##------ Wed Mar 30 10:46:05 2022 ------##
> timestamp(); dmlTest1 <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"), BPPARAM=single); timestamp()
##------ Wed Mar 30 10:51:37 2022 ------##
Estimating dispersion for each CpG site, this will take a while ...
  |===========================================| 100%
  |===========================================| 100%
##------ Wed Mar 30 10:51:46 2022 ------##

haowulab commented 2 years ago

To all users experiencing problems with parallel computing:

DSS used to use BiocParallel for parallel computing. However, some recent changes in BiocParallel make it very slow. I asked on the Bioc support site but nobody replied. You can see my post at https://support.bioconductor.org/p/9140528/ and try the code there.

I modified DSS to use another package. You can see some description at http://www.bioconductor.org/packages/devel/bioc/vignettes/DSS/inst/doc/DSS.html#331_Parallel_computing_for_DMLDMR_detection_from_two-group_comparison.

The new package is available as “development” version at http://www.bioconductor.org/packages/devel/bioc/html/DSS.html. Bioc has only two releases every year, so the changes won’t appear in the “official” package maybe until summer. Anyway, you can install the devel version and try.

Hao

llrs commented 1 year ago

I commented in the support thread, which led to an issue being opened in BiocParallel: https://github.com/Bioconductor/BiocParallel/issues/238. The behaviour might change, but the solution using BiocParallel seems to be using force.GC=FALSE inside bplapply. Hopefully this will get fixed before the next release, as the current parallel solution doesn't work on Windows.