kdkorthauer / dmrseq

R package for inference of differentially methylated regions (DMRs) from bisulfite sequencing
MIT License

Regarding CPU usage: it fills up the entire server's CPU. #66

Closed: xuzhe010 closed this issue 2 months ago

xuzhe010 commented 2 months ago

When I run dmrseq, I set BPPARAM = BiocParallel::MulticoreParam(workers = 2), but when the job actually runs, it occupies all of my server's CPUs. How can I solve this problem?

My code is as follows:

regions <- dmrseq::dmrseq(bs = bs.filtered,
                          cutoff = cutoff,
                          minNumRegion = minCpGs,
                          maxPerms = maxPerms,
                          testCovariate = testCovariate,
                          adjustCovariate = adjustCovariate,
                          matchCovariate = matchCovariate,
                          BPPARAM = BiocParallel::MulticoreParam(workers = 2)
)

The task's memory usage is as follows: [screenshot]

kdkorthauer commented 2 months ago

Hi,

Are you sure there are no other R processes running on this particular server? Could you provide the output of dmrseq as the job is beginning (it should include a statement about how many cores will be used)? Thanks!

xuzhe010 commented 2 months ago

I made sure this was the only task running on the server. I checked the log file, and it says it is using 4 workers/cores. The server does have 4 workers running in parallel, but instead of each worker occupying one CPU, each one occupies a quarter of the total CPU.

The log file is shown below: [screenshot]

kdkorthauer commented 2 months ago

The output is unexpected given that you've specified MulticoreParam(workers = 2). Could you try running the command BiocParallel::MulticoreParam(workers = 2) in a fresh R session on this server and check whether the output includes bpnworkers: 2 in the first line of the output? If that looks as expected, does the call to dmrseq as you included in your first post still produce output indicating 4 workers?
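The check suggested above can be run as a short script in a fresh R session. This is a minimal sketch, assuming BiocParallel is installed on the server; `bpnworkers()` is BiocParallel's accessor for the number of workers stored in a param object, so the printed summary and the returned value should both report 2.

```r
# Run in a fresh R session on the server being diagnosed.
library(BiocParallel)

param <- MulticoreParam(workers = 2)

# The printed summary includes a "bpnworkers:" field.
print(param)

# The same value, read programmatically.
n <- bpnworkers(param)
cat("bpnworkers:", n, "\n")
```

If this reports 2 but the dmrseq log still says 4 workers, the discrepancy lies in how BPPARAM reaches dmrseq rather than in BiocParallel itself.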

xuzhe010 commented 2 months ago

I ran two tasks (one for chr1 and one for chr2) on the same server. This time, each task did create 2 parallel workers rather than 4. But the problem persists: the two tasks split all of my server's CPU and memory equally between them.

The log file of one of the tasks: [screenshot]

My server's memory and CPU usage: [screenshot]

Each task takes up multiple CPUs. How can I solve this? I will be running these tasks on a cluster, and if this isn't solved, my jobs will occupy certain cluster nodes for a long time and prevent other tasks from running there.
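On a shared cluster, one way to keep a job within its allocation is to derive the worker count from the scheduler rather than hard-coding it. This is a minimal sketch, assuming a SLURM scheduler that exports `SLURM_CPUS_PER_TASK` (an assumption; other schedulers use different variables). `choose_workers()` is a hypothetical helper, not part of dmrseq or BiocParallel.

```r
# Hypothetical helper: return a worker count that never exceeds the cores
# the scheduler allocated to this job, capped at max_workers.
# Assumes SLURM_CPUS_PER_TASK; falls back to 1 if it is unset or invalid.
choose_workers <- function(env_var = "SLURM_CPUS_PER_TASK", max_workers = 2L) {
  allocated <- suppressWarnings(as.integer(Sys.getenv(env_var, unset = "1")))
  if (is.na(allocated) || allocated < 1L) allocated <- 1L
  min(allocated, as.integer(max_workers))
}

workers <- choose_workers()
# Usage with dmrseq (not run here):
# param <- BiocParallel::MulticoreParam(workers = workers)
# regions <- dmrseq::dmrseq(bs = bs.filtered, ..., BPPARAM = param)
```

Requesting the same number of cores from the scheduler as the job passes to `workers` means the job uses only what it was given, leaving the rest of the node free.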

kdkorthauer commented 2 months ago

This looks to be working as expected. Each of your two tasks is using 2 CPUs, for a total of 4, which is what you specified. Note that parallelization with BiocParallel::MulticoreParam does indeed generate forked processes with shared memory; see here to learn more.
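To see that MulticoreParam runs work in separate forked processes (so each worker appears as its own process in a tool like `top`), one can report the process ID from inside each worker. This is a minimal sketch, assuming BiocParallel on Linux or macOS, where forking is available. With two dmrseq jobs each using `workers = 2`, four such worker processes would be active; on a machine with four cores, that alone accounts for every core.

```r
library(BiocParallel)

param <- MulticoreParam(workers = 2)

# Each element reports the process ID of the worker that evaluated it.
pids <- unlist(bplapply(1:4, function(i) Sys.getpid(), BPPARAM = param))

# Forked workers are child processes, so their IDs differ from the
# parent session's ID; at most two distinct worker IDs appear.
parent_pid <- Sys.getpid()
worker_pids <- unique(pids)
print(worker_pids)
print(parent_pid %in% worker_pids)
```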

xuzhe010 commented 2 months ago

I have solved my problem, so I'm closing this issue.