Open marcinschmidt opened 1 year ago
mclapply just returns a list, so combining is just c(fit1, fit2, fit3). The vignette outlines additional steps to extract and work with individual components of the objects returned by dmn.
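As a minimal sketch of the combining step described above: the example below uses a hypothetical stand-in function, toy_fit, in place of dmn, and a toy count matrix, so that it runs without the real data. Only the c() step is the point here; mc.cores = 1L is an assumption chosen so the sketch also runs where forking is unavailable.

```r
library(parallel)

# Hypothetical stand-in for dmn(): returns a small list tagged with k.
# In the real analysis this would be the actual dmn call.
toy_fit <- function(k, count, verbose = FALSE) {
  list(k = k, n_samples = nrow(count))
}

count <- matrix(1:12, nrow = 3)  # toy count matrix, not real data

# Three separate runs over disjoint ranges of k
fit1 <- mclapply(1:7,   toy_fit, count = count, mc.cores = 1L)
fit2 <- mclapply(8:14,  toy_fit, count = count, mc.cores = 1L)
fit3 <- mclapply(15:20, toy_fit, count = count, mc.cores = 1L)

# Each result is a plain list, so c() concatenates them in order
fit <- c(fit1, fit2, fit3)

# Recover which k each element belongs to
ks <- vapply(fit, function(x) x$k, integer(1))
```

The combined list has one element per value of k, in the same order a single 1:20 run would have produced.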
I wonder how big your data is? Also I wonder if the long running time is due to the size of the data or some other limitation, e.g., memory use.
Also is there something to do upstream to make the data smaller, e.g., some kind of dimensional reduction before doing the 'full' analysis; I have not worked in this space for a while so don't know if that is a good idea or not.
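One simple way to make the count matrix smaller before fitting is to drop columns that are non-zero in only a few samples. This is a sketch of plain feature filtering, not necessarily the kind of reduction meant above, and whether it is appropriate for this analysis is an open question. The threshold min_samples and the toy matrix are assumptions for illustration.

```r
# Toy count matrix: 4 samples (rows) x 4 features (columns), not real data
count <- matrix(c(5, 0, 0, 3,
                  2, 0, 1, 4,
                  0, 0, 0, 7,
                  1, 0, 0, 2),
                nrow = 4, byrow = TRUE)

# Hypothetical threshold: keep features seen in at least this many samples
min_samples <- 2

# Number of samples in which each feature has a non-zero count
prevalence <- colSums(count > 0)

# drop = FALSE keeps the result a matrix even if one column survives
count_small <- count[, prevalence >= min_samples, drop = FALSE]
```

The filtered matrix can then be passed to the fitting call in place of the full one.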
Hi! I run my data in chunks of [189, 8693], [191, 8693], and [197, 8693]. On the server I used lately, benchmarkme::get_ram() returns 201 GB and parallel::detectCores() returns 48. plot(benchmarkme::benchmark_std()) gives:
You are ranked 192 out of 749 machines.
You are ranked 419 out of 747 machines.
You are ranked 392 out of 747 machines.
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Does that mean anything specific to you? I'm a biologist... and that is the most powerful machine I can use. Dimensional reduction might be a solution; I will give it a try. When I submit my job to the queue (SLURM) I use:
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=4
#SBATCH --mem=38gb
#SBATCH --time=6-23:59:00 # total run time limit (D-HH:MM:SS)
I might try increasing the number of nodes and the memory to 128 or even 256 GB, but the time limit is 7 days anyway. Let me know if you have any ideas. Best regards, Marcin
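A note on the job settings above: mclapply forks worker processes on a single machine, so requesting more nodes would not by itself speed it up, and unless mc.cores is given it falls back to getOption("mc.cores", 2L). The sketch below shows one way to pass the allocated core count through. It assumes the job is submitted with --cpus-per-task so that Slurm exports SLURM_CPUS_PER_TASK, which is not what the script above does, and toy_fit is a hypothetical stand-in for dmn.

```r
library(parallel)

# Assumption: Slurm exports SLURM_CPUS_PER_TASK for this job.
# Fall back to 1 core when the variable is missing or malformed.
n_cores <- suppressWarnings(
  as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
)
if (is.na(n_cores) || n_cores < 1L) n_cores <- 1L

# Hypothetical stand-in for the real per-k fitting function
toy_fit <- function(k) k^2

# Use as many forked workers as cores were allocated
fit <- mclapply(1:20, toy_fit, mc.cores = n_cores)
```

Since each value of k is an independent task, at most 20 workers can be busy at once for a 1:20 run.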
I've got quite a large dataset I want to analyse with dmn. Running it with
fit <- mclapply(1:20, dmn, count=count, verbose=TRUE)
on my desktop did not complete within 30 days (using all 4 cores). A power outage probably cancelled the calculations, as the system had rebooted. I divided the dataset into parts and also ran it on a server. Some parts finished, but there is a 7-day limit and some needed more time. I would prefer to run the data as a full dataset. Can I replace
fit <- mclapply(1:20, dmn, count=count, verbose=TRUE)
with separate runs over parts of 1:20? How can I combine fit1 (1:7), fit2 (8:14), and fit3 (15:20) into fit (1:20)?
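Given the power outage and the 7-day limit described above, one option is to save each finished fit to disk so that an interrupted job loses only the value of k it was working on. The following is a rough sketch under stated assumptions: toy_fit stands in for dmn, and out_dir and the file naming scheme are hypothetical.

```r
# Hypothetical stand-in for the real per-k fit; replace with the dmn call
toy_fit <- function(k, count) list(k = k, total = sum(count))

count <- matrix(1:12, nrow = 3)               # toy data, not real counts
out_dir <- file.path(tempdir(), "dmn_fits")   # hypothetical output location
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

run_k <- function(k) {
  # Zero-padded k so that sorting file names sorts by k
  path <- file.path(out_dir, sprintf("fit_k%02d.rds", k))
  if (!file.exists(path)) {                   # skip values of k already done
    saveRDS(toy_fit(k, count), path)
  }
  path
}

# Could be split across separate jobs, e.g. 1:7, 8:14, and 15:20
paths <- vapply(1:20, run_k, character(1))

# Later: read everything back into one list ordered by k
fit <- lapply(sort(paths), readRDS)
```

Rerunning the same script after an interruption picks up where it left off, because files that already exist are not recomputed.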