saezlab / CARNIVAL

CAusal Reasoning for Network Identification with integer VALue programming in R
https://saezlab.github.io/CARNIVAL/

Running carnival in parallel #61

Closed ahmedasadik closed 2 years ago

ahmedasadik commented 3 years ago

Dear Carnival Developers, First, thank you for the great tool, it really is amazing.

My issue is that I cannot get CARNIVAL to run in parallel. I made a toy example to reproduce the problem:

library(furrr)
plan(multisession, workers = 4)
library(CARNIVAL)

load(file = system.file("toy_inputs_ex1.RData",
                        package="CARNIVAL"))
load(file = system.file("toy_measurements_ex1.RData",
                        package="CARNIVAL"))
load(file = system.file("toy_network_ex1.RData",
                        package="CARNIVAL"))

input_lists <- rep(list(toy_inputs_ex1), times = 20)
meas_lists <- rep(list(toy_measurements_ex1), times = 20)
net_lists <- rep(list(toy_network_ex1), times = 20)

result = future_map(input_lists, function(lst){
  runCARNIVAL(inputObj = lst, measObj = toy_measurements_ex1, threads =1,
              netObj = toy_network_ex1, solverPath = "/usr/bin/cbc", solver = "cbc")
})

This is the error that I get when running it.

Writing constraints...
Solving LP problem...
Error: 'results_cbc_1_1.txt' does not exist in current working directory ('/home/ahmed/Comp_Bio/Projects/Discovery_Pipeline').
In addition: Warning message:
UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

I looked at one of the development branches, and it seems that this was partly addressed with parallelIdx1. However, it always has a fixed value of 1, and runCARNIVAL has no argument to change 'condition'. As a result, every call sends the same arguments to the cbc solver and looks for the same 'results_cbc_1_1.txt' file, which is already in use by the first node.

Thanks,

ivanovaos commented 3 years ago

Hi @ahmedasadik, thanks for writing to us. The newer version of CARNIVAL (to be submitted this week) won't support multithreading by itself, but if you use, e.g., the cplex solver, it supports multithreading natively and we would rely on that. How big is the problem that you want to solve with CARNIVAL?

ahmedasadik commented 3 years ago

I have many single-cell and bulk expression datasets that I need to use carnival for. So doing things in parallel is extremely important. Unfortunately, I don't have access to cplex and they refused an academic license because my institute is not a university.

ivanovaos commented 3 years ago

With the implementation you are currently using, the easiest way to handle this is to send each sample to a separate cluster node (through .sh or snakemake scripts). Just be sure to set up a different working directory for each run, so that the files are not accidentally overwritten. We are working on a default pipeline for running CARNIVAL on many samples simultaneously, but this will be public only in a couple of months.
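The per-run working directory idea can be sketched in R as follows. This is a minimal illustration and not part of CARNIVAL: `run_in_private_dir` and `fake_solver` are hypothetical names, and `fake_solver` only stands in for a call that writes a fixed file name (such as 'results_cbc_1_1.txt') to the current working directory. Whether this is enough to make runCARNIVAL safe under furrr is an assumption that would need to be checked against the installed version.

```r
# Hypothetical helper (not part of CARNIVAL): run `fn` inside its own directory
# so that files written to the working directory cannot collide between jobs.
run_in_private_dir <- function(job_id, fn, root = tempdir()) {
  job_dir <- file.path(root, sprintf("carnival_job_%03d", job_id))
  dir.create(job_dir, recursive = TRUE, showWarnings = FALSE)
  old_wd <- setwd(job_dir)            # switch into the private directory
  on.exit(setwd(old_wd), add = TRUE)  # always restore the previous directory
  fn()
}

# Stand-in for a solver call that writes a fixed file name to the working
# directory and reads it back.
fake_solver <- function(value) {
  writeLines(as.character(value), "results_cbc_1_1.txt")
  as.numeric(readLines("results_cbc_1_1.txt"))
}

# Each job gets its own directory, so the identically named files do not clash.
results <- lapply(1:3, function(i) {
  run_in_private_dir(i, function() fake_solver(i * 10))
})

# Possible use with furrr (untested assumption; multisession workers are
# separate R processes, so setwd() only affects the worker running the job):
# result <- future_map(seq_along(input_lists), function(i) {
#   run_in_private_dir(i, function() {
#     runCARNIVAL(inputObj = input_lists[[i]], measObj = toy_measurements_ex1,
#                 netObj = toy_network_ex1, threads = 1,
#                 solverPath = "/usr/bin/cbc", solver = "cbc")
#   })
# })
```

The helper restores the original working directory through `on.exit`, so an error inside one job does not leave the session stranded in that job's directory.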

ahmedasadik commented 3 years ago

OK, but would it be possible to pass "threads" and "randomseed" options to CBC by modifying the carnivaloptions sent to the CBC command line? That would make it much faster than it currently is, especially since I built my CBC solver with multithreading enabled. Otherwise, I would appreciate it if you could tell me how to export the LP file sent to the solver, so that I can run the solver in parallel from bash. I would appreciate your help very much.

ivanovaos commented 3 years ago

If you can wait until the end of next week, we can indeed add this option for cbc. We are currently wrapping up the next Bioconductor release, so adding another solver option won't be an issue. Also, in the new release it will be easy to save and collect the LP files. Can you open a new issue suggesting options for cbc? I will later add a branch to it, and you will be notified when it is done.

ahmedasadik commented 3 years ago

Thank you very much. I really appreciate it.

gabora commented 2 years ago

see #62