saeyslab / nichenetr

NicheNet: predict active ligand-target links between interacting cells

Parallelization error when optimizing parameters for NicheNet #292

Open tkapello opened 1 month ago

tkapello commented 1 month ago

Hi,

I am implementing the Omnipath resources for NicheNet analysis on a Windows computer. I was optimizing the parameters for the model as shown below:

# Parameter optimization
expression_settings_validation<-readRDS(url("https://zenodo.org/record/3260758/files/expression_settings.rds"))

# Collect every data source once and give each a default weight of 1
all_sources<-unique(c(lr_network$source, sig_network$source, gr_network$source))
my_source_weights_df<-tibble(source=all_sources, weight=rep(1, length(all_sources)))

additional_arguments_topology_correction<-list(source_names=my_source_weights_df$source %>% unique(), 
                                               algorithm="PPR", 
                                               correct_topology=FALSE,
                                               lr_network=lr_network, 
                                               sig_network=sig_network, 
                                               gr_network=gr_network, 
                                               settings=lapply(expression_settings_validation, convert_expression_settings_evaluation),
                                               secondary_targets=FALSE, 
                                               remove_direct_links="no", 
                                               cutoff_method="quantile")

nr_datasources<-additional_arguments_topology_correction$source_names %>% length()

obj_fun_multi_topology_correction<-makeMultiObjectiveFunction(name="nichenet_optimization",
                                                              description="data source weight and hyperparameter optimization: expensive black-box function",
                                                              fn=model_evaluation_optimization, 
                                                              par.set=makeParamSet(makeNumericVectorParam("source_weights", len=nr_datasources, lower=0, upper=1, tunable=FALSE),
                                                                                   makeNumericVectorParam("lr_sig_hub", len=1, lower=0, upper=1, tunable=TRUE),  
                                                                                   makeNumericVectorParam("gr_hub", len=1, lower=0, upper=1, tunable=TRUE),  
                                                                                   makeNumericVectorParam("ltf_cutoff", len=1, lower=0.9, upper=0.999, tunable=TRUE), 
                                                                                   makeNumericVectorParam("damping_factor", len=1, lower=0.01, upper=0.99, tunable=TRUE)), 
                                                              has.simple.signature=FALSE,
                                                              n.objectives=4, 
                                                              noisy=FALSE,
                                                              minimize=c(FALSE, FALSE, FALSE, FALSE))

optimization_results=lapply(1, mlrmbo_optimization, obj_fun=obj_fun_multi_topology_correction, niter=8, ncores=5, nstart=1250, additional_arguments=additional_arguments_topology_correction)

However, I have an error:

Error in parallelStart(mode = MODE_MULTICORE, cpus = cpus, level = level,  : 
  Multicore mode not supported on windows!

I understand that the problem is that multicore mode is not supported on Windows. I tried parallelStart(mode='socket', cpus=5) as a workaround, but without success. I would appreciate any support!
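For context on the multicore versus socket distinction, here is a minimal base-R sketch of socket-based parallelism, which does run on Windows. It is only an illustration under stated assumptions: `evaluate_setting` is a hypothetical stand-in for an expensive evaluation, and the sketch does not change how `mlrmbo_optimization` starts its workers internally, so it is not a fix for the error above.

```r
library(parallel)  # ships with base R, no extra installation needed

# Hypothetical stand-in for one expensive model evaluation (assumption).
evaluate_setting <- function(x) {
  x^2
}

# A PSOCK ("socket") cluster launches fresh R worker processes instead of
# forking the current one, so it is available on Windows as well.
cl <- makeCluster(2, type = "PSOCK")

# The function is serialized and sent to the workers along with the inputs.
# Packages a real evaluation needs would have to be loaded on each worker,
# for example with clusterEvalQ(cl, library(...)).
results <- parLapply(cl, 1:4, evaluate_setting)

# Always release the worker processes when done.
stopCluster(cl)

print(unlist(results))
```

Because socket workers start with an empty session, any large objects (such as the networks passed as additional arguments) would need to be copied to every worker, which increases memory use compared with forking.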

csangara commented 4 weeks ago

Hi,

From NicheNet v2 onwards we started using nsga2R optimization instead of mlrMBO, as it is much faster. Unfortunately this means I am not able to help you with this particular issue, as the code in question is 5+ years old at this point, and it seems the parallelMap package we used for parallelization has been deprecated for 4 years now.

I would recommend switching to the nsga2R functions we have, but the optimization would still take multiple days to run, so I'm not sure it will be feasible on a personal computer.

Thanks for trying out the optimization though, it's the first issue we've gotten about this 😄 It's probably going to be very tricky to make it run on a different system, so let me know if you run into more problems.

Best regards, Chananchida

tkapello commented 2 weeks ago

Hi @csangara,

Just a clarification: I was going to run the optimization steps on a desktop computer with 64 GB RAM. In the tutorial, I see that you used an HPC and that the run took a few days. Unfortunately, I do not have that possibility. So I was wondering how critical this step really is (even using nsga2R) for running NicheNet with an updated set of ligand-receptor pairs from the Omnipath database. In other words, could I use the merged ligand-target pairs from Omnipath & NicheNet without the optimization steps?

Thanks in advance, Theo

csangara commented 2 weeks ago

Hi Theo,

The optimization step can slightly improve the performance of NicheNet target gene prediction, but I would not say it is critical (see Supplementary Notes Figures 2.3 and 2.4 in the MultiNicheNet paper). So you can definitely use updated LR pairs from Omnipath without optimization. However, when reconstructing the ligand-target matrix, I would recommend giving the new data a weight similar to that of the original Omnipath LR data source (named "omnipath", with a score of ~0.16; see also Supplementary Table 1B,E).
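As a rough sketch of that recommendation, the weights table could start from the default of 1 and then overwrite the entries for the newly added Omnipath sources. The source names below are hypothetical placeholders, and only the ~0.16 value comes from this thread; the `source` and `weight` columns mirror `my_source_weights_df` from the first post.

```r
# Hypothetical source names (assumption); in practice these would come from
# the source columns of the merged lr/sig/gr networks.
all_sources <- c("omnipath_new_lr", "example_sig_db", "example_gr_db")

# Start with the default weight of 1 for every source.
source_weights_df <- data.frame(
  source = all_sources,
  weight = 1,
  stringsAsFactors = FALSE
)

# Assumed names of the newly added Omnipath ligand-receptor sources.
new_omnipath_sources <- c("omnipath_new_lr")

# ~0.16 is the score quoted above for the original "omnipath" source.
omnipath_weight <- 0.16

# Overwrite only the rows that belong to the new Omnipath sources.
is_new_omnipath <- source_weights_df$source %in% new_omnipath_sources
source_weights_df$weight[is_new_omnipath] <- omnipath_weight

print(source_weights_df)
```

All other sources keep the weight of 1 here; whether they should instead take values from the supplementary table is a separate choice that this sketch does not settle.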

tkapello commented 1 week ago

Thank you @csangara,

Last question! I understand the purpose of giving weights to different databases, e.g. Omnipath. When I combine NicheNet and Omnipath, I have > 1,000 sources, some of which are in the table you referred me to. However, I was thinking that if I give special weights to only those sources (about 60 in number), I might be biasing my downstream predictions. What do you think? Would you still recommend weighting a few databases, or keeping the same default weight (i.e. 1) for all of them?