saezlab / cosmosR

COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.
https://saezlab.github.io/cosmosR/
GNU General Public License v3.0

Problem running cosmos with toy datasets and cbc solver #15

Closed sarahbonnin closed 2 years ago

sarahbonnin commented 2 years ago

Hello,

I am testing cosmosR using the toy dataset and the cbc solver (I can't use cplex for the moment...):

# set CARNIVAL options
CARNIVAL_options <- cosmosR::default_CARNIVAL_options()
CARNIVAL_options$solver <- "cbc"
CARNIVAL_options$solverPath <- "{path-to-cbc}/dist/bin/cbc"
CARNIVAL_options$timelimit <- 3600
CARNIVAL_options$mipGAP <- 0.05
CARNIVAL_options$threads <- 2

# load test data
data(toy_network)
data(toy_signaling_input)
data(toy_metabolic_input)
data(toy_RNA)

# forward pre-processing
prep_cosmos_for <- preprocess_COSMOS_signaling_to_metabolism(meta_network = toy_network,
                                                             signaling_data = toy_signaling_input,
                                                             diff_expression_data = toy_RNA,
                                                             metabolic_data = toy_metabolic_input,
                                                             maximum_network_depth = 15,
                                                             remove_unexpressed_nodes = TRUE,
                                                             CARNIVAL_options = CARNIVAL_options)

Here is the log and the error that I get in the end (I manually changed the full path to {path-to-wd} as you could guess):

'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
[1] "COSMOS: all 3 signaling nodes from data were found in the meta PKN"
[1] "COSMOS: all 3 metabolic nodes from data were found in the meta PKN"
[1] "COSMOS: 4660 of the 15919 genes in expression data were found as transcription factor target"
[1] "COSMOS: 4660 of the 5312 transcription factor targets were found in expression data"
[1] "COSMOS: removing unexpressed nodes from PKN..."
[1] "COSMOS: 0 interactions removed"
[1] "COSMOS: removing nodes that are not reachable from inputs within 15 steps"
[1] "COSMOS: 86 from  115 interactions are removed from the PKN"
[1] "COSMOS: 1 input/measured nodes are not in PKN any more: XMetab__439155___c____ and 0 more."
[1] "COSMOS: removing nodes that are not observable by measurements within 15 steps"
[1] "COSMOS: 10 from  29 interactions are removed from the PKN"
[1] "COSMOS: 1 input/measured nodes are not in PKN any more: X1445 and 0 more."
[1] "COSMOS:  0 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
[1] "COSMOS wasn't tested thoroughly with the cbc solver. We recommend the users to use CPLEX if possible, and use cbc as a backup solution."
[1] "COSMOS wasn't tested thoroughly with the cbc solver. We recommend the users to use CPLEX if possible, and use cbc as a backup solution."
--- Start of the CARNIVAL pipeline ---
10:38:45 09.02.2022 Carnival flavour: vanilla
10:38:45 09.02.2022 Generating variables for lp problem
10:38:45 09.02.2022 Done: generating variables for lp problem
Saving preprocessed data.
Done: saving parsed data: {path-to-wd}//parsedData_t10_38_45d09_02_2022n85.RData
10:38:45 09.02.2022 Generating formulation for LP problem
10:38:46 09.02.2022 Done: generating formulation for LP problem.
Saving LP file
Done: Saving LP file: {path-to-wd}//lpFile_t10_38_45d09_02_2022n85.lp
10:38:46 09.02.2022 Solving LP problem
Welcome to the CBC MILP Solver
Version: Devel (unstable)
Build Date: Feb  9 2022
command line - {path-to-wd}//lpFile_t10_38_45d09_02_2022n85.lp -seconds 3600 -ratio 0.0001 solve printi csv solu {path-to-wd}//result_t10_38_45d09_02_2022n85.txt (default strategy 1)
seconds was changed from 1e+08 to 3600
ratioGap was changed from 0 to 0.0001
Unable to open file ./solve
Unable to open file ./printi
Unable to open file ./csv
Unable to open file ./solu
Unable to open file {path-to-wd}//result_t10_38_45d09_02_2022n85.txt
Total time (CPU seconds):       0.000283   (Wallclock seconds):       0.000993013
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '{path-to-wd}//result_t10_38_45d09_02_2022n85.txt': No such file or directory

There is no issue with writing permissions in {path-to-wd} as both parsedData_t10_38_45d09_02_2022n85.RData and lpFile_t10_38_45d09_02_2022n85.lp were indeed created, but result_t10_38_45d09_02_2022n85.txt is not there...

I know the cbc solver was not fully tested, but any chance you could help me out with this issue?

Thank you! Best wishes, Sarah

sarahbonnin commented 2 years ago

Hi, Could someone help me out with this issue?

adugourd commented 2 years ago

Hi Sarah,

Thanks for reporting this issue. It is due to a bug that we are currently working to address in the latest version of CARNIVAL.

In the meantime, you can install a previous stable version of CARNIVAL, which should solve the problem.

Run this in R:

remotes::install_github("saezlab/CARNIVAL@b3a84c6ba9706547caca02644566d75ee621f568")

Cheers,

Aurelien Dugourd

sarahbonnin commented 2 years ago

Hi Aurelien,

Thank you! Installing that version of carnival solved my first issue, although I get new errors. I think they are errors from cbc (see here and here), but I don't really understand if they may be caused by my input in some way:

Writing constraints...
Solving LP problem...
Welcome to the CBC MILP Solver
Version: Devel (unstable)
Build Date: Feb  9 2022
command line - testFile_1_1.lp -seconds 3600 -ratio 0.0001 solve printi csv solu results_cbc_1_1.txt (default strategy 1)

then I get a lot of:

### CoinLpIO::readLp(): Variable dist_X139741 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X26154 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X154664 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X23461 does not appear in objective function or constraints
...

and finally:

### ERROR: 526 duplicates in objective and matrix
ERROR: CoinLpIO::readLp, ### ERROR: 526 duplicates in objective and matrix

There were -1 errors on input
seconds was changed from 1e+08 to 3600
ratioGap was changed from 0 to 0.0001
Unable to open file ./solve
Unable to open file ./printi
Unable to open file ./csv
Unable to open file ./solu
Unable to open file ./results_cbc_1_1.txt
Total time (CPU seconds):       1.43682   (Wallclock seconds):       1.56906
Error: 'results_cbc_1_1.txt' does not exist in current working directory ('..../cosmos_tests').

Are you familiar with these errors?

Thank you!

gabora commented 2 years ago

Dear Sarah,

I just got the same error during a test. In my case, I had a self-activating loop in the prior knowledge network (GNAS -> GNAS), which interfered with the ILP formulation. Could you please check whether your PKN contains a self-activating loop? For example: which(pkn_network$source == pkn_network$target)

In detail, CARNIVAL turned the self-activating loop into a constraint such as 101 eU6115 + nDs750 - nDs750 <= 100, in which the variable nDs750 appears more than once. This triggers an error in CBC.
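To spot constraints of this kind, here is a minimal sketch in base R. The helper name find_duplicate_vars is hypothetical (it is not part of CARNIVAL or cosmosR), and it assumes that variable names start with a letter and contain only letters, digits, or underscores. It returns the variable names that occur more than once in a single constraint string.

```r
# Hypothetical helper (not part of CARNIVAL or cosmosR): return the variable
# names that occur more than once in one LP-style constraint string.
find_duplicate_vars <- function(constraint) {
  # Assumption: a variable name starts with a letter and continues with
  # letters, digits, or underscores; plain numbers such as 101 are skipped.
  tokens <- regmatches(constraint,
                       gregexpr("[A-Za-z][A-Za-z0-9_]*", constraint))[[1]]
  unique(tokens[duplicated(tokens)])
}

# The constraint quoted above repeats nDs750
dup <- find_duplicate_vars("101 eU6115 + nDs750 - nDs750 <= 100")
print(dup)

# A constraint without repeated variables yields an empty character vector
print(find_duplicate_vars("101 eU6115 + nDs750 <= 100"))
```

Applying the helper to each constraint line of the generated .lp file would be one way to locate the offending constraints, although the exact layout of that file is not shown in this thread.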

CPLEX and Gurobi both managed to solve the same LP file; only Gurobi issued a warning about this problem. In the (very) near future we will add a few lines to remove self-loops automatically.
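Until that is in place, self-loops can be dropped by hand before calling COSMOS. The following is a minimal sketch on a made-up PKN; the column names source, interaction, and target are assumed here, and the ADCY1 and PRKACA rows are illustrative only.

```r
# Toy PKN for illustration; the column layout (source, interaction, target)
# is an assumption, and only the GNAS -> GNAS row comes from this thread.
pkn_network <- data.frame(
  source      = c("GNAS", "GNAS", "ADCY1"),
  interaction = c(1, 1, 1),
  target      = c("GNAS", "ADCY1", "PRKACA"),
  stringsAsFactors = FALSE
)

# Rows where a node points at itself
self_loops <- which(pkn_network$source == pkn_network$target)
print(pkn_network[self_loops, ])

# Keep only edges whose source differs from their target
pkn_no_loops <- pkn_network[pkn_network$source != pkn_network$target, ,
                            drop = FALSE]

# The check now returns integer(0)
print(which(pkn_no_loops$source == pkn_no_loops$target))
```

Filtering with a logical condition rather than negative indices avoids the pitfall that pkn_network[-integer(0), ] returns no rows when there are no self-loops.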

thanks, Attila

gabora commented 2 years ago

Could you also please confirm that the second error is not related to toy_network data?

sarahbonnin commented 2 years ago

Hi Attila,

Thank you!

I did have some self-activating loops in my PKN, which I have now removed (which(pkn_network$source == pkn_network$target) gives me integer(0)), but I still get the same errors...

 prep_cosmos_for <- preprocess_COSMOS_signaling_to_metabolism(meta_network = network4cosmos, 
                                                               signaling_data = sources_mut,
                                                               metabolic_data = targets_rna,
                                                               maximum_network_depth = 15,
                                                               CARNIVAL_options=CARNIVAL_options)

  run_cosmos_for <- run_COSMOS_signaling_to_metabolism(prep_cosmos_for,
                                                       CARNIVAL_options=CARNIVAL_options)

### CoinLpIO::readLp(): Variable dist_X9313 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X5652 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X51763 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X2873 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X79646 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X7295 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X4255 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X414899 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X55789 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X929 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X57001 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X6251 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X6727 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X57215 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X90 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X10598 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X8718 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X26254 does not appear in objective function or constraints
### CoinLpIO::readLp(): Variable dist_X4700 does not appear in objective function or constraints
### CoinLpIO::is_invalid_name(): Name   contains illegal character ' '
### CoinLpIO::is_invalid_name(): Name c579991: contains illegal character ':'
### CoinLpIO::is_invalid_name(): Name c583077: contains illegal character ':'
### CoinLpIO::is_invalid_name(): Name c583225: contains illegal character ':'
### CoinLpIO::readLp(): Invalid column names
Now using default column names.
seconds was changed from 1e+08 to 3600
ratioGap was changed from 0 to 0.0001
Unable to open file ./solve
Unable to open file ./printi
Unable to open file ./csv
Unable to open file ./solu
Unable to open file ./results_cbc_1_1.txt
Total time (CPU seconds):       1.51339   (Wallclock seconds):       1.7128
Error: 'results_cbc_1_1.txt' does not exist in current working directory 

I now tried with the toy dataset, and I get the same error (still using CBC):

CARNIVAL_options <- cosmosR::default_CARNIVAL_options()
CARNIVAL_options$solver <- "cbc"
CARNIVAL_options$solverPath <- "~/Software/cbc_solver/dist/bin/cbc"
CARNIVAL_options$timelimit <- 3600
CARNIVAL_options$mipGAP <- 0.05
CARNIVAL_options$threads <- 2

data(toy_network)
data(toy_signaling_input)
data(toy_metabolic_input)
data(toy_RNA)

prep_cosmos_for <- preprocess_COSMOS_signaling_to_metabolism(meta_network = toy_network,
                                                               signaling_data = toy_signaling_input, 
                                                               diff_expression_data = toy_RNA, 
                                                               metabolic_data = toy_metabolic_input, 
                                                               maximum_network_depth = 15,
                                                               remove_unexpressed_nodes = TRUE,
                                                               CARNIVAL_options=CARNIVAL_options )

It may look more like a problem with CBC itself.

The difference between running it with my data and with the toy data is that, for the toy data, I already get the error at the preprocess_COSMOS_signaling_to_metabolism step, while with my data I get it at the run_COSMOS_signaling_to_metabolism step.

When I remove the options diff_expression_data = toy_RNA and remove_unexpressed_nodes = TRUE from the toy run, I get the error only at the run_COSMOS_signaling_to_metabolism step.

Cheers

gabora commented 2 years ago

Dear Sarah,

In the last 2 weeks I went through carnival and cosmos and made sure they work well together. So may I ask you to install the current versions of COSMOS and CARNIVAL? remotes::install_github("saezlab/CARNIVAL") and remotes::install_github("saezlab/cosmosR").

Because of the updates, I had to change the options a bit, so please use cosmosR::default_CARNIVAL_options(solver="cbc") in your code. This will generate the default settings for the CBC solver (previously the options for all solvers were stored together, which was confusing).

Since, in our experience, CBC does not perform as well as CPLEX, I would recommend reducing the maximum_network_depth = 15 parameter to 4-8 when using CBC. Any node that is not reachable from the layers within maximum_network_depth steps will be removed. This can strongly reduce the PKN and therefore help the optimizer. I would only increase maximum_network_depth if the optimization could not connect the nodes from inputs to outputs.
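To illustrate why a smaller depth shrinks the network, here is a minimal sketch of depth-limited pruning on a toy edge list. It is not cosmosR's implementation: the helper reachable_within and the node names A to E are hypothetical, and only the forward direction (reachability from inputs) is shown.

```r
# Hypothetical helper: nodes reachable from `start` within `max_depth`
# steps, following edges from source to target (breadth-first search).
reachable_within <- function(edges, start, max_depth) {
  visited  <- unique(start)
  frontier <- visited
  for (step in seq_len(max_depth)) {
    # Targets of all edges leaving the current frontier
    next_nodes <- unique(edges$target[edges$source %in% frontier])
    frontier   <- setdiff(next_nodes, visited)
    if (length(frontier) == 0) break
    visited <- c(visited, frontier)
  }
  visited
}

# Toy chain A -> B -> C -> D -> E
edges <- data.frame(
  source = c("A", "B", "C", "D"),
  target = c("B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# With a depth of 2, only A, B and C can be reached from the input A
keep <- reachable_within(edges, start = "A", max_depth = 2)

# Keep only edges whose two endpoints both survive the pruning
pruned <- edges[edges$source %in% keep & edges$target %in% keep, ]
print(pruned)
```

In this toy chain, raising max_depth to 4 keeps all four edges, which mirrors the trade-off described above: a larger depth keeps more of the PKN but gives the solver a larger problem.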

I am closing this issue, but if you encounter further problems, please let us know by reopening it or submitting a new one. Thanks! Attila