saezlab / transcriptutorial

This is a tutorial to guide the analysis of RNA-seq datasets using footprint-based tools such as DOROTHEA, PROGENY and CARNIVAL
https://saezlab.github.io/transcriptutorial/
GNU General Public License v3.0

Long running time of the sample_resolution_carnival.R script for producing carnival_sample_resolution.rds file from my own data #7

Closed: sqianglan closed this issue 3 years ago

sqianglan commented 3 years ago

Hi, thanks very much for the nice tutorial. I ran into a problem when running the pipeline with my own data. It seems the tutorial (scripts 1-6) does not include the sample_resolution_*.R scripts, so I could not get the carnival_sample_resolution.rds file from my own data, which is needed in 06_analysis_CARNIVAL_results.Rmd.

So I tried to run the sample_resolution_*.R scripts myself. They ran smoothly for PROGENy and DoRothEA. However, sample_resolution_carnival.R has taken more than 10 hours and is still running. Is that normal? And is there any way to speed it up?

Thanks very much

rosherbal commented 3 years ago

Hi @sqianglan, thanks for your comment. Which solver are you using? If you are running the scripts with the data provided in the tutorial and CPLEX as the solver, CARNIVAL shouldn't take that long.

sqianglan commented 3 years ago

Thanks @rosherbal. I am using CPLEX, but with my own data. I think the long run time is probably because my dataset is too big. I have removed all the unnecessary data, but I still have 2 groups with 5 replicates each, so 10 samples altogether. It now needs around 3-4 hours. Is that normal?

adugourd commented 3 years ago

The running time seems normal. You may want to check the gap values in the CPLEX output to make sure that your solutions are good. It should look something like this (the % in the last column):

Elapsed time = 1870.78 sec. (188174.13 ticks, tree = 7834.17 MB, solutions = 2)
Nodefile size = 5789.07 MB (2789.08 MB after compression)
24817 15120 34.9355 103 36.0466 34.1800 714074 5.18%
25234 15581 34.3808 105 36.0466 34.1800 737318 5.18%
25616 15668 36.0466 175 36.0466 34.1800 740132 5.18%
25827 15810 35.3804 96 36.0466 34.1800 745366 5.18%
25999 16133 35.0759 97 36.0466 34.1800 752452 5.18%
26193 16180 34.4606 110 36.0466 34.1800 753469 5.18%
26357 16290 36.0466 43 36.0466 34.1800 757081 5.18%
26474 16429 34.7933 132 36.0466 34.1800 760421 5.18%
26733 16787 35.8466 49 36.0466 34.1800 768228 5.18%
27013 16938 34.8861 95 36.0466 34.1800 770549 5.18%
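
As a rough cross-check, the gap in the last column can be recomputed from the other fields of a node-log row. The sketch below is not part of the tutorial. It assumes the usual CPLEX node-log column order (node, nodes left, objective, IInf, best integer, best bound, iteration count, gap) and CPLEX's relative gap definition, |best bound - best integer| / (1e-10 + |best integer|). The function names are made up for illustration, and the parser only handles plain eight-field rows like the ones quoted above, not rows carrying extra markers.

```python
def parse_node_row(row: str) -> dict:
    """Split one node-log row into named fields (assumed column order)."""
    fields = row.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {row!r}")
    return {
        "node": int(fields[0]),
        "nodes_left": int(fields[1]),
        "objective": float(fields[2]),
        "iinf": int(fields[3]),
        "best_integer": float(fields[4]),
        "best_bound": float(fields[5]),
        "iterations": int(fields[6]),
        "reported_gap_pct": float(fields[7].rstrip("%")),
    }


def relative_gap_pct(best_integer: float, best_bound: float) -> float:
    """Relative MIP gap in percent, following CPLEX's definition."""
    return 100.0 * abs(best_bound - best_integer) / (1e-10 + abs(best_integer))


# First row of the log excerpt quoted above.
row = "24817 15120 34.9355 103 36.0466 34.1800 714074 5.18%"
parsed = parse_node_row(row)
gap = relative_gap_pct(parsed["best_integer"], parsed["best_bound"])
print(f"recomputed gap: {gap:.2f}% (log reports {parsed['reported_gap_pct']:.2f}%)")
```

For this row the recomputed value agrees with the 5.18% that CPLEX prints, which suggests the column reading assumed here is the right one for this log.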

sqianglan commented 3 years ago

Thanks. What would be a good gap value?

gabora commented 3 years ago

Hi @sqianglan, a MIP gap of 0% is ideal in all cases. It means that the optimisation has reached a globally optimal solution. Getting there can take some time, even several hours. After 0% is reached, CARNIVAL/CPLEX enumerates alternative solutions, which requires even more time. Usually, due to time constraints, we are happy if the gap is around 1-2%. That means that the best solution found is within 1-2% of the global minimum in terms of the objective function. If you would like to know a bit more about MIPGap tolerance, I would suggest this website or there is a general intro about this kind of optimisation problems here.

Best, Attila
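
To make the 1-2% rule of thumb concrete, here is a minimal sketch of the check, again not taken from the tutorial. For a minimisation problem the true optimum lies between the best bound and the best integer solution found so far (the incumbent), so a small relative gap bounds how far the incumbent can be from the optimum. The function names and the 2% default are assumptions chosen to match the comment above, and the second bound value is hypothetical.

```python
def relative_gap(incumbent: float, best_bound: float) -> float:
    """Relative MIP gap as a fraction, following CPLEX's definition."""
    return abs(best_bound - incumbent) / (1e-10 + abs(incumbent))


def within_tolerance(incumbent: float, best_bound: float,
                     tolerance: float = 0.02) -> bool:
    """True if the incumbent is provably within `tolerance` of the optimum."""
    return relative_gap(incumbent, best_bound) <= tolerance


# Values from the log excerpt earlier in the thread.
incumbent, bound = 36.0466, 34.1800
print(f"gap = {relative_gap(incumbent, bound):.2%}")
print("acceptable at 2%:", within_tolerance(incumbent, bound))

# Hypothetical later state of the same run, after the best bound has risen.
later_bound = 35.5
print("acceptable at 2%:", within_tolerance(incumbent, later_bound))
```

With the logged values the gap is about 5.18%, so that run has not yet reached the 1-2% range. The gap shrinks either when the solver finds a better incumbent or when it tightens the bound.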