saezlab / CARNIVAL

CAusal Reasoning for Network Identification with integer VALue programming in R
https://saezlab.github.io/CARNIVAL/

Details about output #53

Closed asumann closed 3 years ago

asumann commented 3 years ago

Hi,

I really liked trying CARNIVAL for my project! For more advanced network visualization, I need to know the details of the output list. Could you please provide that information, especially for weightedSIF and nodesAttributes? Specifically, I have only "T" and NA node types. T nodes have an AvgAct of 100, while NA nodes can have an AvgAct of -100 or 0. What does that mean?

enio23 commented 3 years ago

Hi Ansuman,

An AvgAct of 100 means that a node has an average activity of 100: for example, if you retrieved 100 solutions from CARNIVAL, the node was inferred to have an activity sign of +1 in all 100 of them. Similarly, an AvgAct of -100 means that the node was inferred with a -1 regulation sign in all 100 solutions. The AvgAct of each node is calculated as the weighted mean of the activity values it takes across all separate solutions:

$$\mathrm{AvgAct} = \frac{n_1 \cdot 0 + n_2 \cdot 1 + n_3 \cdot (-1)}{n_1 + n_2 + n_3} \times 100$$

where $n_1$ is the number of times the protein has been inferred to have an activity of 0 (or is not present in that specific solution), $n_2$ is the number of times the protein has been inferred as up-regulated (activity of +1), $n_3$ is the number of times the protein has been inferred as down-regulated (activity of -1), and $n_1 + n_2 + n_3$ is the total number of solutions you get.
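To make the arithmetic concrete, here is a minimal Python sketch of the weighted mean described above. It is not CARNIVAL's own code: the function name `avg_act` and the input format (one activity value per solution) are assumptions made purely for illustration.

```python
from collections import Counter


def avg_act(activities):
    """Weighted mean activity of one node across solutions, scaled by 100.

    `activities` holds one value per solution (hypothetical input format):
    0 (inactive or absent), +1 (up-regulated) or -1 (down-regulated).
    """
    if not activities:
        raise ValueError("need at least one solution")
    counts = Counter(activities)
    n1, n2, n3 = counts[0], counts[1], counts[-1]
    if n1 + n2 + n3 != len(activities):
        raise ValueError("activities must be -1, 0 or +1")
    return (n1 * 0 + n2 * 1 + n3 * (-1)) / (n1 + n2 + n3) * 100


print(avg_act([1] * 100))       # +1 in every one of 100 solutions
print(avg_act([1]))             # +1 in a single solution
print(avg_act([1, 1, 0, -1]))   # mixed signs across 4 solutions
```

The first two calls return the same value, since the result depends only on the proportions of +1, 0 and -1, not on how many solutions there are.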

On the other hand, we label the target nodes (or TFs) with T and the source nodes with S. NA nodes are those that have not been designated; they can be intermediate proteins, which may or may not have been inferred.

Hope this helps.

Cheers, Enio

asumann commented 3 years ago

It does help a lot! Another question: the log file says "Solution pool: 1 solution saved". Does this contradict an AvgAct of 100 (meaning +1 activity across 100 solutions)? Even if not, shouldn't we expect more solutions to be saved?

log file: 483081-slurm.txt

enio23 commented 3 years ago

Hi Ansuman,

No, the AvgAct represents the weighted average activity across all the solutions you get, and it was designed to always take a value between -100 and 100 no matter how many solutions you get (see the formula I gave in the previous comment). An AvgAct of 100 simply means that in all the solutions you get, a protein has been inferred to have an activity of +1. Conversely, if the inferred protein has an activity of -1 in all solutions, the AvgAct will be -100. This is true whether you get 1 single solution, 100, or any other number, depending on the parameters you set when you run CARNIVAL. The AvgAct could of course also have been made to take a value between -1 and 1, but I thought it best to put the limits at -100 and 100, since by default we usually like to get 100 solutions and this makes interpretation a bit easier :). You can of course also rescale AvgAct to run from -1 to 1 if that makes interpretation easier for you.
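The rescaling mentioned above is a plain division by 100. A minimal Python sketch follows, assuming the node attributes have been exported to a list of records; the field name `Node` and the node names are hypothetical, and only `AvgAct` comes from this thread.

```python
def rescale_avg_act(avg_act_value):
    """Map an AvgAct value from the [-100, 100] scale to [-1, 1]."""
    if not -100 <= avg_act_value <= 100:
        raise ValueError("AvgAct is expected to lie between -100 and 100")
    return avg_act_value / 100.0


# Hypothetical node records; real names and values come from your own run.
nodes = [
    {"Node": "A", "AvgAct": 100.0},
    {"Node": "B", "AvgAct": -100.0},
    {"Node": "C", "AvgAct": 0.0},
]

# Add a rescaled field next to the original one.
for node in nodes:
    node["AvgActScaled"] = rescale_avg_act(node["AvgAct"])

print([node["AvgActScaled"] for node in nodes])
```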

Cheers, Enio

asumann commented 3 years ago

Thank you Enio! Now it is much clearer. I understand the difference in scale, but I suspected that having only one solution as a result (as in the log file) would not be reliable. However, I do not think I can get more solutions: because I am not providing inputObj, I think this is the best I can get without feeding the algorithm potential targets of perturbations.

enio23 commented 3 years ago

Hi Ansuman,

I think the best thing one could do is either:

a) Give the analysis a bit more time to run by increasing the timelimit parameter, or

b) Set a gap value a bit higher than the one you are getting. You can check the current minimal gap value either in the log file or in R when CARNIVAL finishes running.

Indeed, one single solution might not be all that reliable :)

Cheers, Enio

asumann commented 3 years ago

Aha! That sounds like a great solution! I am closing the issue for now.

Many thanks :)