Closed nijibabulu closed 6 months ago
Hello
We are not using all DE genes as "measurements" in CARNIVAL. CARNIVAL models how signaling leads to the regulators of differential gene expression, i.e. to transcription factors. This is also why our prior knowledge network is a protein-protein interaction network and not a gene regulatory network.
If you are working with this in a non-academic setting, I would suggest trying to eliminate nodes and edges from the prior knowledge network. We used a couple of strategies in cosmos (https://github.com/saezlab/cosmosR), e.g. (1) remove nodes that are not expressed in any condition according to the RNAseq data, (2) remove nodes that have no directed path to a measurement (they are not observable anyway), (3) remove nodes that are more than N steps away from the inputs/measurements, e.g. N = 8. The rationale for (3) is that the more steps we take on the network, the less likely it is that our prediction makes any sense.
Of these, (3) usually has the strongest impact on computation time. Maybe you need to go down to N = 5 for CBC.
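The three pruning strategies above can be sketched roughly as follows. This is a minimal, standalone Python illustration, not the cosmosR implementation: the function names `reachable_within` and `prune_pkn`, the `(source, sign, target)` edge format, and the toy network are all assumptions made for the example. It filters on expression first, then keeps only nodes that lie within `max_steps` directed steps upstream of a measurement (and, if inputs are given, within `max_steps` downstream of an input).

```python
from collections import deque


def reachable_within(adj, sources, max_steps):
    """Breadth-first search: nodes reachable from `sources` in <= max_steps steps."""
    seen = set(sources)
    frontier = deque((s, 0) for s in sources)
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def prune_pkn(edges, expressed, measurements, max_steps, inputs=None):
    """Prune a signed, directed edge list of (source, sign, target) tuples.

    Hypothetical helper for illustration only.
    """
    # (1) keep only edges whose two endpoints are both expressed
    edges = [(s, sign, t) for s, sign, t in edges
             if s in expressed and t in expressed]

    # forward and reverse adjacency lists of the filtered network
    fwd, rev = {}, {}
    for s, _, t in edges:
        fwd.setdefault(s, []).append(t)
        rev.setdefault(t, []).append(s)

    # (2) + (3) nodes with a directed path of <= max_steps to a measurement
    keep = reachable_within(rev, measurements, max_steps)

    # (3) if inputs are known, also require <= max_steps from an input
    if inputs is not None:
        keep &= reachable_within(fwd, inputs, max_steps)

    return [(s, sign, t) for s, sign, t in edges if s in keep and t in keep]


# Toy network: X is not expressed; C, D, Y, Z cannot reach TF1 within 2 steps.
toy_edges = [("A", 1, "B"), ("B", 1, "TF1"), ("A", 1, "C"), ("C", -1, "D"),
             ("X", 1, "TF1"), ("Z", 1, "Y"), ("Y", 1, "A")]
toy_expressed = {"A", "B", "C", "D", "TF1", "Y", "Z"}
pruned = prune_pkn(toy_edges, toy_expressed, {"TF1"}, max_steps=2)
```

Leaving `inputs` as `None` corresponds to the case without known perturbation targets, where only the distance to the measurements can be used. Lowering `max_steps` shrinks the network monotonically, which is why it is a convenient knob for trading coverage against solver time.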
Thank you for this very clear answer! This is all very helpful.
We are generally interested in detecting perturbed regulator proteins and their networks using causal networks in a de novo fashion, which in theory is a good fit for inverse CARNIVAL. However, I am unclear on how to preprocess the input genes and network to get a reliably solvable problem. My questions are generally:
Below I discuss what I have tried so far.
I noticed that in the available examples, the PKNs and measurement inputs are filtered by some subset of genes, usually TFs and pathway genes. (Below)
https://github.com/saezlab/transcriptutorial https://github.com/saezlab/CausalToxNet https://github.com/harrytr/p53
These often have pre-loaded networks and TFs, but I am trying to develop a pipeline. I tried two different methods based on those and got very different outcomes in terms of converging on a solution. I am starting with 17.5k gene statistics (based on running limma) as measurements. I then used the following:
ULM/collectri
VIPER/Omnipath
This is adapted more or less from the p53 repository
The VIPER/Omnipath method yields a problem that CBC solves quickly, but as far as I can tell from what I have read, it is somewhat out of date compared to the ULM/collectri method.
Thank you for your help!