saezlab / CARNIVAL

CAusal Reasoning for Network Identification with integer VALue programming in R
https://saezlab.github.io/CARNIVAL/

Using custom network for transcription factor input #98

Closed Sharm8 closed 1 year ago

Sharm8 commented 1 year ago

Hello!

I'm interested in using carnival for some time series RNAseq and ATACseq data. I have differentially expressed genes from RNAseq as well as transcription factors I've obtained from motif analysis with the ATACseq data. I was wondering if it would be possible to generate my own TF-target input for carnival, instead of using Dorothea or Collectri?

For example, I could create my own TF input with: my source TFs; targets obtained using, say, remap_dorothea_download and matched to my RNAseq DEGs; and a Pearson correlation value between the two data types (instead of the mode of regulation value from CollecTRI).

Would this be possible?

Thank you in advance

gabora commented 1 year ago

Hi, Sorry for the late response, didn't notice this issue.

You can definitely use your own custom inputs for CARNIVAL. As long as the measurement vector is a named vector whose names match nodes in the prior knowledge network, it will work.

Best
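To make the requirement above concrete, here is a minimal sketch in plain Python (not the CARNIVAL R API) of checking that the names of a measurement vector appear among the nodes of a prior knowledge network. The node names, edge list layout, and values are made up for illustration.

```python
# Hypothetical TF measurements: names are node identifiers, values are scores.
measurements = {"STAT1": 2.3, "MYC": -1.7, "FOO1": 0.9}

# Hypothetical prior knowledge network as (source, sign, target) edges.
pkn_edges = [
    ("JAK1", 1, "STAT1"),
    ("MAPK1", 1, "MYC"),
    ("GSK3B", -1, "MYC"),
]

# Collect every node that appears as a source or a target.
pkn_nodes = {node for source, _, target in pkn_edges for node in (source, target)}

# Keep only measurements whose names exist in the network; report the rest.
mapped = {name: value for name, value in measurements.items() if name in pkn_nodes}
unmapped = sorted(set(measurements) - pkn_nodes)

print("mapped:", mapped)
print("unmapped:", unmapped)
```

Measurements that do not map to any node cannot be connected by the network, so it is worth checking identifiers (for example gene symbols versus UniProt IDs) before running the analysis.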

Sharm8 commented 1 year ago

Thank you for your reply! If I were to do that, would I still have to use run_wmean with the new "net" before using it as an input? And would it be the same for COSMOS?

Thank you for your help

gabora commented 1 year ago

Hi, use run_wmean if you need an enrichment analysis for your data / network.

In our case, we use run_wmean to determine TF activity from DGE and collecTRI (it calculates an enrichment score for the TF based on the differential expression of its target genes).
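The idea of scoring a TF from the differential expression of its targets can be sketched as a weighted mean. This is a simplified illustration in plain Python, not the run_wmean implementation, which may differ in details. The function name `weighted_mean_score`, the gene names, and all numbers are hypothetical; the weights stand in for whatever the custom network provides (a mode of regulation of +1/-1, or a correlation value).

```python
def weighted_mean_score(target_stats, tf_targets):
    """Weighted mean of target statistics, using signed target weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for target, weight in tf_targets.items():
        if target in target_stats:  # skip targets that were not measured
            weighted_sum += weight * target_stats[target]
            weight_total += abs(weight)
    if weight_total == 0:
        return None  # no measured targets for this TF
    return weighted_sum / weight_total


# Hypothetical differential expression statistics (e.g. t-values).
target_stats = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 3.0}

# Hypothetical custom TF-target network with signed weights.
tf_network = {
    "TF1": {"GENE_A": 1.0, "GENE_B": -1.0},
    "TF2": {"GENE_C": -0.5, "GENE_D": 0.8},
}

scores = {tf: weighted_mean_score(target_stats, targets)
          for tf, targets in tf_network.items()}
print(scores)
```

In this toy example TF1 scores positive because its activated target is up and its repressed target is down, while TF2 scores negative because a target it represses is up.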

If I understood correctly, you use ATACseq to come up with a regulatory network between TFs and target genes. In that case I think you should use run_wmean or some other enrichment method to get the activated TFs.

Finally, you run CARNIVAL with the list of activated TFs and some network (a protein-protein interaction network, maybe from OmniPath) to infer the signaling interactions upstream of the differentially regulated TFs.

This applies to COSMOS too: COSMOS is just a wrapper around CARNIVAL.

Best, Attila

Sharm8 commented 1 year ago

Thank you Attila, that makes sense. I just wanted to make sure, since the COSMOS manual (but not the CARNIVAL one) says this about the signaling input to preprocess_COSMOS_metabolism_to_signaling: "signaling_data numerical vector, where names are signaling nodes in the PKN and values are from {1, 0, -1}. Continuous data will be discretized using the sign function". The values from run_wmean are not within that range. It works fine, but I just wanted to check that I was not missing something.

gabora commented 1 year ago

COSMOS and CARNIVAL connect an input layer to an output layer on the PKN.

The input layer must be discretized: the user has to decide whether each input is up or down. If you use continuous values, they are discretized in a preprocessing step. The output layer is a vector of continuous values. If a number has a larger absolute value, it is prioritized in fitting. For example, it could happen that only one of two TFs can be fitted with a consistent sign. The optimization will then favour fitting the TF with the higher absolute value.
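As a rough illustration of the two layers, the sketch below discretizes a continuous input layer with the sign function and ranks an output layer by absolute value. It is plain Python with made-up node names and values, and it only shows the ordering that the absolute values imply; the actual fitting is done by the optimization, not by sorting.

```python
def sign(value):
    """Return 1 for positive, -1 for negative, and 0 for zero values."""
    return (value > 0) - (value < 0)


# Hypothetical continuous input layer (e.g. signaling nodes).
input_layer = {"EGFR": 2.3, "AKT1": -1.7, "TP53": 0.0}
discretized_input = {name: sign(value) for name, value in input_layer.items()}

# Hypothetical continuous output layer (e.g. TF activity scores).
output_layer = {"STAT1": 1.2, "MYC": -4.5, "JUN": 3.1}

# Larger absolute values carry more weight, so list them first.
priority_order = sorted(output_layer, key=lambda name: abs(output_layer[name]),
                        reverse=True)

print(discretized_input)
print(priority_order)
```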

In any case, it makes sense to keep only significant inputs, like here: https://github.com/saezlab/NCI60_cosmos/blob/main/scripts/run_cosmos.R
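Keeping only significant inputs could look like the sketch below, which drops scores whose absolute value does not exceed a cutoff. This is plain Python, not code from the linked script; the helper name `keep_significant`, the scores, and the cutoff of 2 are arbitrary assumptions chosen for illustration.

```python
def keep_significant(scores, cutoff):
    """Keep entries whose absolute score is strictly above the cutoff."""
    return {name: value for name, value in scores.items() if abs(value) > cutoff}


# Hypothetical TF activity scores.
tf_scores = {"STAT1": 2.6, "MYC": -3.4, "JUN": 0.4, "FOS": -1.1}

# The cutoff is an assumption; choose it to suit your own data.
significant = keep_significant(tf_scores, cutoff=2.0)
print(significant)
```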

I hope this clarifies a bit.

Best, Attila

Sharm8 commented 1 year ago

Yes it does, thank you!

Sharm8 commented 1 year ago

Hi again!

I have a different question but thought I would just ask here. I have some proteins that are either upregulated or downregulated in my data but the network from carnival shows the opposite activity from the data. I thought that the node activity represents upregulation or downregulation as seen in the data. I'm not sure if I have gotten that wrong. Would really appreciate if you could clarify this!

I hope this makes sense.

Thanks

enio23 commented 1 year ago

Hi,

If you would like to validate the activities from CARNIVAL networks, I would suggest using phosphoproteomics data rather than protein abundances. A benchmark similar to what you suggest has already been done on CARNIVAL for the CCLE application, where we found good agreement between the protein activities inferred by CARNIVAL and previously measured phosphoprotein levels: https://static-content.springer.com/esm/art%3A10.1038%2Fs41540-019-0118-z/MediaObjects/41540_2019_118_MOESM1_ESM.pdf.

Hope this helps.

Cheers, Enio

Sharm8 commented 1 year ago

Hi, thank you for your reply!

I haven't used proteomic data. Just RNAseq and metabolomic data (along with transcription factor activity from ATACseq). I don't want to validate my findings at this point. I just want to understand what up and down activity inferred by CARNIVAL/COSMOS means. Just because all this while I thought that it represents what the input data shows. But I have a gene that is downregulated in my RNAseq data that appears to be "upregulated"/activated in my CARNIVAL network. So I think I might have misunderstood something.

Thank you, Sharmilla

enio23 commented 1 year ago

Hi again,

The relation between gene expression and protein activity is not well characterized. The upregulation of a gene does not necessarily mean an increase in the activity of its corresponding protein. As such, it is entirely possible to find discrepancies between the protein activities inferred by CARNIVAL and the changes in expression of the corresponding genes. With CARNIVAL we aim to identify the proteins (and their direction of regulation) that appear to be involved in the protein interaction networks leading to the regulation of the downstream TFs.

Cheers, Enio

Sharm8 commented 1 year ago

Hi, I understand better now what the nodes actually represent. Thank you for your explanation!