Getting more information on how to use COSMOS

saezlab / cosmosR

COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.

https://saezlab.github.io/cosmosR/

GNU General Public License v3.0


Getting more information on how to use COSMOS #8

Closed LonnekeNouwen closed 3 years ago

LonnekeNouwen commented 3 years ago

Dear Sir/Madam,

Recently, I came across your paper describing the COSMOS method, and I would like to try this method on my own data. Being trained as a biomedical scientist and not a bioinformatician, I had some questions that I could not answer myself, hence this post. My first question is whether it is necessary to filter the datasets by, for instance, significance or another threshold, or whether it is also possible to use the complete datasets (metabolomics data is sometimes analysed based on trends rather than significance). I also wondered whether it is possible to use this method with different time points/groups (since we have different groups in our datasets). And lastly, I could not figure out how to use this method with only two datasets. I tried to remove one dataset from the short COSMOS tutorial on GitHub as a test, but the code no longer worked. Therefore, my last question is how to use COSMOS with only a phosphoproteomics and a metabolomics dataset. I would like to thank you in advance for your time and I hope to hear from you soon.

Kind regards, Lonneke Nouwen

gabora commented 3 years ago

Dear Lonneke, sorry for the late reply. We recently improved the function documentation, and the vignette/tutorial has also been updated (we are preparing a Bioconductor submission).

Please note that the previous version of the tutorial used the CPLEX optimizer, which has to be downloaded from IBM (https://www.ibm.com/academic/technology/data-science - free for academics). The current tutorial runs with 'lpsolve', but its capacity to solve this type of optimization problem is limited. We also found a bug in a dependency that further limits lpsolve; it will be fixed next week.

Regarding your questions:

You can decide which nodes you filter in your omics data. We thought that using the differentially abundant/expressed ones would make sense, but using all the data is also fine. I would keep the inputs as small as possible, otherwise the results become hard to interpret (too many interconnected nodes).
We have analyzed time-course data using COSMOS, but in that case I ran COSMOS separately for each time point / group. Then I compared the groups/time points through their networks (e.g. which edges change across groups).
COSMOS works with any 2 types of data. I think your issue there was the missing CPLEX optimizer.
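To make the first two points concrete, here is a minimal base-R sketch. It is not taken from the COSMOS tutorial: the table, column names, threshold and edge lists are all invented for illustration. It filters one omics table down to significant entries, builds one named input vector per group, and then compares two result networks edge by edge.

```r
# Hypothetical differential-abundance table (all names and values are invented).
metab <- data.frame(
  id      = c("met_A", "met_B", "met_C", "met_D"),
  group   = c("t1", "t1", "t2", "t2"),
  t_value = c(4.2, -0.3, -3.1, 0.8),
  p_adj   = c(0.001, 0.70, 0.01, 0.45),
  stringsAsFactors = FALSE
)

# Keep only significant entries so that the input stays small.
# The 0.05 cutoff is an arbitrary choice for this example.
significant <- metab[metab$p_adj < 0.05, ]

# One named numeric vector per group: names are node identifiers,
# values are the scores that would be used as input for that group's run.
inputs_by_group <- lapply(
  split(significant, significant$group),
  function(df) setNames(df$t_value, df$id)
)

# After one run per group, the resulting networks can be compared by edge.
# These two edge tables are invented stand-ins for such results.
edges_t1 <- data.frame(source = c("A", "B"), target = c("B", "C"),
                       stringsAsFactors = FALSE)
edges_t2 <- data.frame(source = c("A", "C"), target = c("B", "D"),
                       stringsAsFactors = FALSE)

edge_key <- function(e) paste(e$source, e$target, sep = "->")
only_in_t1 <- setdiff(edge_key(edges_t1), edge_key(edges_t2))
only_in_t2 <- setdiff(edge_key(edges_t2), edge_key(edges_t1))
```

Each element of inputs_by_group would then serve as the input for a separate run, and only_in_t1 / only_in_t2 list the edges that are specific to one group.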

best, Attila

LonnekeNouwen commented 3 years ago

Hi Attila, Thanks for your response! As said before, I would like to use a metabolomics and a phosphoproteomics dataset with this method. For the metabolomics dataset, I understand from all the information that I need to make a named vector with PubChem identifiers out of the metabolomics data. I don't understand, however, how I can use the phosphoproteomics data, because I cannot just use the names of the proteins; I also need to include the site of phosphorylation somehow. I read somewhere that I need to use the phosphoproteomics data to get signaling information. Is this correct, and if so, how do I do this? I was also wondering how to include fluxomics data. The metabolomics dataset that I want to use is actually also a fluxomics dataset. But, as with the phosphoproteomics dataset, I don't know what format this data needs to be in and how to get there. Thanks for your help!

Best, Lonneke

gabora commented 3 years ago

Hi Lonneke,

@adugourd will jump in to answer this :)

adugourd commented 3 years ago

I will come back to you shortly; I am preparing a short tutorial.

adugourd commented 3 years ago

Hey Lonneke,

Sorry for the delayed answer. This seems to be a common point that people struggle with, so I made a mini script to show in parallel how to estimate TF activity from transcriptomic data (which I think you know how to do now) and kinase activity from phosphoproteomic data. You can find all of this here: https://github.com/saezlab/kinase_tf_mini_tuto

Please let me know if that already helps you with your problem.

Cheers,

Aurelien

adugourd commented 3 years ago

For the fluxomic data, you can actually use it instead of the metabolomic data without much trouble. The only thing that may be a bit complicated is that you will have to map your fluxes to their corresponding metabolic enzymes, and then map those enzymes to their corresponding identifiers in the prior knowledge network.
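As a rough illustration of that two-step mapping, the base-R sketch below goes from reaction-level flux changes to enzyme genes and then to network node identifiers. Every table, name and identifier here is hypothetical; the real identifier format has to be taken from the prior knowledge network you actually use.

```r
# Hypothetical flux changes per reaction (invented values).
flux <- c(rxn_HK = 1.8, rxn_PFK = -2.4, rxn_unmapped = 0.9)

# Hypothetical reaction -> enzyme gene annotation (step 1 of the mapping).
rxn_to_enzyme <- data.frame(
  reaction = c("rxn_HK", "rxn_PFK"),
  gene     = c("HK1", "PFKM"),
  stringsAsFactors = FALSE
)

# Hypothetical enzyme gene -> network node identifier (step 2 of the mapping).
# "node_HK1" etc. are placeholders, not the real identifier format.
enzyme_to_node <- c(HK1 = "node_HK1", PFKM = "node_PFKM")

# Keep only the reactions that have both a flux value and an enzyme annotation.
mapped <- rxn_to_enzyme[rxn_to_enzyme$reaction %in% names(flux), ]

# Translate genes to node identifiers and drop anything without a match.
node_ids <- enzyme_to_node[mapped$gene]
keep <- !is.na(node_ids)

# Final named vector: names are network node identifiers, values are flux changes.
flux_input <- setNames(flux[mapped$reaction][keep], unname(node_ids[keep]))
```

Reactions without an annotated enzyme (rxn_unmapped here) are dropped. A reaction catalysed by several enzymes would need an explicit decision, for example assigning the same value to each enzyme, which this sketch does not cover.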

marefei commented 3 years ago

Hi all,

Thanks for developing this package! I'm a molecular biologist with basic skills in bioinformatics and R and, like Lonneke, I still don't fully understand how to use your package properly. In my case, I'm most interested in which data type exactly you mean by "cellular/genetic perturbations". Do you integrate e.g. drug treatment data? Would it be enough to run these together with transcriptomic data (as you say, two out of these 5) to get your causal networks?

Greets, marefei

adugourd commented 3 years ago

Hi,

thanks for your interest!

Basically, in the preprocess_COSMOS_signaling_to_metabolism function, you can give known perturbations as input to the signaling_data parameter.

If you check the tutorial, you can see that the signaling_data argument of the function is used with a named vector of TFs and kinases with their corresponding activities. Instead, you can pass a named vector with the names of the perturbed nodes and 1 or -1 depending on whether they are up- or down-regulated. For example, if you have data with a MAPK1 KO, then you pass it a named vector with MAPK1 as the name and -1 as the value.

Then you can pass TF activities estimated from transcriptomic data in place of the metabolomic data in the "metabolic_data" parameter. The name obviously doesn't fit, but that's just because it was originally used with metabolomic data. We will change that in a future version.
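A minimal base-R sketch of the two inputs described above. The TF names, scores and the list of network nodes are invented for illustration; only the MAPK1 KO example comes from this thread.

```r
# Known perturbation: MAPK1 knock-out, encoded as -1 (down-regulated).
# A value of 1 would encode an up-regulated / activated node.
perturbation_input <- c(MAPK1 = -1)

# Hypothetical TF activity scores estimated from transcriptomics (invented values).
tf_activity_input <- c(MYC = 2.1, TP53 = -1.7)

# The names have to correspond to nodes of the prior knowledge network.
# 'network_nodes' is an invented stand-in for the node names of that network.
network_nodes <- c("MAPK1", "MYC", "TP53", "JUN")
missing_nodes <- setdiff(
  c(names(perturbation_input), names(tf_activity_input)),
  network_nodes
)
if (length(missing_nodes) > 0) {
  warning("Not found in the network: ", paste(missing_nodes, collapse = ", "))
}

# As described in this thread, perturbation_input would then be passed as
# signaling_data and tf_activity_input in place of metabolic_data.
```

Checking the names against the network before running anything is an extra precaution added here, since inputs whose names do not match a network node cannot be connected to it.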

I would also recommend checking the objects that are used in the tutorial; it might also help you to better understand how to pass the relevant data as inputs to the functions.

Hope that helps !

Cheers,

Aurelien