
saezlab / pypath

Python module for prior knowledge integration. Builds databases of signaling pathways, enzyme-substrate interactions, complexes, annotations and intercellular communication roles.

http://omnipathdb.org/

GNU General Public License v3.0


Omnipath data #95

Closed: luli7 closed this issue 5 years ago

luli7 commented 5 years ago

Hi,

I am quite confused about which data sets are included in OmniPath, the curated and highly trusted data sets that can be fetched by pypath or the web service.

Based on the data_formats of omnipath in pypath, it includes 'trip', 'spike', 'signalink3', 'guide2pharma', 'ca1', 'arn', 'nrf2', 'macrophage', 'death', 'pdz', 'signor', 'adhesome', 'hpmr', 'cellphonedb', 'ramilowski2015', 'psite', 'depod', 'lmpid', 'phelm', 'elm', 'domino', 'dbptm', 'hprd_p', 'biogrid', 'ccmap', 'mppi', 'dip', 'innatedb', 'matrixdb', 'intact', 'hprd'.

But based on the description of the web service, we can get OmniPath with the following query: http://omnipathdb.org/interactions/?partners&fields=sources,references. This includes many more data sources but fewer edges (relationships).

Which way should we use to get OmniPath? And what are the current numbers of nodes and edges in OmniPath?

Thank you very much for your help.

Best regards,

Lu

deeenes commented 5 years ago

Hi Lu,

The programmatic interface, pypath, and the web service work differently. Once the network is built for the web service, we cannot separate resources: all sources supporting an edge are shown, even if some of them are not relevant to the query. For example, MIMP contains kinase-substrate interactions without literature references and is part of the enzyme_substrate_extra dataset. However, if an interaction contained in MIMP is also confirmed by literature curated resources, it will be shown, with MIMP among its sources, even if you query only the omnipath dataset, which contains only literature supported interactions.
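To illustrate the point above, here is a toy model. It is not pypath code: the protein names, the edges, and the contents of the `LITERATURE_CURATED` set are made up for the example. It only shows why a query restricted to literature supported interactions can still list a resource like MIMP among the sources of an edge.

```python
# Toy model, not pypath code. Resource names come from this thread;
# proteins, edges and the curated set are hypothetical.
LITERATURE_CURATED = {"Signor"}

# Each edge maps to every resource that reports it.
EDGES = {
    ("KINASE_A", "SUBSTRATE_X"): {"MIMP", "Signor"},
    ("KINASE_A", "SUBSTRATE_Y"): {"MIMP"},
    ("KINASE_B", "SUBSTRATE_Z"): {"Signor"},
}


def query_literature_supported(edges):
    """Keep edges backed by at least one literature curated resource,
    but report the full source list of each kept edge."""
    return {
        edge: sorted(sources)
        for edge, sources in edges.items()
        if sources & LITERATURE_CURATED
    }


result = query_literature_supported(EDGES)
for edge, sources in result.items():
    print(edge, sources)
```

The edge supported only by MIMP is dropped, while the edge supported by both MIMP and Signor is kept and still lists MIMP. This would explain seeing more resource names but fewer edges in the web service output.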

pypath, in contrast, works the following way. For example, for signor in data_formats you see that the input is signor_interactions; in the dataio.signor_interactions function you see that it downloads the data from urls.urls['signor']['all_url_new']; and looking at the urls module, this is https://signor.uniroma2.it/download_entity.php.

Best,

Denes

luli7 commented 5 years ago

Dear Denes,

Thank you very much for your answer.

I have another question: why were NetPath and ALZ (AlzPathway) removed from OmniPath? They are part of the referenced pathway data_formats. However, after importing pathway into omnipath, you removed these two.

Another question is about importing reaction (process diagram) databases, such as Reactome, into OmniPath. I tried the methods in the pypath_guide but was not successful. You did write a warning. I wonder what the current state is, and whether there is a way to merge reaction based data sets with OmniPath?

Thank you for your patience and help.

Best regards,

Lu

deeenes commented 5 years ago

Hi Lu,

> I have another question: why were NetPath and ALZ (AlzPathway) removed from OmniPath? They are part of the referenced pathway data_formats. However, after importing pathway into omnipath, you removed these two.

AlzPathway is an old, small network built by the Barabasi group for one of their projects. It lacks literature references, which is why it is not in OmniPath. In addition, it is small and old, so not very significant. NetPath is a process description data structure (according to the SBGN standard), similar to Reactome, so the same points apply to it.

> Another question is about importing reaction (process diagram) databases, such as Reactome, into OmniPath. I tried the methods in the pypath_guide but was not successful. You did write a warning. I wonder what the current state is, and whether there is a way to merge reaction based data sets with OmniPath?

About process description (aka reaction) databases: the module pypath.pyreact provides a generic, customizable BioPax parser, and this has been used for processing most of these resources. As it has been unmaintained for the past 2 years, its connections with the other modules are most probably broken, though this can likely be fixed with minor intervention. Why haven't I done this yet? Because conversion from a process description to an activity flow data structure is not trivial and always comes with information loss, and I think the conversion methods in the pyreact module are not optimal (I mean they are OK, but we could still do much better). From your viewpoint, I don't think it's particularly useful to merge this kind of data with your network. I don't know what you are using the networks for, but introducing more noise into the data is in general not advantageous.

Best,

Denes

luli7 commented 5 years ago

Dear Denes,

Thank you very much for your answer. I wonder whether the result obtained from the web service is a "static" (fixed) version of OmniPath, or is it dynamically retrieved by PyPath on your side every time we run a query?

Is there a versioned, downloadable form of OmniPath that is less dynamic and updated less regularly?

Is there an output function in PyPath to export OmniPath into BioPax, or as input for Cytoscape? I am aware of the Cytoscape plugin and have tried it, but I obtained different numbers of nodes and edges compared with the data set I retrieved via PyPath.

Finally, is there a way to load data in PyPath without using the cache?

Thank you for your answer.

Best regards,

Lu

npalacioescat commented 5 years ago

Hi Lu,

> Thank you very much for your answer. I wonder whether the result obtained from webservices is a "static" /fixed version of Omnipath? or is it dynamically retrieved by PyPath on your side every time when we do the query?
>
> Is there a versioned downloadable form of Omnipath? That is less dynamic and updated less regularly?

The data in the OmniPath web service is more "static". That is, it is queried by PyPath and uploaded to the server periodically, rather than queried dynamically on each request (that would take a lot of time). The "downloadable" version of OmniPath is the plain text table you can see in the browser (or retrieve programmatically, with curl for instance). You can save it easily from your browser by right-clicking and selecting "save as...".
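As a rough sketch of the programmatic route, the table could also be fetched and summarized with the Python standard library. The URL is the one from the first post in this thread; the column names `source` and `target`, the tab separator, and the helper names are assumptions, so check them against the header of the table you actually download. The sample rows are made up.

```python
import csv
import io
import urllib.request

# Query URL quoted earlier in this thread.
OMNIPATH_URL = (
    "http://omnipathdb.org/interactions/?partners&fields=sources,references"
)


def fetch_table(url=OMNIPATH_URL):
    """Download the plain text table from the web service (needs network)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def count_nodes_edges(table_text, source_col="source", target_col="target"):
    """Count distinct nodes and distinct (source, target) pairs.

    Assumes a tab separated table with a header row; the column
    names are assumptions and can be overridden.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    nodes, edges = set(), set()
    for row in reader:
        src, tgt = row[source_col], row[target_col]
        nodes.update((src, tgt))
        edges.add((src, tgt))
    return len(nodes), len(edges)


# Made-up rows standing in for a downloaded table.
sample = (
    "source\ttarget\tsources\treferences\n"
    "P1\tP2\tSignor\t111\n"
    "P2\tP3\tSignor;MIMP\t222\n"
    "P1\tP2\tSignor\t111\n"
)
n_nodes, n_edges = count_nodes_edges(sample)
print(n_nodes, n_edges)
```

For the live data, `count_nodes_edges(fetch_table())` would give the current node and edge counts, provided the assumed column names match. Saving the downloaded text with a date in the file name is one simple way to keep a fixed snapshot.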

> Is there a output function in PyPath that we can export Omnipath into BioPax, or input for Cytoscape? I am aware of the cytoscape plugin and have tried, but have obtained different numbers of nodes and edges comparing with the data set I retrieved via PyPath.

The Cytoscape plugin queries the OmniPath webservice directly, therefore the different number of nodes/edges can be due to recent updates in the sources of the data and/or failing to retrieve some of them with PyPath.
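To see where the two networks differ, one option is to compare the edge sets directly. This is a minimal sketch with made-up edge lists; `diff_edges` is a hypothetical helper, and how you extract (source, target) pairs from each tool is up to you.

```python
def diff_edges(edges_a, edges_b):
    """Return edges only in A, edges only in B, and edges in both."""
    a, b = set(edges_a), set(edges_b)
    return a - b, b - a, a & b


# Made-up edge lists standing in for the two exports.
from_webservice = [("P1", "P2"), ("P2", "P3")]
from_pypath = [("P1", "P2"), ("P3", "P4")]

only_web, only_pypath, shared = diff_edges(from_webservice, from_pypath)
print(len(only_web), len(only_pypath), len(shared))
```

Looking at the resources behind the edges that appear on only one side should indicate which source was updated or failed to download.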

> Finally, Is there a way to load data in PyPath without using cache?

Yes, you can use:

import pypath.curl

with pypath.curl.cache_off():
    pass  # load your query here; downloads in this block bypass the cache


Cheers,

Nico