|Demo|
Are you interested in OmniPath data? Check out our R package OmnipathR_,
the most popular and most versatile access point to OmniPath, a database
built from more than 150 original resources. If you use Python and don't
need to build the database yourself, try our `Python client`_. Read more
about the `web service here`_.

.. _OmnipathR: https://r.omnipathdb.org
.. _`Python client`: https://github.com/saezlab/omnipath
.. _`web service here`: https://pypath.omnipathdb.org/webservice.html

Pypath is the database builder of OmniPath. For most people the data
distributed in OmniPath is sufficient (see above), so they don't really need
pypath. Typically you need pypath to:

* use its standalone utilities (see the `utils module`_)
* work directly with the input resources (see the `inputs module`_)

.. _`utils module`: https://github.com/saezlab/pypath/tree/master/pypath/utils
.. _`inputs module`: https://github.com/saezlab/pypath/tree/master/pypath/inputs

From PyPI:

.. code:: bash

    pip install pypath-omnipath

From Git:

.. code:: bash

    pip install git+https://github.com/saezlab/pypath.git

Read the `reference documentation`_ or check out the tutorials_. The most
comprehensive guide to pypath is `The Pypath Book`_.

.. _`reference documentation`: https://pypath.omnipathdb.org/
.. _tutorials: https://workflows.omnipathdb.org/
.. _`The Pypath Book`: https://pypath.omnipathdb.org/notebooks/manual.html

Should you have a question or experience an issue, please write to us on
the `Github issues`_ page.

pypath is a Python module for processing molecular biology data resources,
combining them into databases and providing a versatile interface in Python
as well as exporting the data for access through other platforms such as
R_, `web service`_, Cytoscape_ and BEL (Biological Expression Language).

.. _R: https://r.omnipathdb.org/
.. _`web service`: https://omnipathdb.org/
.. _Cytoscape: https://apps.cytoscape.org/apps/omnipath

pypath provides access to more than 100 resources! It builds 5 major combined databases and within these we can distinguish different datasets. The 5 major databases are interactions (molecular interaction network or pathways), enzyme-substrate relationships, protein complexes, molecular annotations (functional roles, localizations, and more) and inter-cellular communication roles.
pypath consists of a number of submodules, each of which contains further submodules. Overall pypath consists of around 100 modules. The most important higher-level submodules include ``pypath.core``, ``pypath.inputs``, ``pypath.utils`` and ``pypath.omnipath``.

In the beginning the primary aim of ``pypath`` was to build networks from
multiple sources using an igraph object as the foundation of the integrated
data structure. From version 0.7 and 0.8 this design principle started to
change. Today ``pypath`` builds a number of different databases and exposes
them through a rich API; each of them can be converted to a
``pandas.DataFrame``.

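The modules and classes responsible for the integrated databases are located
in ``pypath.core``. The five main databases are the following:

* ``core.network``: molecular interaction networks
* ``core.enz_sub``: enzyme-substrate relationships
* ``core.complex``: protein complexes
* ``core.annot``: molecular annotations
* ``core.intercell``: inter-cellular communication roles
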
Some of the databases have different variants (e.g. PPI and transcriptional network) and all can be customized by many parameters.
The databases above can be loaded by calling the appropriate classes.
However, building the databases requires time and memory, so we want to avoid
building them more often than necessary or keeping more than one copy
in memory. Some of the modules listed above have a method ``get_db``
which ensures only one instance of the database is loaded. But there is a
more full-featured database management system available in pypath:
the ``pypath.omnipath`` module. This module is able to build the
databases, automatically saves them to ``pickle`` files and loads them from
there in subsequent sessions. pypath comes with a number of database
definitions and users can add more. The ``pickle`` files are located by
default in the ``~/.pypath/pickles/`` directory. With the ``omnipath``
module it's easy to get an instance of a database. For example, to get the
``omnipath`` PPI network dataset:

.. code:: python

    from pypath import omnipath
    op = omnipath.db.get_db('omnipath')

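The build-once, reload-later behaviour described above can be illustrated with a minimal standard-library sketch. This is not pypath's implementation: the function ``get_or_build``, its arguments and the toy builder are hypothetical names used only for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def get_or_build(name, builder, pickle_dir):
    """Return the object stored in <pickle_dir>/<name>.pickle, building it if missing.

    Hypothetical helper: it only mimics the idea of building a database
    once and reloading it from a pickle file in later sessions.
    """
    pickle_dir = Path(pickle_dir)
    pickle_dir.mkdir(parents=True, exist_ok=True)
    path = pickle_dir / f"{name}.pickle"

    if path.exists():
        # A previous call already built this database: just load it.
        with path.open("rb") as fp:
            return pickle.load(fp)

    # First request: run the (expensive) build, then save the result.
    database = builder()
    with path.open("wb") as fp:
        pickle.dump(database, fp)
    return database


calls = []


def toy_builder():
    # Stand-in for a slow database build; records each invocation.
    calls.append(1)
    return {"A": {"B"}}


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_build("toy", toy_builder, tmp)   # builds and pickles
    second = get_or_build("toy", toy_builder, tmp)  # loads from the pickle
```

The second call returns an equal object without running the builder again, which is the saving the pickle files are meant to provide.
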
Important: Building the databases for the first time requires the
download of several MB or GB of data from the original resources. This
normally takes a long time and is prone to errors (e.g. truncated or empty
downloads due to interrupted HTTP connections). In this case you should check
the log to find the path of the problematic cache file, check the contents
of this file to find out the reason and possibly delete the file to ensure
another download attempt when you call the database build again. Sometimes
the original resources change their content or go offline. If you encounter
such a case, please open an issue at https://github.com/saezlab/pypath/issues
so we can fix it in ``pypath``. Once all the necessary contents are
downloaded and stored in the cache, the database builds are much faster,
but can still take minutes.

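The manual check described above can be sketched with the standard library alone. The helper name ``drop_if_empty`` and the example files are hypothetical; the real cache path has to be taken from the pypath log. The sketch only detects empty files, because recognising a truncated download requires a check specific to the file format.

```python
import tempfile
from pathlib import Path


def drop_if_empty(cache_file):
    """Delete `cache_file` if it exists and is empty; return True if it was deleted."""
    path = Path(cache_file)
    if path.is_file() and path.stat().st_size == 0:
        path.unlink()
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    # Two stand-in cache files: one empty download and one with content.
    empty = Path(tmp) / "empty-download"
    empty.touch()
    good = Path(tmp) / "good-download"
    good.write_text("some content\n")

    removed_empty = drop_if_empty(empty)
    removed_good = drop_if_empty(good)
    empty_still_exists = empty.exists()
    good_still_exists = good.exists()
```
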
Apart from the databases, pypath has many submodules with standalone functionality which can be used in other modules and scripts. Below we present a few of these.
The ID conversion module ``utils.mapping`` translates between a large variety
of gene, protein, miRNA and small molecule ID types. It can
translate secondary UniProt ACs to primaries, and TrEMBL ACs to SwissProt,
using primary Gene Symbols to find the connections. This module automatically
loads and stores the necessary conversion tables. Many tables
are predefined, such as all the IDs in the UniProt mapping service, while
users are able to load any table from file using the classes provided
in the module ``input_formats``. An example of how to translate identifiers:

.. code:: python

    from pypath.utils import mapping
    mapping.map_name('P00533', 'uniprot', 'genesymbol')
    # {'EGFR'}

The ``pypath.utils.homology`` module is able to find the orthologs of genes
between two organisms. It uses data from NCBI HomoloGene, Ensembl and
UniProt. This module is really simple to use:

.. code:: python

    from pypath.utils import homology
    homology.translate('P00533', 10090) # translating the human EGFR to mouse
    # ['Q01279'] # it returns the mouse Egfr UniProt AC

It is able to handle any ID type supported by ``pypath.utils.mapping``.
Alternatively, you can access a complete dictionary of orthologous genes,
or translate columns in a pandas data frame.

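To illustrate the dictionary-based approach, the sketch below applies a hand-written ortholog dictionary to a list of identifiers. It does not call pypath: the dictionary contains only the human EGFR to mouse Egfr pair from the example above, and ``translate_ids`` is a hypothetical helper.

```python
# Toy human-to-mouse ortholog dictionary (UniProt ACs); a real one is much larger.
HUMAN_TO_MOUSE = {"P00533": ["Q01279"]}


def translate_ids(ids, orthologs):
    """Map each source ID to its list of orthologs; unknown IDs map to an empty list."""
    return {source_id: list(orthologs.get(source_id, [])) for source_id in ids}


result = translate_ids(["P00533", "UNKNOWN_ID"], HUMAN_TO_MOUSE)
print(result)
```

Keeping a list per identifier leaves room for genes with several orthologs, and an empty list marks identifiers without a known ortholog instead of raising an error.
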
Does it run on my old Python?
Most likely it doesn't. The oldest supported version, currently 3.9, is
defined in our `pyproject.toml`_.

.. _`pyproject.toml`: https://github.com/saezlab/pypath/blob/master/pyproject.toml

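A quick way to check your environment from within Python, sketched with the standard library only. The minimum version (3.9) and the distribution name (``pypath-omnipath``) are taken from this document; the helper names are hypothetical, and ``pyproject.toml`` remains authoritative if the minimum changes.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # oldest supported version stated in this document


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name="pypath-omnipath"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("pypath-omnipath version:", installed_version())
```
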
Is there something similar in R?
`OmniPath's R client`_, besides accessing data from OmniPath, provides many
services similar to those of pypath: `ID translation`_, `homology translation`_,
`taxonomy support`_, `GO support`_, and many more.

.. _`OmniPath's R client`: https://r.omnipathdb.org
.. _`ID translation`: https://r.omnipathdb.org/reference/translate_ids.html
.. _`homology translation`: https://r.omnipathdb.org/reference/homologene_uniprot_orthology.html
.. _`taxonomy support`: https://r.omnipathdb.org/reference/ncbi_taxid.html
.. _`GO support`: https://r.omnipathdb.org/reference/go_annot_download.html

`Questions about OmniPath`_

.. _`Questions about OmniPath`: https://omnipathdb.org/#faq

We prefer to keep all communication within the `Github issues`_. For private
or sensitive matters, feel free to contact us at omnipathdb@gmail.com.

.. _`Github issues`: https://github.com/saezlab/pypath/issues

The development of ``pypath`` is coordinated by `Dénes Türei`_ in the
`Saez Lab`_, with the contribution of developers and scientists from
other groups:

* The `HU Biological Data Science Lab (PI: Tunca Doğan)`_ created many new input
  modules in ``pypath``;
* The `Korcsmaros Lab`_ contributed to the overall design of OmniPath, the
  design and implementation of the intercellular communication database,
  and to various case studies and tutorials;
* The group of `Fabian Theis`_ developed the `Python client`_ for the
  OmniPath web service;
* In the `Saez Lab`_, Olga Ivanova introduced the resource manager in
  ``pypath``, Sophia Müller-Dott added the CollecTRI gene regulatory network,
  while Nicolàs Palacio, Sebastian Lobentanzer and Ahmet Rifaioglu
  have done various maintenance and refactoring work. Aurelien Dugourd and
  Christina Schmidt helped with the design of the metabolomics related
  datasets and services;
* The `R package`_ and the `Cytoscape app`_ are developed and maintained by
  Francesco Ceccarelli, Attila Gábor, Alberto Valdeolivas, Dénes Türei and
  Nicolàs Palacio.

.. _`Saez Lab`: https://saezlab.org/
.. _`HU Biological Data Science Lab (PI: Tunca Doğan)`: https://yunus.hacettepe.edu.tr/~tuncadogan/
.. _`Dénes Türei`: https://denes.omnipathdb.org/
.. _`R package`: https://r.omnipathdb.org
.. _`Cytoscape app`: https://apps.cytoscape.org/apps/omnipath
.. _`Fabian Theis`: https://www.helmholtz-munich.de/en/icb/research-groups/theis-lab/
.. _`Korcsmaros Lab`: https://korcsmaroslab.org/

See here_ a bird's-eye view of pypath's development history. For more details
about recent developments see the `Github releases`_.

.. _here: https://pypath.omnipathdb.org/releasehistory.html
.. _`Github releases`: https://github.com/saezlab/pypath/releases

.. |Demo| image:: https://raw.githubusercontent.com/saezlab/pypath/master/docs/source/_static/img/pypath-demo.webp