Closed laurie-tonon closed 1 year ago
Hi,
PROGENy and other annotation resources are not yet available for organisms other than human. However, you can easily translate them by orthology. The first run may take a long time because it requires many downloads from Ensembl, HomoloGene and UniProt. Subsequent runs work from the cache and take only a few seconds. Here we use the modules pypath and omnipath, which can be installed with pip:
pip install git+https://github.com/saezlab/omnipath.git
pip install git+https://github.com/saezlab/pypath.git
import omnipath
from pypath.utils import homology, mapping
progeny = omnipath.requests.Annotations.get(resources = 'PROGENy', wide = True)
progeny['mouse_uniprot'] = [homology.translate(u, 10090) for u in progeny.uniprot]
progeny = progeny.explode('mouse_uniprot')
progeny['mouse_genesymbol'] = [mapping.label(u, ncbi_tax_id = 10090) for u in progeny.mouse_uniprot]
progeny
# uniprot genesymbol entity_type p_value pathway weight mouse_uniprot mouse_genesymbol
# 0 P35250 RFC2 protein 0.624086 Trail -0.800677 Q9WUK4 Rfc2
# 1 P35250 RFC2 protein 0.000704 Hypoxia -2.049501 Q9WUK4 Rfc2
# 2 P35250 RFC2 protein 0.001655 EGFR 1.470647 Q9WUK4 Rfc2
# 3 P35250 RFC2 protein 0.833456 TNFa -0.124993 Q9WUK4 Rfc2
# 4 P35250 RFC2 protein 0.630460 TGFb -0.430508 Q9WUK4 Rfc2
# ... ... ... ... ... ... ... ... ...
# 233402 Q96A11 GAL3ST3 protein 0.236295 PI3K -0.228038 P61315 Gal3st3
# 233403 Q96A11 GAL3ST3 protein 0.705764 JAK-STAT 0.052601 P61315 Gal3st3
# 233404 Q96A11 GAL3ST3 protein 0.575544 EGFR 0.070407 P61315 Gal3st3
# 233405 Q96A11 GAL3ST3 protein 0.988972 Trail -0.005215 P61315 Gal3st3
# 233406 Q96A11 GAL3ST3 protein 0.607089 Hypoxia 0.063501 P61315 Gal3st3
#
# [237671 rows x 8 columns]
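The list comprehension followed by explode is what handles one-to-many orthology: a human protein can map to several mouse proteins, and each match ends up in its own row. Below is a minimal sketch of the same reshaping in plain Python. It assumes a hand-written toy orthology table; the helper name translate_rows, the placeholder accessions and the weights of the last two rows are made up for illustration and are not pypath output.

```python
# Toy human -> mouse orthology table; NOT real pypath output.
# 'P00000', 'Q00001', 'Q00002' and 'P99999' are placeholder accessions.
TOY_ORTHOLOGY = {
    'P35250': ['Q9WUK4'],            # one-to-one (pair taken from the output above)
    'P00000': ['Q00001', 'Q00002'],  # one-to-many, hypothetical
}


def translate_rows(rows, orthology):
    """Return one output row per (input row, ortholog) pair.

    Rows without any ortholog are kept with None as the target,
    similar to what DataFrame.explode does with an empty list.
    """
    out = []
    for row in rows:
        targets = orthology.get(row['uniprot'], []) or [None]
        for target in targets:
            out.append({**row, 'mouse_uniprot': target})
    return out


rows = [
    {'uniprot': 'P35250', 'pathway': 'EGFR', 'weight': 1.470647},
    {'uniprot': 'P00000', 'pathway': 'EGFR', 'weight': 0.5},
    {'uniprot': 'P99999', 'pathway': 'EGFR', 'weight': -0.1},
]

translated = translate_rows(rows, TOY_ORTHOLOGY)
for r in translated:
    print(r['uniprot'], r['pathway'], r['mouse_uniprot'])
```

Rows that end up without a mouse ortholog are usually dropped before building the final network.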
I hope this helps.
Best,
Denes
Thanks a lot, that should indeed help me. I tried to run your example but it throws an error. I installed pypath-omnipath via pip and everything is fine. But when I want to import via:
from pypath.utils import homology,mapping
I have an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [2], in <module>
1 import omnipath
----> 2 from pypath.utils import homology,mapping
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/utils/homology.py:41, in <module>
37 import pickle
39 import timeloop
---> 41 import pypath.utils.mapping as mapping
42 import pypath.share.common as common
43 import pypath.internals.intera as intera
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/utils/mapping.py:73, in <module>
71 import pypath.inputs.uniprot as uniprot_input
72 import pypath.inputs.pro as pro_input
---> 73 import pypath.inputs.biomart as biomart_input
74 import pypath.inputs.unichem as unichem_input
75 import pypath.internals.input_formats as input_formats
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/inputs/biomart.py:36, in <module>
34 import pypath.share.curl as curl
35 import pypath.resources.urls as urls
---> 36 import pypath.utils.taxonomy as taxonomy
38 _logger = session_mod.Logger(name = 'biomart_input')
41 # for mouse homologues: Filter name = "with_mmusculus_homolog"
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/utils/taxonomy.py:88, in <module>
49 # XXX: Shouldn't we keep all functions and variables separated
50 # (together among them)?
51 taxids = {
52 9606: 'human',
53 10090: 'mouse',
(...)
80 9544: 'rhesus macaque',
81 }
83 taxids2 = dict(
84 (
85 t.taxon_id,
86 t.common_name.lower()
87 )
---> 88 for t in ensembl_input.ensembl_organisms()
89 )
91 taxa = common.swap_dict_simple(taxids)
92 taxa2 = common.swap_dict_simple(taxids2)
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/inputs/ensembl.py:52, in ensembl_organisms()
49 c = curl.Curl(url)
50 soup = bs4.BeautifulSoup(c.result, 'html.parser')
---> 52 for r in soup.find('table').find_all('tr'):
54 if not record:
56 record = collections.namedtuple(
57 'EnsemblOrganism',
58 [c.text.lower().replace(' ', '_') for c in r] +
59 ['ensembl_name']
60 )
AttributeError: 'NoneType' object has no attribute 'find_all'
I tried clearing the cache in ~/.pypath and rerunning, but it had no effect.
Have you seen this error before?
Thanks
It seems the problem comes from the Ensembl website. The server at https://www.ensembl.org/info/about/species.html is down, so pypath finds no species table to parse (hence the NoneType error) and the package cannot be loaded.
Yes, the Ensembl server is having issues today. It's up again now, but still slow. Unfortunately, the homology translation won't work without Ensembl, but this kind of error doesn't happen often. Ensembl has 4 mirrors; maybe I will later add an option for choosing a mirror.
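Just to illustrate the mirror idea, here is a rough sketch of a fallback loop. This is not how pypath works today; the function first_working and the mirror names are hypothetical, and the fetchers are stand-ins for real downloads.

```python
def first_working(fetchers):
    """Try each (name, fetcher) pair in order and return the first result.

    A fetcher counts as failed if it raises or returns None.
    """
    errors = []
    for name, fetch in fetchers:
        try:
            result = fetch()
        except Exception as exc:  # network errors, parse errors, ...
            errors.append(f'{name}: {exc!r}')
            continue
        if result is not None:
            return name, result
        errors.append(f'{name}: returned nothing')
    raise RuntimeError('all mirrors failed: ' + '; '.join(errors))


# Stand-ins for real downloads; the mirror names are placeholders.
def main_site():
    raise ConnectionError('server down')


def mirror_a():
    return None  # e.g. the page loaded but contained no species table


def mirror_b():
    return ['human', 'mouse', 'rat']


used, organisms = first_working([
    ('main', main_site),
    ('mirror_a', mirror_a),
    ('mirror_b', mirror_b),
])
print(used, organisms)
```

Treating an empty result as a failure matters here, because the traceback above came from a page that was fetched but had no table in it.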
Hi, is there also an R-only solution to create all necessary databases and networks for the rat genome? pypath has an issue it seems, and cannot be installed via python, and I generally like to stay within R ;). Thanks!
Hi @chrarnold ,
In R it could look like this:
library(OmnipathR)
library(dplyr)
progeny <- import_omnipath_annotations(resources = 'PROGENy', wide = TRUE)
human_rat <- homologene_uniprot_orthology(target = 10116L, by = genesymbol)
progeny_rat <-
progeny %>%
inner_join(human_rat, by = c('uniprot' = 'source')) %>%
mutate(uniprot = target) %>%
select(-target, -genesymbol) %>%
translate_ids(uniprot, genesymbol, organism = 10116L) %>%
relocate(genesymbol, .after = uniprot)
progeny_rat
# A tibble: 86,038 × 6
uniprot genesymbol entity_type pathway weight p_value
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Q641W4 Rfc2 protein Hypoxia -2.05 7.04e- 4
2 Q641W4 Rfc2 protein TGFb -0.431 6.30e- 1
3 Q641W4 Rfc2 protein NFkB -0.410 3.72e- 1
4 Q641W4 Rfc2 protein p53 -3.35 9.86e- 4
5 Q641W4 Rfc2 protein TNFa -0.125 8.33e- 1
6 Q641W4 Rfc2 protein EGFR 1.47 1.66e- 3
7 Q641W4 Rfc2 protein Trail -0.801 6.24e- 1
8 Q641W4 Rfc2 protein JAK-STAT 0.00122 9.98e- 1
9 Q641W4 Rfc2 protein MAPK 2.28 3.32e-11
10 Q641W4 Rfc2 protein VEGF -0.157 8.48e- 1
# … with 86,028 more rows
If you already have OmnipathR installed, please update it to the most recent version (3.4.3 or 3.5.6): due to the recent UniProt URL and API update, the above example won't work with earlier versions.
Best,
Denes
Thanks a lot Denes for this, I think it is helpful for the whole community! The newest version in the Bioconductor release is currently only 3.4.0, while you mentioned 3.4.3, so I think installation from GitHub via devtools::install_github('saezlab/OmnipathR') is necessary. It worked like a charm.
Is there a way to add the mouse MSigDB database as a resource to pull with the dc.get_resource function? It exists on the GSEA site but is not built into the wrapper. If this is not possible, how can I load it myself? I am trying to run an analysis on the functional enrichment of biological terms. Thanks!
Most of the mouse database knowledge is orthology translated from human, and I believe MSigDB is no different. They write here:
an orthology converted version of these sets is being provided here to allow analysis in the mouse gene-space alongside other, mouse-native, sets
However, they don't say which ones are the mouse-native sets. I think M1 and M8 definitely are, but the rest are more likely to be orthology translated, either by MSigDB or by its primary resources.
There are two options here:
1) Load the human MSigDB and translate it to mouse by orthology, as shown in my first comment.
2) Using our database builder module pypath, process the MSigDB mouse data and write a little custom code to extract the desired data frame from the dictionaries provided by pypath. Something like this:
from pypath.inputs import msigdb
import pandas as pd
msigdb_mouse = msigdb.msigdb_annotations(organism = 'mouse')
msigdb_mouse_df = pd.DataFrame(
[(k,) + a for k, v in msigdb_mouse.items() for a in v],
columns = ['uniprot', 'collection', 'geneset']
)
msigdb_mouse_df
uniprot collection geneset
0 Q9WVC6 mirna_targets_mirdb MIR_322_5P
1 Q9WVC6 mirna_targets_mirdb MIR_497A_5P
2 Q9WVC6 chemical_and_genetic_perturbations CADWELL_ATG16L1_TARGETS_DN
3 Q9WVC6 chemical_and_genetic_perturbations LEIN_OLIGODENDROCYTE_MARKERS
4 Q9WVC6 chemical_and_genetic_perturbations GRAESSMANN_APOPTOSIS_BY_SERUM_DEPRIVATION_UP
... ... ... ...
568859 Q80T03 reactome_pathways REACTOME_O_LINKED_GLYCOSYLATION
568860 Q80T03 reactome_pathways REACTOME_TERMINATION_OF_O_GLYCAN_BIOSYNTHESIS
568861 Q80T03 reactome_pathways REACTOME_O_LINKED_GLYCOSYLATION_OF_MUCINS
568862 Q80T03 reactome_pathways REACTOME_POST_TRANSLATIONAL_PROTEIN_MODIFICATION
568863 Q80T03 reactome_pathways REACTOME_METABOLISM_OF_PROTEINS
[568864 rows x 3 columns]
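The comprehension above relies on each dictionary value being a collection of tuple-like records that can be concatenated to the key. Here is a toy sketch of that flattening, assuming records with two fields named collection and geneset. The record type MsigdbRecord and the small dictionary are illustrative only (values copied from the output above), not what pypath actually returns.

```python
import collections

# Hypothetical record type standing in for pypath's annotation records.
MsigdbRecord = collections.namedtuple(
    'MsigdbRecord',
    ['collection', 'geneset'],
)

toy = {
    'Q9WVC6': [
        MsigdbRecord('mirna_targets_mirdb', 'MIR_322_5P'),
        MsigdbRecord('mirna_targets_mirdb', 'MIR_497A_5P'),
    ],
    'Q80T03': [
        MsigdbRecord('reactome_pathways', 'REACTOME_O_LINKED_GLYCOSYLATION'),
    ],
}

# (uniprot,) + rec concatenates two tuples, giving one flat row
# per (protein, record) pair.
flat = [
    (uniprot,) + rec
    for uniprot, recs in toy.items()
    for rec in recs
]

for row in flat:
    print(row)
```

A list of flat tuples like this is what pd.DataFrame receives together with the column names.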
If you prefer gene symbols instead of UniProt IDs, use the msigdb_download_collections function:
from pypath.inputs import msigdb
import pandas as pd
msigdb_mouse_raw = msigdb.msigdb_download_collections(organism = 'mouse')
msigdb_mouse_raw_df = pd.DataFrame(
[
(collname, collcode, gset, gene)
for (collname, collcode), coll in msigdb_mouse_raw.items()
for gset, genes in coll.items()
for gene in genes
],
columns = ['collection', 'code', 'geneset', 'genesymbol']
)
msigdb_mouse_raw_df
collection code geneset genesymbol
0 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Dusp1
1 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Tnfaip3
2 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Sqstm1
3 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Rcan1
4 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Egr2
... ... ... ... ...
667940 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... S100a1
667941 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Jund
667942 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Msn
667943 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Tle5
667944 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Dcxr
[667945 rows x 4 columns]
Note: by default the c5 or m5 gene set collections (mostly Gene Ontology) are disabled, see the exclude argument. MSigDB recently changed a few things on their web page, and until now the pypath.inputs.msigdb module didn't explicitly support mouse, so I had to update the code in pypath. For this reason, the example above works only with the current head of the master branch (v0.14.17):
pip3 install 'git+https://github.com/saezlab/pypath.git'
When downloading pypath with the code above, I received an error:
ERROR: Package 'pypath-omnipath' requires a different Python: 3.8.13 not in '<4.0,>=3.9' Note: you may need to restart the kernel to use updated packages.
Is there a way to use this package without having to downgrade my Python?
What's your Python version? You would need an upgrade, not a downgrade: 3.9 is currently the minimum version required by pypath.
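If upgrading Python is not an option, gene sets downloaded by hand from the MSigDB site can be read without pypath. Here is a minimal sketch, assuming the download is in GMT format (one gene set per line, tab separated: set name, description, then the genes). The function name read_gmt and the toy sets are made up for illustration, and the identifiers stay whatever the file contains.

```python
import io


def read_gmt(handle):
    """Yield (geneset, gene) pairs from an open GMT file.

    Each GMT line is tab separated: set name, description, then the genes.
    """
    for line in handle:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        geneset, _description, *genes = fields
        for gene in genes:
            if gene:  # ignore empty fields from trailing tabs
                yield geneset, gene


# In-memory toy example; with a real download, pass an opened file instead.
example = io.StringIO(
    'TOY_SET_A\thttp://example.org/a\tDusp1\tTnfaip3\n'
    'TOY_SET_B\thttp://example.org/b\tSqstm1\n'
)
pairs = list(read_gmt(example))
for geneset, gene in pairs:
    print(geneset, gene)
```

The resulting pairs can go into pd.DataFrame(pairs, columns = ['geneset', 'genesymbol']) to get a long-format table like the ones above.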
Closing this issue, since this is now implemented as the function translate_net in decoupler-1.3.0.
Here is a vignette showcasing how to do it: https://decoupler-py.readthedocs.io/en/latest/notebooks/translate.html
Hello,
I am trying to analyse a dataset of mouse cells and would like to perform over-enrichment analyses and trajectory inference with decoupler. However, I can't download resources for mouse (only PROGENy), and the functions complain that the gene identifiers are not the same. Is there a way to use these functions with mouse data?
Thanks a lot