Hi @qijt123,
The intercell annotations in OmniPath aim to cover the broadest available information, and as a consequence they contain a number of false positives. You can indeed filter the intercell network by localization (e.g. ligands must be secreted, receptors must be located in the plasma membrane), and also by consensus across resources, as shown here. This function provides the greatest flexibility, though some arguments of this function also provide basic filtering.
Alternatively, you can use interactions that have been curated in a cell-cell communication context.
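To illustrate the localization and consensus filtering described above, here is a minimal sketch in base R. It runs on made-up toy tables, not on real OmniPath data: the gene symbols, the scores and the `min_score` threshold are all hypothetical, and only the column names follow the intercell table printed later in this thread.

```r
# Toy stand-in for intercell annotations (values are made up).
annot <- data.frame(
    genesymbol = c('LIG1', 'LIG2', 'REC1', 'REC2'),
    parent = c('ligand', 'ligand', 'receptor', 'receptor'),
    secreted = c(TRUE, FALSE, FALSE, FALSE),
    plasma_membrane_transmembrane = c(FALSE, FALSE, TRUE, FALSE),
    consensus_score = c(5, 4, 6, 1),
    stringsAsFactors = FALSE
)

# Toy stand-in for a ligand-receptor network (values are made up).
net <- data.frame(
    source_genesymbol = c('LIG1', 'LIG2', 'LIG1'),
    target_genesymbol = c('REC1', 'REC1', 'REC2'),
    stringsAsFactors = FALSE
)

# Hypothetical threshold; the right value depends on your data.
min_score <- 3

# Ligands must be secreted and reach the consensus threshold.
ligands <- annot$genesymbol[
    annot$parent == 'ligand' &
    annot$secreted &
    annot$consensus_score >= min_score
]

# Receptors must be plasma membrane transmembrane proteins.
receptors <- annot$genesymbol[
    annot$parent == 'receptor' &
    annot$plasma_membrane_transmembrane &
    annot$consensus_score >= min_score
]

# Keep only interactions between an accepted ligand and an accepted receptor.
filtered <- net[
    net$source_genesymbol %in% ligands &
    net$target_genesymbol %in% receptors,
]
```

On real data the same logic applies to the annotation columns attached to the source and target of each interaction. Stricter thresholds give a smaller network with fewer false positives.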
About the pathways, see the answer here. Pathways are available in the OmniPath Annotations database. Please note that the concept of a pathway differs greatly between resources: a pathway in SignaLink has a completely different meaning than a pathway in KEGG or SIGNOR. Pathways are ultimately functional annotations, i.e. they only tell that some genes or proteins are involved in a common biological function. This means you can consider other functional annotations too, e.g. MSigDB, HGNC, UniProt. You can explore these and more resources in the OmniPath Annotations database:
library(OmnipathR)
get_annotation_resources()
[1] "Adhesome" "Almen2009" "Baccin2019" "CancerDrugsDB" "CancerGeneCensus" "CancerSEA" "CellCall" "CellCellInteractions" "CellChatDB" "CellChatDB_complex" "Cellinker" "Cellinker_complex"
[13] "CellPhoneDB" "CellPhoneDB_complex" "CellTalkDB" "CellTypist" "ComPPI" "connectomeDB2020" "CORUM_Funcat" "CORUM_GO" "CSPA" "CSPA_celltype" "CytoSig" "DGIdb"
[25] "DisGeNet" "EMBRACE" "Exocarta" "GO_Intercell" "GPCRdb" "Guide2Pharma" "HGNC" "HPA_secretome" "HPA_subcellular" "HPA_tissue" "HPMR" "HumanCellMap"
[37] "ICELLNET" "ICELLNET_complex" "Integrins" "InterPro" "IntOGen" "iTALK" "KEGG-PC" "kinase.com" "Kirouac2010" "Lambert2018" "LOCATE" "LRdb"
[49] "Matrisome" "MatrixDB" "MCAM" "Membranome" "MSigDB" "NetPath" "OPM" "PanglaoDB" "Phobius" "Phosphatome" "PROGENy" "Ramilowski_location"
[61] "Ramilowski2015" "scConnect" "scConnect_complex" "SignaLink_function" "SignaLink_pathway" "SIGNOR" "Surfaceome" "talklr" "TCDB" "TFcensus" "TopDB" "UniProt_family"
[73] "UniProt_keyword" "UniProt_location" "UniProt_tissue" "UniProt_topology" "Vesiclepedia" "Wang" "Zhong2015"
Then you can access the resources that interest you; using wide = TRUE results in a more convenient format:
library(OmnipathR)
kpc <- import_omnipath_annotations(resources = 'KEGG-PC', wide = TRUE)
kpc
# A tibble: 2,904 × 4
uniprot genesymbol entity_type pathway
<chr> <chr> <chr> <chr>
1 A8K7J7 A8K7J7 protein Galactose metabolism
2 A8K7J7 A8K7J7 protein Fructose and mannose metabolism
3 A8K7J7 A8K7J7 protein Starch and sucrose metabolism
4 A8K7J7 A8K7J7 protein Amino sugar and nucleotide sugar metabolism
5 A8K7J7 A8K7J7 protein Metabolic pathways
6 A8K7J7 A8K7J7 protein Butirosin and neomycin biosynthesis
7 A8K7J7 A8K7J7 protein Glycolysis / Gluconeogenesis
8 B4DDQ8 B4DDQ8 protein Glycolysis / Gluconeogenesis
9 B4DDQ8 B4DDQ8 protein Pentose phosphate pathway
10 B4DDQ8 B4DDQ8 protein Starch and sucrose metabolism
# ℹ 2,894 more rows
# ℹ Use `print(n = ...)` to see more rows
The pathway annotations can be added to the network data frame using this function. It is enough to provide the name of the annotation resource, or the annotation data frame. Some interaction annotations might be useful too; you can check these out by following the vignette. Another question is which network datasets to use: see the description of the datasets here. I would recommend using omnipath, ligrecextra, and if you need even more interactions, maybe also pathwayextra. The optimal size of the network depends on your downstream methods.
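Conceptually, adding pathway annotations to a network is a join on the identifiers of the source and the target of each interaction. Below is a minimal base R sketch of that idea on made-up toy tables. The identifiers `P1` to `P3` are hypothetical, the pathway names are borrowed from the KEGG-PC output above, and this is not the implementation used in OmnipathR.

```r
# Toy network: two interactions between made-up proteins.
net <- data.frame(
    source = c('P1', 'P2'),
    target = c('P2', 'P3'),
    stringsAsFactors = FALSE
)

# Toy pathway annotations in a long format (one row per protein-pathway pair).
pw <- data.frame(
    uniprot = c('P1', 'P2', 'P2'),
    pathway = c(
        'Glycolysis / Gluconeogenesis',
        'Glycolysis / Gluconeogenesis',
        'Pentose phosphate pathway'
    ),
    stringsAsFactors = FALSE
)

# Left join the pathways of the source proteins.
annotated <- merge(net, pw, by.x = 'source', by.y = 'uniprot', all.x = TRUE)
names(annotated)[names(annotated) == 'pathway'] <- 'pathway_source'

# Left join the pathways of the target proteins.
annotated <- merge(annotated, pw, by.x = 'target', by.y = 'uniprot', all.x = TRUE)
names(annotated)[names(annotated) == 'pathway'] <- 'pathway_target'

# Interactions where both partners share a pathway; which() drops the NA rows.
shared <- annotated[
    which(annotated$pathway_source == annotated$pathway_target),
]
```

Note that a protein with several pathways multiplies the rows of its interactions, and partners without annotation get NA. Whether to require a shared pathway on both sides is a design choice that depends on the analysis.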
The ligand/receptor annotations are also available at a finer granularity, specific subclasses from specific resources:
library(OmnipathR)
library(dplyr)
ic_spec <- import_omnipath_intercell(
    aspect = 'functional',
    scope = 'specific',
    source = 'resource_specific'
)
ic_spec %>% filter(database == 'HGNC')
# A tibble: 3,609 × 15
category parent database scope aspect source uniprot genesymbol entity_type consensus_score transmitter receiver secreted plasma_membrane_transmembrane plasma_membrane_peripheral
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 angiopoietin ligand HGNC specific functional resource_specific Q9UKU9 ANGPTL2 protein 0 TRUE FALSE TRUE FALSE FALSE
2 angiopoietin ligand HGNC specific functional resource_specific COMPLEX:Q9Y5C1 COMPLEX:ANGPTL3 complex 0 TRUE FALSE TRUE FALSE FALSE
3 angiopoietin ligand HGNC specific functional resource_specific Q86XS5 ANGPTL5 protein 0 TRUE FALSE TRUE FALSE FALSE
4 angiopoietin ligand HGNC specific functional resource_specific Q6UXH0 ANGPTL8 protein 0 TRUE FALSE TRUE FALSE FALSE
5 angiopoietin ligand HGNC specific functional resource_specific COMPLEX:Q9UKU9 COMPLEX:ANGPTL2 complex 0 TRUE FALSE TRUE FALSE FALSE
6 angiopoietin ligand HGNC specific functional resource_specific Q8NI99 ANGPTL6 protein 0 TRUE FALSE TRUE FALSE FALSE
7 angiopoietin ligand HGNC specific functional resource_specific Q9BY76 ANGPTL4 protein 0 TRUE FALSE TRUE FALSE FALSE
8 angiopoietin ligand HGNC specific functional resource_specific Q9Y5C1 ANGPTL3 protein 0 TRUE FALSE TRUE FALSE FALSE
9 angiopoietin ligand HGNC specific functional resource_specific O43827 ANGPTL7 protein 0 TRUE FALSE TRUE FALSE FALSE
10 angiopoietin ligand HGNC specific functional resource_specific O95841 ANGPTL1 protein 0 TRUE FALSE TRUE FALSE FALSE
# ℹ 3,599 more rows
# ℹ Use `print(n = ...)` to see more rows
For example, HGNC contains many specific subclasses, e.g. above "angiopoietin" is a class of ligands. These are not exactly pathways, but families sharing a common structure, origin and function.
I hope this helps, please let me know if you have further questions. I see you opened the same issue at the Python client; I'm answering it here and closing the other one.
Best,
Denes
Hi, @deeenes
Thank you very much for your timely reply. Your answer has solved most of my questions, but I still have some basic ones.
What is the meaning of 'category' and 'parent', and what is the difference between the two? I also want to know what the difference is between 'n_references' and 'n_resources'.
Best
qijt
Specific categories have generic categories as parents, while each generic category is the parent of itself. All these categories are defined here. The definitions of the terminology are in the EV10 table of our latest paper. The arguments of this function correspond to the attributes included in the table above. As an example, ligand is a generic category (its scope is generic, its aspect is functional because acting as a ligand is a molecular or biological function). Its source can be resource_specific, for example "all ligands from UniProt", or composite, if it's the combination of ligands from multiple resources. Categories with specific scope might have ligand as their parent, these are specific subclasses of ligands, e.g. interleukin; these specific categories are almost always resource_specific regarding their source. See also the Intercellular signaling roles section under the Methods. The same is true for all other categories, such as receptors, transporters, etc.
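The relation between category and parent can be shown on a small made-up table. This is only a sketch in base R: the rows are hypothetical and merely mimic the category, parent and scope columns of the intercell data frame shown above.

```r
# Toy intercell records (made up); real data has many more columns.
ic <- data.frame(
    category = c('ligand', 'interleukin', 'angiopoietin', 'receptor'),
    parent = c('ligand', 'ligand', 'ligand', 'receptor'),
    scope = c('generic', 'specific', 'specific', 'generic'),
    stringsAsFactors = FALSE
)

# Generic categories are their own parents.
generic <- ic[ic$scope == 'generic', ]
generic_is_own_parent <- all(generic$category == generic$parent)

# Specific subclasses of ligands: specific scope with ligand as parent.
ligand_subclasses <- unique(
    ic$category[ic$parent == 'ligand' & ic$scope == 'specific']
)
```

Selecting on parent gives every member of a generic class regardless of subclass, while selecting on category picks one particular subclass.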
n_references and n_resources are simply the counts of unique literature references and resources for each interaction record. These might be indicators of the likelihood that the interaction is correct (but not of the actual strength of the interaction). If your methods require a small network, setting a threshold on these variables might be a way to create a smaller network of higher confidence. These fields are created automatically in OmnipathR after downloading the data, by simply counting the unique values in the sources and references columns.
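The counting and thresholding described above can be sketched in base R as follows. The toy table, the helper `count_unique` and the threshold of 2 are hypothetical, and the sketch assumes that the sources and references columns hold semicolon separated strings. It is not the actual OmnipathR code.

```r
# Hypothetical helper: number of unique, non-empty items in each
# semicolon separated string.
count_unique <- function(x, sep = ';') {
    vapply(
        strsplit(x, sep, fixed = TRUE),
        function(v) {
            v <- v[!is.na(v) & nzchar(v)]
            length(unique(v))
        },
        integer(1)
    )
}

# Toy interaction records (made up).
net <- data.frame(
    source = c('LIG1', 'LIG2', 'LIG3'),
    target = c('REC1', 'REC1', 'REC2'),
    sources = c('A;B;C', 'A', 'B;B'),
    references = c('111;222', '', '333;333;444'),
    stringsAsFactors = FALSE
)

net$n_resources <- count_unique(net$sources)
net$n_references <- count_unique(net$references)

# Hypothetical threshold: keep interactions with at least 2 references.
high_conf <- net[net$n_references >= 2, ]
```

A higher threshold yields a smaller network; whether to threshold on references, resources or both depends on the downstream method.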
Hi, @deeenes ,
Thank you very much for your patient answer. Your answer has solved my problem. Thank you very much.
Best
qjt
Hi,
Thank you very much for your Omnipath database. I want to study intercellular communication, so I would like to get the interactions of all receptors and ligands in the Omnipath database. How should I do that?
In addition, I also want to know the pathway information of receptor-ligand pairs; can this be obtained from Omnipath?
Thanks.