saezlab / OmnipathR

R client for the OmniPath web service
https://r.omnipathdb.org/

ligands and receptors #22

Closed slowkow closed 4 years ago

slowkow commented 4 years ago

Could I please ask if OmnipathR provides access to a list of gene pairs?

That is, I would like to get a dataframe where each row corresponds to a pair of genes (or proteins), e.g., CXCL10 and CXCR3.

So far, here's what I could figure out from the vignette:

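library(OmnipathR)
library(dplyr)  # provides %>% and filter()

# Download the intercellular communication role annotations
intercell <- import_Omnipath_intercell()

my_genes <- c("CXCL11", "CXCL10", "CXCL1", "CXCL9", "CXCL13", "CXCR3", "CXCR5")

# Genes of interest that have no intercell annotation at all
setdiff(my_genes, intercell$genesymbol)

# Annotation rows for the genes of interest
intercell %>%
  dplyr::filter(genesymbol %in% my_genes)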
        category                  parent         database    scope     aspect
1  transmembrane           transmembrane UniProt_location  generic locational
2  transmembrane           transmembrane UniProt_location  generic locational
3  transmembrane           transmembrane UniProt_topology  generic locational
4  transmembrane           transmembrane UniProt_topology  generic locational
5  transmembrane           transmembrane  UniProt_keyword  generic locational
6  transmembrane           transmembrane  UniProt_keyword  generic locational
7  transmembrane transmembrane_predicted          Phobius  generic locational
8  transmembrane transmembrane_predicted          Phobius  generic locational
9  transmembrane           transmembrane     GO_Intercell  generic locational
10 transmembrane           transmembrane     GO_Intercell  generic locational
11 transmembrane           transmembrane      CellPhoneDB specific locational
12 transmembrane           transmembrane      CellPhoneDB specific locational
13 transmembrane           transmembrane           LOCATE  generic locational
              source uniprot genesymbol entity_type consensus_score transmitter
1  resource_specific  P49682      CXCR3     protein               5       False
2  resource_specific  P32302      CXCR5     protein               6       False
3  resource_specific  P49682      CXCR3     protein               5       False
4  resource_specific  P32302      CXCR5     protein               6       False
5  resource_specific  P49682      CXCR3     protein               5       False
6  resource_specific  P32302      CXCR5     protein               6       False
7  resource_specific  P49682      CXCR3     protein               5       False
8  resource_specific  P32302      CXCR5     protein               6       False
9  resource_specific  P49682      CXCR3     protein               5       False
10 resource_specific  P32302      CXCR5     protein               6       False
11 resource_specific  P49682      CXCR3     protein               5       False
12 resource_specific  P32302      CXCR5     protein               6       False
13 resource_specific  P32302      CXCR5     protein               6       False
   receiver secreted plasma_membrane_transmembrane plasma_membrane_peripheral
1     False    False                          True                      False
2     False    False                          True                      False
3     False    False                          True                      False
4     False    False                          True                      False
5     False    False                          True                      False
6     False    False                          True                      False
7     False    False                          True                      False
8     False    False                          True                      False
9     False    False                          True                      False
10    False    False                          True                      False
11    False    False                          True                      False
12    False    False                          True                      False
13    False    False                          True                      False
 [ reached 'max' / getOption("max.print") -- omitted 243 rows ]

Notice that CXCR3 is listed by itself, but there is no hint that CXCL10 is a ligand for this receptor.
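The intercell table holds one row per protein, category and annotating resource, which is why CXCR3 and CXCR5 repeat above and no partner column appears. A minimal base R sketch, on a hypothetical excerpt retyped from that output (`intercell_toy` and `per_gene` are illustrative names, not OmnipathR objects), collapses the repeats into one row per gene and category:

```r
# Hypothetical excerpt shaped like the intercell annotation output above:
# one row per protein, category and annotating resource.
intercell_toy <- data.frame(
  genesymbol = c("CXCR3", "CXCR3", "CXCR5", "CXCR5", "CXCR5"),
  category   = rep("transmembrane", 5),
  database   = c("UniProt_location", "CellPhoneDB", "UniProt_location",
                 "CellPhoneDB", "LOCATE"),
  stringsAsFactors = FALSE
)

# Collapse to one row per protein and category, listing the resources
# that support each annotation, separated by semicolons.
per_gene <- aggregate(
  database ~ genesymbol + category,
  data = intercell_toy,
  FUN = function(x) paste(sort(unique(x)), collapse = ";")
)
per_gene
```

Even collapsed, each row still describes a single protein; pairing proteins requires an interaction table as well.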

Is there some way to use OmnipathR to get the known pairs of interacting proteins?

I'll keep looking in the documentation, but I would greatly appreciate any hints or tips!

Thank you.

slowkow commented 4 years ago

I think I'm starting to get it... there are a lot of functions to explore.

Here's what I have now:

d <- import_LigrecExtra_Interactions(select_organism = 9606)

d[1:5,1:5]
  source target source_genesymbol target_genesymbol is_directed
1 P46531 Q9Y219            NOTCH1              JAG2           1
2 Q9Y219 P46531              JAG2            NOTCH1           1
3 O00548 P46531              DLL1            NOTCH1           1
4 P46531 O00548            NOTCH1              DLL1           1
5 P05019 P08069              IGF1             IGF1R           1

Looks great!
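The ligrecextra rows list some pairs in both directions (NOTCH1 to JAG2 and JAG2 to NOTCH1). If an undirected list of gene pairs is all that is needed, one option, sketched here in base R on the five rows shown above (the data frame is retyped by hand, not fetched), is to sort the two partners within each row and drop duplicates:

```r
# The first five rows of the ligrecextra output above, retyped by hand.
d_toy <- data.frame(
  source_genesymbol = c("NOTCH1", "JAG2", "DLL1", "NOTCH1", "IGF1"),
  target_genesymbol = c("JAG2", "NOTCH1", "NOTCH1", "DLL1", "IGF1R"),
  stringsAsFactors = FALSE
)

# Order the two partners alphabetically so that A->B and B->A share a key,
# then keep each unordered pair once.
partner_a <- pmin(d_toy$source_genesymbol, d_toy$target_genesymbol)
partner_b <- pmax(d_toy$source_genesymbol, d_toy$target_genesymbol)
pairs <- unique(data.frame(partner_a, partner_b, stringsAsFactors = FALSE))
pairs
```

This discards direction, and with it the information on which partner acts as ligand and which as receptor; keep `source_genesymbol` and `target_genesymbol` as they are if that matters.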

deeenes commented 4 years ago

Hi @slowkow,

ligrecextra is a dataset within the interactions database of OmniPath. It contains interactions from resources dedicated to ligand-receptor relationships that provide no literature references. Ligand-receptor and other cell-cell interactions can also be part of other datasets, so I wouldn't recommend using only this one.

The import_omnipath_intercell function retrieves the intercellular communication role annotations (the intercell database of OmniPath).

OmnipathR has a function to combine these annotations with the interactions to build a network of intercellular communication:

icn <- import_intercell_network()

This function is very flexible, so I recommend reading its documentation. It passes its parameters on to import_omnipath_interactions and import_omnipath_intercell.
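Conceptually, combining the two databases means attaching a transmitter-side annotation to each interaction's source and a receiver-side annotation to its target. The base R sketch below illustrates that idea on hypothetical toy tables; it is not OmnipathR's implementation. The suffixes imitate the `_intercell_source` and `_intercell_target` column names of the network OmnipathR returns.

```r
# Hypothetical toy tables; the real ones come from the OmniPath web service.
interactions <- data.frame(
  source_genesymbol = c("CXCL10", "CXCL13", "NOTCH1"),
  target_genesymbol = c("CXCR3", "CXCR5", "JAG2"),
  stringsAsFactors = FALSE
)
annotations <- data.frame(
  genesymbol = c("CXCL10", "CXCL13", "CXCR3", "CXCR5"),
  category   = c("ligand", "ligand", "receptor", "receptor"),
  stringsAsFactors = FALSE
)

transmitters <- annotations[annotations$category == "ligand", ]
receivers    <- annotations[annotations$category == "receptor", ]

# Attach the source-side annotation, then the target-side annotation.
# Inner joins drop interactions whose partners lack the required role.
net <- merge(interactions, transmitters,
             by.x = "source_genesymbol", by.y = "genesymbol")
net <- merge(net, receivers,
             by.x = "target_genesymbol", by.y = "genesymbol",
             suffixes = c("_intercell_source", "_intercell_target"))
net
```

NOTCH1 to JAG2 disappears because neither partner carries a ligand or receptor annotation in the toy table.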

Best,

Denes

slowkow commented 4 years ago

Thanks for the tip, @deeenes

Here's what I get:

> icn <- OmnipathR::import_Omnipath_intercell()
Downloaded 267508 intercell records
> icn
        category        parent         database   scope     aspect
1  transmembrane transmembrane UniProt_location generic locational
2  transmembrane transmembrane UniProt_location generic locational
3  transmembrane transmembrane UniProt_location generic locational
4  transmembrane transmembrane UniProt_location generic locational
5  transmembrane transmembrane UniProt_location generic locational
6  transmembrane transmembrane UniProt_location generic locational
7  transmembrane transmembrane UniProt_location generic locational
8  transmembrane transmembrane UniProt_location generic locational
9  transmembrane transmembrane UniProt_location generic locational
10 transmembrane transmembrane UniProt_location generic locational
11 transmembrane transmembrane UniProt_location generic locational
12 transmembrane transmembrane UniProt_location generic locational
13 transmembrane transmembrane UniProt_location generic locational
              source uniprot genesymbol entity_type consensus_score transmitter
1  resource_specific  Q8TDQ1    CD300LF     protein               7       False
2  resource_specific  Q02223   TNFRSF17     protein               8       False
3  resource_specific  Q7Z3J2   C16orf62     protein               4       False
4  resource_specific  Q14C87   TMEM132D     protein               6       False
5  resource_specific  Q8N5U1     MS4A15     protein               5       False
6  resource_specific  Q9Y6I8      PXMP4     protein               6       False
7  resource_specific  P12821        ACE     protein               8       False
8  resource_specific  Q96RI9      TAAR9     protein               5       False
9  resource_specific  P16109       SELP     protein               8       False
10 resource_specific  Q04656      ATP7A     protein               7       False
11 resource_specific  B6A8C7      TARM1     protein               5       False
12 resource_specific  P00846    MT-ATP6     protein               5       False
13 resource_specific  O00258        WRB     protein               6       False
   receiver secreted plasma_membrane_transmembrane plasma_membrane_peripheral
1     False    False                          True                      False
2     False    False                          True                      False
3     False    False                         False                      False
4     False    False                         False                      False
5     False    False                         False                      False
6     False    False                         False                      False
7     False     True                          True                      False
8     False    False                          True                      False
9     False    False                          True                      False
10    False    False                         False                      False
11    False    False                          True                      False
12    False    False                         False                      False
13    False    False                         False                      False
 [ reached 'max' / getOption("max.print") -- omitted 267495 rows ]

This looks interesting, but I don't see how we can convert this to gene pairs. Am I missing something?

slowkow commented 4 years ago

Sorry, it seems that this table has gene pairs, but I didn't find them until I started poking around.

> icn %>% dplyr::filter(stringr::str_detect(uniprot, "_"), consensus_score > 10)
   category parent    database   scope     aspect            source
1    ligand ligand   Matrisome generic functional resource_specific
2    ligand ligand   Matrisome generic functional resource_specific
3    ligand ligand   Matrisome generic functional resource_specific
4    ligand ligand       iTALK generic functional resource_specific
5    ligand ligand       iTALK generic functional resource_specific
6    ligand ligand       iTALK generic functional resource_specific
7    ligand ligand     EMBRACE generic functional resource_specific
8    ligand ligand     EMBRACE generic functional resource_specific
9    ligand ligand     EMBRACE generic functional resource_specific
10   ligand ligand        HGNC generic functional resource_specific
11   ligand ligand        HGNC generic functional resource_specific
12   ligand ligand        HGNC generic functional resource_specific
13   ligand ligand CellPhoneDB generic functional resource_specific
                 uniprot          genesymbol entity_type consensus_score
1  COMPLEX:P20783_P23560   COMPLEX:BDNF_NTF3     complex              12
2  COMPLEX:P08476_P09529 COMPLEX:INHBA_INHBB     complex              12
3  COMPLEX:P26441_Q9UBD9  COMPLEX:CLCF1_CNTF     complex              11
4  COMPLEX:P08476_P09529 COMPLEX:INHBA_INHBB     complex              12
5  COMPLEX:P20783_P23560   COMPLEX:BDNF_NTF3     complex              12
6  COMPLEX:P26441_Q9UBD9  COMPLEX:CLCF1_CNTF     complex              11
7  COMPLEX:P20783_P23560   COMPLEX:BDNF_NTF3     complex              12
8  COMPLEX:P08476_P09529 COMPLEX:INHBA_INHBB     complex              12
9  COMPLEX:P26441_Q9UBD9  COMPLEX:CLCF1_CNTF     complex              11
10 COMPLEX:P08476_P09529 COMPLEX:INHBA_INHBB     complex              12
11 COMPLEX:P20783_P23560   COMPLEX:BDNF_NTF3     complex              12
12 COMPLEX:P26441_Q9UBD9  COMPLEX:CLCF1_CNTF     complex              11
13 COMPLEX:P08476_P09529 COMPLEX:INHBA_INHBB     complex              12
   transmitter receiver secreted plasma_membrane_transmembrane
1         True    False     True                         False
2         True    False     True                         False
3         True    False     True                         False
4         True    False     True                         False
5         True    False     True                         False
6         True    False     True                         False
7         True    False     True                         False
8         True    False     True                         False
9         True    False     True                         False
10        True    False     True                         False
11        True    False     True                         False
12        True    False     True                         False
13        True    False     True                         False
   plasma_membrane_peripheral
1                       False
2                       False
3                       False
4                       False
5                       False
6                       False
7                       False
8                       False
9                       False
10                      False
11                      False
12                      False
13                      False
 [ reached 'max' / getOption("max.print") -- omitted 625 rows ]

This is very useful! I like that each pair is annotated with some information about the database!

Could I please ask if you might comment on the consensus_score column? I can't find how this value is defined.
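A note on the rows returned by the `str_detect(uniprot, "_")` filter: their `entity_type` is `complex`, so an identifier such as `COMPLEX:BDNF_NTF3` names a protein complex whose members are joined by underscores, rather than a ligand-receptor pair. A small base R sketch to list the members, assuming member gene symbols never contain an underscore themselves:

```r
# Complex identifiers copied from the filtered output above.
complex_ids <- c("COMPLEX:BDNF_NTF3", "COMPLEX:INHBA_INHBB",
                 "COMPLEX:CLCF1_CNTF")

# Drop the "COMPLEX:" prefix, then split on "_" into member gene symbols.
members <- strsplit(sub("^COMPLEX:", "", complex_ids), "_", fixed = TRUE)
names(members) <- complex_ids
members
```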

deeenes commented 4 years ago

Hi,

You should use import_intercell_network() instead of import_omnipath_intercell(). The former combines two intercell annotation tables with one network table, while the latter provides only a single intercell annotation table.

So first run it like this (optionally with custom parameters):

icn <- import_intercell_network()

This data frame has 44 columns because it is combined from 3 data frames; some of them may be redundant. Some of the important ones:

The consensus_score for intercell annotations is the number of resources supporting a given annotation. It is comparable only within a category, because the total number of resources differs between categories. For example, if 9 resources describe ligands and only 2 of them annotate a protein as a ligand, that is a low value; but if there are only 2 resources for protease inhibitors and both annotate a protein as such, that is a high value.
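The example above can be made concrete by dividing each score by the number of resources available for its category. A minimal sketch, assuming those per-category totals are known (here they are the hypothetical 9 and 2 from the example, and `GENE_A`/`GENE_B` are placeholder names):

```r
# Hypothetical annotations matching the example: each supported by 2 resources.
scores <- data.frame(
  genesymbol      = c("GENE_A", "GENE_B"),
  category        = c("ligand", "protease_inhibitor"),
  consensus_score = c(2, 2),
  stringsAsFactors = FALSE
)
# Assumed total number of resources per category.
resources_per_category <- c(ligand = 9, protease_inhibitor = 2)

# Share of the category's resources that support each annotation.
scores$support_fraction <- unname(
  scores$consensus_score / resources_per_category[scores$category]
)
scores
```

One way to estimate the totals, again an assumption rather than an OmnipathR feature, is to count the distinct `database` values per category in the annotation table.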

I hope this helps.

Best,

Denes

slowkow commented 4 years ago

Here are the functions provided by OmnipathR_1.2.1 in my R session:

> OmnipathR::
OmnipathR::get_annotation_databases          OmnipathR::import_Omnipath_PTMS
OmnipathR::get_complex_genes                 OmnipathR::import_Omnipath_annotations
OmnipathR::get_complexes_databases           OmnipathR::import_Omnipath_complexes
OmnipathR::get_interaction_databases         OmnipathR::import_Omnipath_intercell
OmnipathR::get_intercell_categories          OmnipathR::import_PathwayExtra_Interactions
OmnipathR::get_intercell_classes             OmnipathR::import_TFregulons_Interactions
OmnipathR::get_ptms_databases                OmnipathR::import_miRNAtarget_Interactions
OmnipathR::get_signed_ptms                   OmnipathR::interaction_graph
OmnipathR::import_AllInteractions            OmnipathR::printPath_es
OmnipathR::import_KinaseExtra_Interactions   OmnipathR::printPath_vs
OmnipathR::import_LigrecExtra_Interactions   OmnipathR::print_interactions
OmnipathR::import_Omnipath_Interactions      OmnipathR::ptms_graph

> OmnipathR::import_intercell_network
Error: 'import_intercell_network' is not an exported object from 'namespace:OmnipathR'
> OmnipathR:::import_intercell_network
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'import_intercell_network' not found

After installing the version from GitHub (OmnipathR_1.3.7), the function is now available.

The result looks excellent! Thank you so much.

> icn %>% filter(source_genesymbol == "CXCL13") %>% head(1) %>% t
                                               [,1]
category_intercell_source                      "ligand"
parent_intercell_source                        "ligand"
source                                         "O43927"
target                                         "O00574"
category_intercell_target                      "receptor"
parent_intercell_target                        "receptor"
target_genesymbol                              "CXCR6"
source_genesymbol                              "CXCL13"
is_directed                                    "1"
is_stimulation                                 "1"
is_inhibition                                  "0"
consensus_direction                            "1"
consensus_stimulation                          "1"
consensus_inhibition                           "0"
dip_url                                        ""
sources                                        "Wang"
references                                     ""
curation_effort                                "0"
n_references                                   "0"
n_resources                                    "1"
database_intercell_source                      "Matrisome;iTALK;HGNC;CellPhoneDB;GO_Intercell;HPMR;ICELLNET;Ramilowski2015;Kirouac2010;Guide2Pharma;LRdb;Baccin2019;OmniPath"
scope_intercell_source                         "generic"
aspect_intercell_source                        "functional"
category_source_intercell_source               "resource_specific"
genesymbol_intercell_source                    "CXCL13"
entity_type_intercell_source                   "protein"
consensus_score_intercell_source               "12"
transmitter_intercell_source                   "TRUE"
receiver_intercell_source                      "FALSE"
secreted_intercell_source                      "TRUE"
plasma_membrane_transmembrane_intercell_source "FALSE"
plasma_membrane_peripheral_intercell_source    "FALSE"
database_intercell_target                      "iTALK;Almen2009;CellCellInteractions;EMBRACE;HGNC;CellPhoneDB;GO_Intercell;HPMR;ICELLNET;Surfaceome;Ramilowski2015;Kirouac2010;Guide2Pharma;LRdb;Baccin2019;OmniPath"
scope_intercell_target                         "generic"
aspect_intercell_target                        "functional"
category_source_intercell_target               "resource_specific"
genesymbol_intercell_target                    "CXCR6"
entity_type_intercell_target                   "protein"
consensus_score_intercell_target               "15"
transmitter_intercell_target                   "FALSE"
receiver_intercell_target                      "TRUE"
secreted_intercell_target                      "FALSE"
plasma_membrane_transmembrane_intercell_target "TRUE"
plasma_membrane_peripheral_intercell_target    "FALSE"

Thank you for describing the consensus score!
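Back to the original question: once the network has `source_genesymbol` and `target_genesymbol` columns, finding which of the genes of interest pair up reduces to filtering both ends. A base R sketch on hypothetical rows (pairs mentioned in this thread, not a real download):

```r
# Hypothetical rows shaped like the import_intercell_network() output.
icn_toy <- data.frame(
  source_genesymbol = c("CXCL13", "CXCL10", "IGF1", "NOTCH1"),
  target_genesymbol = c("CXCR6", "CXCR3", "IGF1R", "JAG2"),
  stringsAsFactors = FALSE
)
my_genes <- c("CXCL11", "CXCL10", "CXCL1", "CXCL9", "CXCL13", "CXCR3", "CXCR5")

# Keep interactions where both the source and the target are genes of interest.
both_ends <- icn_toy[icn_toy$source_genesymbol %in% my_genes &
                       icn_toy$target_genesymbol %in% my_genes, ]
both_ends
```

Filtering on only one end instead (for example `source_genesymbol %in% my_genes`) would list every receptor the chosen ligands reach, such as CXCR6 for CXCL13.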

alberto-valdeolivas commented 4 years ago

Hello,

It seems that you are using the release version from Bioconductor. That version does not yet have all the functionality (Bioconductor will be updated soon, by the end of October). Until then, you can install the package from this GitHub repo (version 1.3.7).

Best, Alberto.
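The `not an exported object` error earlier in the thread comes from running a release that predates `import_intercell_network()`. A defensive check, sketched in base R, reports whether the installed OmnipathR exports it; the install line is left commented out and assumes the `remotes` package is available.

```r
# TRUE only if OmnipathR is installed and exports import_intercell_network().
has_intercell_network <-
  requireNamespace("OmnipathR", quietly = TRUE) &&
  "import_intercell_network" %in% getNamespaceExports("OmnipathR")

if (!has_intercell_network) {
  message("import_intercell_network() is not available; ",
          "consider installing the development version from GitHub")
  # remotes::install_github("saezlab/OmnipathR")
}
```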

deeenes commented 4 years ago

Closing, as it looks like the difference between the Bioconductor and development versions caused the confusion. Feel free to reopen or comment if you have any further questions.