saeyslab / nichenetr

NicheNet: predict active ligand-target links between interacting cells

NicheNet-v2 data source collection and processing (OmniPath)? #237

Closed: beigelk closed this issue 8 months ago

beigelk commented 8 months ago

Hello! I have a question about the model in NicheNet-v2, specifically how the databases from OmniPath were processed/filtered when getting ligand-receptor interactions (for the NicheNet-v2 lr_network). Based on Supplementary Note 1 of the MultiNicheNet preprint, I see that OmniPath's intercell database was filtered before being used in NicheNet-v2's lr_network.

Purpose of doing this: I ran a NicheNet analysis and have some ligand-receptor interactions I would like to further explore. My interactions of interest happened to all be from OmniPath. I would like to be able to see the data sources (databases, literature citations, etc.) that OmniPath has for these interactions. I know how to do this in OmniPath/OmnipathR generally speaking, but I'd like to be able to specifically see the data sources in OmniPath that are actually used in NicheNet-v2 (vs. those that may be filtered out).

My problem: I am struggling with a few of the filtering steps for the OmniPath intercell database. I think this is due to my lack of experience rather than the description, so I could use some guidance.

Progress so far: I am trying to recreate these steps listed in Supplementary Note 1 in my code. I'm breaking down the steps here to try to follow them correctly:

  1. To get the intercellular annotations, we first used the Omnipath function import_omnipath_intercell while filtering out annotations only coming from the GO_Intercell or Omnipath source databases.
library(tidyverse)
library("OmnipathR")

# Make a list of the resources to keep (exclude "GO_Intercell" and "OmniPath");
# setdiff() queries the resource list once instead of twice
db_to_keep = setdiff(get_intercell_resources(), c("GO_Intercell", "OmniPath"))

# Use the Omnipath function `import_omnipath_intercell`
omnipath_intercell = import_omnipath_intercell(
  resources = db_to_keep, # exclude annotations only coming from the GO_Intercell or Omnipath
  organism = 9606,
  fields = NULL,
  default_fields = TRUE,
  references_by_resource = TRUE,
  exclude = NULL,
  strict_evidences = FALSE
)
  2. We observed that only doing this step leads to missing some ligands, like NAMPT, which are secreted but not annotated as “transmitter” in Omnipath. To include this type of orphan ligands among the entire set of ligands, we used the annotation database to search for proteins that are secreted but not annotated as “transmitter”, and that have a higher Omnipath consensus score to be secreted than intracellular.

I think I understand "proteins that are secreted but not annotated as 'transmitter'" (omnipath_intercell %>% filter(secreted == TRUE & transmitter == FALSE)), but how is the filtering for the consensus score done?

I see that there is a consensus score but what field(s) should I be looking at for secreted vs. "intracellular"?

omnipath_intercell

> # A tibble: 217,091 × 15
   category      parent        database         scope   aspect     source            uniprot genesymbol entity_type consensus_score transmitter receiver secreted plasma_membrane_transmembr…¹ plasma_membrane_peri…²
   <chr>         <chr>         <chr>            <chr>   <chr>      <chr>             <chr>   <chr>      <chr>                 <dbl> <lgl>       <lgl>    <lgl>    <lgl>                        <lgl>                 
 1 transmembrane transmembrane UniProt_location generic locational resource_specific Q8N661  TMEM86B    protein                   4 FALSE       FALSE    FALSE    FALSE                        FALSE                 
 2 transmembrane transmembrane UniProt_location generic locational resource_specific Q8IWU2  LMTK2      protein                   7 FALSE       FALSE    FALSE    FALSE                        FALSE                 
 3 transmembrane transmembrane UniProt_location generic locational resource_specific P41273  TNFSF9     protein                   7 FALSE       FALSE    TRUE     FALSE                        FALSE                 
 4 transmembrane transmembrane UniProt_location generic locational resource_specific Q9Y661  HS3ST4     protein                   4 FALSE       FALSE    FALSE    FALSE                        FALSE                 
 5 transmembrane transmembrane UniProt_location generic locational resource_specific Q9UPX0  IGSF9B     protein                   5 FALSE       FALSE    FALSE    TRUE                         FALSE                 
 6 transmembrane transmembrane UniProt_location generic locational resource_specific Q9NYV7  TAS2R16    protein                   5 FALSE       FALSE    FALSE    FALSE                        FALSE                 
 7 transmembrane transmembrane UniProt_location generic locational resource_specific P01911  HLA-DRB1   protein                   8 FALSE       FALSE    FALSE    TRUE                         FALSE                 
 8 transmembrane transmembrane UniProt_location generic locational resource_specific Q6P9B9  INTS5      protein                   4 FALSE       FALSE    FALSE    FALSE                        FALSE                 
 9 transmembrane transmembrane UniProt_location generic locational resource_specific P05496  ATP5MC1    protein                   5 FALSE       FALSE    FALSE    FALSE                        FALSE                 
10 transmembrane transmembrane UniProt_location generic locational resource_specific P55344  LIM2       protein                   4 FALSE       FALSE    FALSE    FALSE                        FALSE                 
# ℹ 217,081 more rows
# ℹ abbreviated names: ¹​plasma_membrane_transmembrane, ²​plasma_membrane_peripheral
# ℹ Use `print(n = ...)` to see more rows

Hopefully this isn't too tedious of a question. Also apologies if this code for filtering OmniPath data is available somewhere and I just couldn't find it! Thanks for your time and for developing such a great tool!

browaeysrobin commented 8 months ago

Hi @beigelk

You can find the code for this specific filtering step, and all the other code we used to create the NicheNet-v2 LR network on this Zenodo page: https://zenodo.org/records/8016880

This code is already more than a year old, so I hope OmnipathR has not changed too much since then.
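For readers who only want the general shape of the consensus-score comparison asked about above, here is a minimal sketch. It is not the NicheNet authors' code (that is in the Zenodo record) and may differ from it. It assumes that consensus_score is reported per row of the category column, that rows with category "secreted" and "intracellular" exist, and that secreted and transmitter are per-protein logical flags. The toy_intercell table, its gene names, and its scores are hypothetical stand-ins for the output of import_omnipath_intercell.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the import_omnipath_intercell() output.
# Gene names and scores are made up purely for illustration.
toy_intercell <- tribble(
  ~genesymbol, ~category,       ~consensus_score, ~secreted, ~transmitter,
  "GENE_A",    "secreted",      5,                TRUE,      FALSE,
  "GENE_A",    "intracellular", 3,                TRUE,      FALSE,
  "GENE_B",    "secreted",      1,                TRUE,      FALSE,
  "GENE_B",    "intracellular", 6,                TRUE,      FALSE,
  "GENE_C",    "secreted",      9,                TRUE,      TRUE,
  "GENE_C",    "intracellular", 2,                TRUE,      TRUE,
  "GENE_D",    "secreted",      2,                TRUE,      FALSE
)

candidate_orphan_ligands <- toy_intercell %>%
  # Keep proteins flagged as secreted but not annotated as transmitter,
  # and only the two categories being compared (assumed category names).
  filter(secreted, !transmitter,
         category %in% c("secreted", "intracellular")) %>%
  # Collapse to one score per gene and category.
  group_by(genesymbol, category) %>%
  summarise(score = max(consensus_score), .groups = "drop") %>%
  # One row per gene, one score column per category; a missing category counts as 0.
  pivot_wider(names_from = category, values_from = score,
              names_prefix = "score_", values_fill = 0) %>%
  # Keep genes whose "secreted" score exceeds their "intracellular" score.
  filter(score_secreted > score_intracellular) %>%
  pull(genesymbol)

print(candidate_orphan_ligands)
```

On real OmnipathR output, it would be worth checking first which values the category, parent, scope, and source columns actually take, since the right rows to compare depend on those.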

beigelk commented 8 months ago

Oh, that's fantastic! Thank you so much @browaeysrobin!