saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

Error after get_resource() and check_repeated_edges(network) #107

sciotlos closed this issue 8 months ago

sciotlos commented 10 months ago

Hello,

I'm encountering the following error when I run either rename_net() or run_ulm() on a network I loaded with get_resource(), specifically the NetPath data:

NP_net <- get_resource("NetPath")
NP_net
NP_net['mor'] <- 1.0

# A tibble: 8,870 × 5
   uniprot genesymbol entity_type pathway                                      mor
   <chr>   <chr>      <chr>       <chr>                                      <dbl>
 1 Q15109  AGER       protein     Advanced glycation end-products (AGE/RAGE)     1
 2 O95831  AIFM1      protein     Advanced glycation end-products (AGE/RAGE)     1
 3 P54819  AK2        protein     Advanced glycation end-products (AGE/RAGE)     1
 4 P31749  AKT1       protein     Interleukin-2 (IL-2)                           1
 5 P31749  AKT1       protein     Thymic stromal lymphopoietin (TSLP)            1
 6 P31749  AKT1       protein     Androgen receptor (AR)                         1
 7 P31749  AKT1       protein     Corticotropin-releasing hormone (CRH)          1
 8 P31749  AKT1       protein     Hedgehog                                       1
 9 P31749  AKT1       protein     Notch                                          1
10 P31749  AKT1       protein     Interleukin-11 (IL-11)                         1
8,860 more rows

Next, when I try either of these:

NP_net <- rename_net(network = NP_net, .source = "pathway", .target = "genesymbol", .mor = "mor")
x <- run_ulm(mat = mat_list[[m]], net = NP_net, .source = "pathway", .target = "genesymbol")

I get this error:

Error in check_repeated_edges(network): Network contains repeated edges, please remove them.

How would I correctly format the NetPath tibble? Or am I not using the proper function for testing? Thanks for your help!

PauBadiaM commented 9 months ago

Hi @sciotlos,

Some resources use different gene symbol synonyms that OmniPath harmonizes to the same symbol, which leaves repeated source-target edges in the network. You can remove these duplicates from your net data frame with dplyr::distinct and then pass it to run_ulm. Also, there is no need to call rename_net if you pass the correct column names (.source='pathway', .target='genesymbol') to run_ulm. Hope this helps!
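
A minimal sketch of that deduplication step, using a small made-up data frame in place of the real NetPath network. The rows and the ID `X00000` are invented for illustration; only the column names follow the tibble printed in the question. It assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-in for get_resource("NetPath"); values are illustrative only.
net <- data.frame(
  uniprot    = c("P31749", "X00000", "Q15109"),  # "X00000" is a made-up ID
  genesymbol = c("AKT1", "AKT1", "AGER"),
  pathway    = c("Notch", "Notch", "Notch"),
  mor        = 1.0,
  stringsAsFactors = FALSE
)

# Rows 1 and 2 share the same (pathway, genesymbol) pair, i.e. a repeated edge.
# Keep only the first row for each source-target pair.
net_dedup <- distinct(net, pathway, genesymbol, .keep_all = TRUE)

nrow(net_dedup)  # 2
```

The deduplicated data frame can then be passed to run_ulm with .source = "pathway" and .target = "genesymbol", as in the original call.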

ma2o commented 9 months ago

Hi there,

I ran into the same issue when using the MSigDB collection.

net_mysig <- decoupleR::get_resource("MSigDB")

The issue here is that there is a 1->n mapping from uniprot to genesymbols. Just remove the uniprot column, or select only the columns you need for further analysis, and run unique() on them:

net_mysig['mor'] <- 1.0
net_mysig_un <- unique(net_mysig[,c(2,5,6)])

That worked for me for the MSigDB collection and run_fgsea().
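
A base R sketch of the same idea, selecting columns by name rather than by position so it does not depend on column order. The data frame is a made-up stand-in for the MSigDB resource: the `geneset` column name is an assumption, and all IDs and values are invented.

```r
# Toy stand-in for the MSigDB resource; the geneset column name is assumed
# and every value below is invented for illustration.
net <- data.frame(
  uniprot    = c("U1", "U2", "U3"),
  genesymbol = c("GENE_A", "GENE_A", "GENE_B"),
  geneset    = c("SET_1", "SET_1", "SET_1"),
  mor        = 1.0,
  stringsAsFactors = FALSE
)

edge_cols <- c("geneset", "genesymbol")

# Count edges that appear more than once when uniprot is ignored.
n_repeated <- sum(duplicated(net[, edge_cols]))

# Drop uniprot, then keep one row per source-target pair.
net_unique <- unique(net[, c(edge_cols, "mor")])
```

Checking `n_repeated` before and after is a quick way to confirm that the repeated edges come from the uniprot column and that none remain once it is dropped.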