saezlab / OmnipathR

R client for the OmniPath web service
https://r.omnipathdb.org/
MIT License

kegg_pathways_download returns fewer pathways #102

Open cchen22 opened 1 week ago

cchen22 commented 1 week ago

Hello,

Thank you so much for creating this fantastic package! I used "kegg_pathways_download" to download all KEGG pathways. I am confused: the log says "downloaded 551 records", but the output data frame contains only 193 unique pathways. I wonder what filtering is applied inside this function. Thank you very much!

> kegg_pw = kegg_pathways_download(max_expansion = NULL, simplify = FALSE)
[2024-06-27 11:55:28] [SUCCESS] [OmnipathR] KEGG (www.genome.jp): downloaded 551 records
[2024-06-27 11:55:49] [SUCCESS] [OmnipathR] UniProt (rest.uniprot.org): downloaded 20435 records

deeenes commented 1 week ago

Hello, the code currently extracts only protein-protein interactions from the pathways, which mostly means signaling interactions. Most metabolic pathways yield zero interactions this way, and we don't even try to process them. As you can see here, the step where we translate the identifiers from gene symbols to UniProt IDs is very simplistic. Metabolite IDs could already be processed too, using the HMDB or Chalmers Sysbio ID translation. An option could also be provided to process the pathways into data frames without ID translation.
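
To illustrate why the number of unique pathways in the output can be smaller than the number of downloaded records, here is a minimal base R sketch of the effect described above. It is not the package's actual code: the toy data frame, its column names (pathway, source, target, source_type, target_type) and the pathway names are all hypothetical, chosen only to show that a pathway containing no protein-protein relations disappears once the relations are filtered.

```r
# Toy stand-in for relations parsed from several pathways.
# All names and values here are hypothetical, for illustration only.
relations <- data.frame(
  pathway     = c("Signaling A", "Signaling A", "Metabolism B",
                  "Metabolism B", "Signaling C"),
  source      = c("EGFR", "GRB2", "C00031", "HK1", "TP53"),
  target      = c("GRB2", "SOS1", "C00668", "C00668", "MDM2"),
  source_type = c("gene", "gene", "compound", "gene", "gene"),
  target_type = c("gene", "gene", "compound", "compound", "gene"),
  stringsAsFactors = FALSE
)

# Keep only relations where both partners are gene products (proteins).
both_genes <- relations$source_type == "gene" & relations$target_type == "gene"
ppi <- relations[both_genes, ]

# Compare how many pathways were listed with how many survive the filter.
n_listed <- length(unique(relations$pathway))
n_kept   <- length(unique(ppi$pathway))
dropped  <- setdiff(unique(relations$pathway), unique(ppi$pathway))

print(n_listed)
print(n_kept)
print(dropped)
```

In this toy case the metabolic pathway is dropped because none of its relations connect two gene products. The same comparison could be made on the real output by counting the distinct values of whichever column identifies the pathway.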