saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0
183 stars 24 forks

added get_KSN_omnipath function #73

Closed adugourd closed 1 year ago

adugourd commented 1 year ago

Yo check mah stuff pliz

dbdimitrov commented 1 year ago

nice stuff

dbdimitrov commented 1 year ago

i aprv

PauBadiaM commented 1 year ago

nono, it brok

adugourd commented 1 year ago

image

adugourd commented 1 year ago

Good point Denes, I had to recheck it (I wrote the core function a very long time ago) '^^

Basically, all the duplicates are kinase/substrate interactions that were somehow annotated as both phosphorylation AND dephosphorylation at the same time.

These are the duplicates: KSN_omnipath_dupplicates.csv

So I assumed that, since they were annotated as BOTH phosphorylation and dephosphorylation, the phosphorylation annotation was probably just wrong. Especially since those enzymes are mainly phosphatases.

So I set them to -1 here because it actually worked well, but it's indeed not very safe. If duplicates appear for other reasons in the future, that could lead to problems.
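
To make the duplicate issue concrete, here is a minimal sketch of how such conflicting pairs could be listed. It assumes a table with `source`, `target` and `mor` columns; the gene symbols and sites are made up for illustration and do not come from OmniPath or from the attached CSV.

```r
library(dplyr)

# Hypothetical toy network: names and sites are illustrative only
ksn <- data.frame(
    source = c("KIN1", "PPASE1", "PPASE1", "KIN2"),
    target = c("SUB1_S10", "SUB2_T20", "SUB2_T20", "SUB3_Y30"),
    mor    = c(1L, 1L, -1L, 1L)
)

# Keep only source-target pairs that carry more than one distinct sign,
# i.e. pairs annotated as both phosphorylation and dephosphorylation
conflicts <- ksn %>%
    distinct(source, target, mor) %>%
    group_by(source, target) %>%
    filter(n_distinct(mor) > 1L) %>%
    ungroup()

print(conflicts)
```

Once exact duplicate rows are dropped and `mor` is restricted to -1 and 1, a pair can only appear twice if it has both signs, so a listing like this is one way to check what the duplicates really are before forcing them to -1.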

deeenes commented 1 year ago

After a very short look, these all seem to be phosphatases, so MOR -1 is correct. Here is a slightly simpler implementation; I checked that mor = min(mor) gives the same output, and it is more justifiable:

#' Kinase-substrate network from OmniPath
#'
#' @param ... Passed to ``OmnipathR::import_omnipath_enzsub``.
#'
#' @importFrom magrittr %>% %T>%
#' @importFrom rlang !!!
#' @importFrom OmnipathR import_omnipath_enzsub omnipath_msg
#' @importFrom dplyr filter mutate select group_by ungroup distinct
#' @importFrom dplyr summarize_all first
#' @export
ksn_omnipath <- function(...) {

    # NSE vs. R CMD check workaround
    modification <- substrate_genesymbol <- residue_type <- residue_offset <-
    enzyme_genesymbol <- target <- mor <- comb <- NULL

    list(...) %>%
    OmnipathR::import_omnipath_enzsub(!!!.) %>%
    filter(modification %in% c('phosphorylation', 'dephosphorylation')) %>%
    mutate(
        target = sprintf(
            '%s_%s%i',
            substrate_genesymbol,
            residue_type,
            residue_offset
        ),
        mor = (modification == 'phosphorylation') * 2L - 1L
    ) %>%
    select(source = enzyme_genesymbol, target, mor) %>%
    distinct %>%
    group_by(source, target) %>%
    mutate(mor = min(mor)) %>%
    summarize_all(first) %>%
    ungroup %T>%
    {OmnipathR::omnipath_msg(
        'success',
        '%i enzyme-PTM interactions after preprocessing.',
        nrow(.)
    )}

}
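
As an illustration of the sign encoding and the `min(mor)` step above, the following sketch runs the same logic on a toy table instead of calling OmniPath. The input rows are hypothetical, and `summarize(mor = min(mor))` is used here as a shorthand that should be equivalent to `mutate(mor = min(mor))` followed by `summarize_all(first)` in this setting.

```r
library(dplyr)

# Hypothetical toy input: one pair annotated with both modifications
toy <- data.frame(
    source = c("PPASE1", "PPASE1", "KIN1"),
    target = c("SUB2_T20", "SUB2_T20", "SUB1_S10"),
    modification = c("phosphorylation", "dephosphorylation", "phosphorylation")
)

collapsed <- toy %>%
    # TRUE -> 1L (phosphorylation), FALSE -> -1L (dephosphorylation)
    mutate(mor = (modification == "phosphorylation") * 2L - 1L) %>%
    select(source, target, mor) %>%
    distinct() %>%
    group_by(source, target) %>%
    # A pair with both signs collapses to -1 because min(1, -1) is -1
    summarize(mor = min(mor), .groups = "drop")

print(collapsed)
```

The conflicting PPASE1 pair ends up as a single row with `mor` equal to -1, while the unambiguous KIN1 pair keeps `mor` equal to 1.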

adugourd commented 1 year ago

LGTM

PauBadiaM commented 1 year ago

Looks good! @adugourd could you change your function to this and push it?

deeenes commented 1 year ago

Btw, the checks fail because of OmnipathR; I already fixed it in master.