morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Fix for `get_coding_ssm_status` function #182

Closed: Kdreval closed this 1 year ago

Kdreval commented 1 year ago

This PR fixes issue #181. A minimal reproducible example of the new functionality:


library(tidyverse)
library(vroom)

setwd("~/GAMBLR/")

devtools::load_all()

# get meta
meta <- get_gambl_metadata(
    case_set = "DLBCL-unembargoed"
)

# get mutations
maf <- get_coding_ssm(
        these_samples_metadata = meta
    ) %>%
    dplyr::filter(
        Hugo_Symbol == "CREBBP" # subset to example gene
    )

new_way <- get_coding_ssm_status(
        gene_symbols = "CREBBP",
        maf_data = maf,
        these_samples_metadata = meta
    )

new_way_multihit <- get_coding_ssm_status(
        gene_symbols = "CREBBP",
        maf_data = maf,
        keep_multihit_hotspot = TRUE,
        these_samples_metadata = meta
    )

setdiff(new_way_multihit, new_way)

This shows a difference in 7 samples where CREBBP is actually multihit, so both the hotspot and the non-hotspot mutations should be annotated.

Kdreval commented 1 year ago

This has been updated so that, when the user asks to keep multihit samples, the function not only reports non-hotspot mutations but also keeps track of how many non-hotspot mutations are present. This is also documented in the argument description.
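
To illustrate the idea of tracking non-hotspot mutations in multihit samples, here is a minimal sketch on toy data. It is not the GAMBLR implementation: the `hot_spot` column, the sample IDs, and the output column names (`n_hotspot`, `n_non_hotspot`, `multihit`) are all hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy MAF-like table. The `hot_spot` flag and the sample IDs
# are illustrative assumptions, not real GAMBLR output.
toy_maf <- data.frame(
    Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S3"),
    Hugo_Symbol = "CREBBP",
    hot_spot = c(TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Count hotspot and non-hotspot mutations per sample, then flag samples
# that carry both kinds (treated here as "multihit").
per_sample <- toy_maf %>%
    group_by(Tumor_Sample_Barcode) %>%
    summarise(
        n_hotspot = sum(hot_spot),
        n_non_hotspot = sum(!hot_spot),
        .groups = "drop"
    ) %>%
    mutate(multihit = n_hotspot > 0 & n_non_hotspot > 0)

print(per_sample)
```

In this toy table only sample S1 carries both a hotspot and non-hotspot mutations, so it is the only one flagged as multihit, with two non-hotspot mutations counted alongside its hotspot.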