morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Hot-fixes #240

Closed mattssca closed 7 months ago

mattssca commented 11 months ago

Details

This PR introduces minor hotfixes related to multiple open issues on this repo. Here is a detailed description of what was fixed and how.

get_sample_cn_segments has non-standard parameter naming (issue #224).

This function now follows the same conventions as other GAMBLR functions for the parameters used to subset the return to specific sample IDs. The newly added helper function (id_ease) has also been implemented to streamline this process. Function docs and examples have been updated to reflect the changes introduced in this PR, and in-code references to this function throughout GAMBLR now use the new parameters.
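As a rough illustration of the pattern described above, here is a minimal sketch of a helper that resolves sample IDs from either a metadata table or a vector of IDs and then subsets a segment table. The function name `resolve_sample_ids`, the parameter names `these_samples_metadata` and `these_sample_ids`, and the toy column names are assumptions made for this example; this is not the id_ease implementation.

```r
# Hypothetical helper for illustration only; not the actual id_ease code.
resolve_sample_ids <- function(these_samples_metadata = NULL,
                               these_sample_ids = NULL) {
  if (!is.null(these_samples_metadata)) {
    stopifnot("sample_id" %in% colnames(these_samples_metadata))
    ids <- these_samples_metadata$sample_id
    # If both inputs are given, keep only IDs present in both.
    if (!is.null(these_sample_ids)) {
      ids <- intersect(ids, these_sample_ids)
    }
    return(unique(ids))
  }
  if (!is.null(these_sample_ids)) {
    return(unique(these_sample_ids))
  }
  stop("Provide these_samples_metadata, these_sample_ids, or both.")
}

# Toy segment table standing in for real copy-number segments.
segments <- data.frame(
  ID = c("S1", "S1", "S2", "S3"),
  chrom = c("1", "2", "1", "1"),
  start = c(100, 200, 100, 500),
  end = c(150, 900, 400, 800),
  stringsAsFactors = FALSE
)

# Subset the segments to the requested samples.
wanted <- resolve_sample_ids(these_sample_ids = c("S1", "S3"))
subset_segments <- segments[segments$ID %in% wanted, ]
```

Routing every function through one resolver of this kind means the sample-selection parameters only need to be validated in a single place.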

get_gambl_metadata broken for normals (issue #190).

This function has been updated to check the tissue_status_filter parameter. More specifically, if only normals are requested, only normals are returned. The function also works as expected when multiple elements are given to the parameter (e.g. tissue_status_filter = c("tumour", "normal")). This fix also resolves another issue reported on this repo (issue #225, a bug in assign_cn_to_ssm).
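The intended behaviour can be sketched with a small stand-alone filter. The function `filter_by_tissue_status`, the `tissue_status` column name, the default value, and the toy metadata are assumptions made for this example; the actual logic inside get_gambl_metadata may differ.

```r
# Hypothetical stand-in for the filtering step; not the GAMBLR source.
filter_by_tissue_status <- function(metadata,
                                    tissue_status_filter = c("tumour")) {
  allowed <- c("tumour", "normal")
  unknown <- setdiff(tissue_status_filter, allowed)
  if (length(unknown) > 0) {
    stop("Unknown tissue status: ", paste(unknown, collapse = ", "))
  }
  # %in% handles one requested status or several in the same way.
  metadata[metadata$tissue_status %in% tissue_status_filter, , drop = FALSE]
}

# Toy metadata with two tumours and one normal.
toy_metadata <- data.frame(
  sample_id = c("T1", "T2", "N1"),
  tissue_status = c("tumour", "tumour", "normal"),
  stringsAsFactors = FALSE
)

normals_only <- filter_by_tissue_status(toy_metadata, "normal")
both <- filter_by_tissue_status(toy_metadata, c("tumour", "normal"))
```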

Axis labels in plots generated by SSM vignette have extensively large fonts (issue #227).

The vignettes have been reworked to properly display the rendered figures. The issue was that some images were compressed too much when knitted with the default parameters (width = 7 and height = 5). In addition, the per-chunk options in the vignette have been reduced and global knitting options are now defined at the beginning of the vignette. Minor tweaks and edits to the vignettes are also included in this update.
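For reference, global knitting options of this kind are typically set once in a setup chunk with `knitr::opts_chunk$set`. The values below are illustrative assumptions and are not the ones used in this PR.

```r
# Setup chunk placed at the top of an R Markdown vignette.
# All values here are illustrative, not those chosen in this PR.
knitr::opts_chunk$set(
  fig.width = 10,   # wider than the default width of 7 mentioned above
  fig.height = 7,   # taller than the default height of 5 mentioned above
  message = FALSE,  # hide package start-up messages in the rendered output
  warning = FALSE   # hide warnings in the rendered output
)
```

Setting these once means individual chunks only need their own options when a figure requires a different size.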

In addition, this PR introduces a hotfix for annotate_driver_ssm. When no genes were given to the driver_genes parameter, this function depended on a bundled dataset that has since been relocated. The function now loads the same dataset from the correct location.

Lastly, the Snakemake (smk) file used for syncing GAMBL data has also been updated to allow get_manta_sv to be run in a remote setting.

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

Required

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

Optional but preferred with PRs

Checklist for New Functions

Required

Example:

#' @title ASHM Rainbow Plot
#'
#' @description Make a rainbow plot of all mutations in a region, ordered and coloured by metadata.
#'
#' @details This function creates a rainbow plot for all mutations in a region. The region can either be specified with the `region` parameter,
#' or the user can provide a maf that has already been subset to the region(s) of interest with `mutations_maf`.
#' As a third alternative, the regions can also be specified as a bed file with `bed`.
#' Lastly, this function has a variety of parameters that can be used to further customize the returned plot.
#' Refer to the parameter descriptions, the examples, and the vignettes for more demonstrations of how this function can be called.
#'
#' @param mutations_maf A data frame containing mutations (MAF format) within a region of interest (e.g. as returned by get_ssm_by_region).
#' @param metadata A data frame with sample_id as a column.
#' @param exclude_classifications Optional argument for excluding specific classifications from a metadata file.
#' @param drop_unmutated Boolean argument for removing unmutated sample ids in mutated cases.
#' @param classification_column The name of the metadata column to use for ordering and colouring samples.
#' @param bed Optional data frame specifying the regions to annotate (required columns: start, end, name).
#' @param region Genomic region for plotting in bed format.
#' @param custom_colours Provide a named vector (or named list of vectors) containing custom annotation colours if you do not want to use the standardized palette.
#' @param hide_ids Boolean argument, if TRUE, ids will be removed.
#'
#' @return ggplot2 object.
#'
#' @import dplyr ggplot2
#' @export
#'
#' @examples
#' #basic usage
#' region = "chr6:90975034-91066134"
#' metadata = get_gambl_metadata()
#' plot = ashm_rainbow_plot(metadata = metadata, region = region)
#'
#' #advanced usages
#' mybed = data.frame(start = c(128806578,
#'                              128805652,
#'                              128748315),
#'                    end = c(128806992,
#'                            128809822,
#'                            128748880),
#'                    name = c("TSS",
#'                             "enhancer",
#'                             "MYC-e1"))
#'
#' ashm_rainbow_plot(mutations_maf = my_mutations,
#'                   metadata = my_metadata,
#'                   bed = mybed)
#'

Example:

#' @return nothing
#' @export
#' @import tidyverse ggrepel

Checklist for changes to existing code