morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Houman lm maf to custom track #221

Closed: HoumanLM closed this pull request 1 year ago.

HoumanLM commented 1 year ago

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

Required

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

Optional but preferred with PRs

Checklist for New Functions

Required

Example:

#' @title ASHM Rainbow Plot
#'
#' @description Make a rainbow plot of all mutations in a region, ordered and coloured by metadata.
#'
#' @details This function creates a rainbow plot for all mutations in a region. Region can either be specified with the `region` parameter,
#' or the user can provide a maf that has already been subset to the region(s) of interest with `mutation_maf`.
#' As a third alternative, the regions can also be specified as a bed file with `bed`.
#' Lastly, this function has a variety of parameters that can be used to further customize the returned plot in many different ways.
#' Refer to the parameter descriptions, examples as well as the vignettes for more demonstrations of how this function can be called.
#'
#' @param mutations_maf A data frame containing mutations (MAF format) within a region of interest (e.g. as returned by get_ssm_by_region).
#' @param metadata A data frame with sample_id as a column.
#' @param exclude_classifications Optional argument for excluding specific classifications from a metadata file.
#' @param drop_unmutated Boolean argument for removing unmutated sample ids in mutated cases.
#' @param classification_column The name of the metadata column to use for ordering and colouring samples.
#' @param bed Optional data frame specifying the regions to annotate (required columns: start, end, name).
#' @param region Genomic region for plotting, given as a region string (e.g. "chr6:90975034-91066134").
#' @param custom_colours Provide a named vector (or named list of vectors) containing custom annotation colours if you do not want to use the standardized palette.
#' @param hide_ids Boolean argument, if TRUE, ids will be removed.
#'
#' @return ggplot2 object.
#'
#' @import dplyr ggplot2
#' @export
#'
#' @examples
#' #basic usage
#' region = "chr6:90975034-91066134"
#' metadata = get_gambl_metadata()
#' plot = ashm_rainbow_plot(metadata = metadata, region = region)
#'
#' #advanced usages
#' mybed = data.frame(start = c(128806578,
#'                              128805652,
#'                              128748315),
#'                    end = c(128806992,
#'                            128809822,
#'                            128748880),
#'                    name = c("TSS",
#'                             "enhancer",
#'                             "MYC-e1"))
#'
#' ashm_rainbow_plot(mutations_maf = my_mutations,
#'                   metadata = my_metadata,
#'                   bed = mybed)
#'

Example:

#' @return nothing
#' @export
#' @import tidyverse ggrepel

Checklist for changes to existing code

mattssca commented 1 year ago

Thanks for working on this @HoumanLM.

I just casually wanted to point out that this PR adds some real heavyweights in terms of package dependencies (GenomicRanges, IRanges, and Rsamtools). In the past we wanted to avoid some (or all?) of these.

It might be the case that we indeed need all of these packages for the function added in this PR to work. But nonetheless, I wanted to bring it up for discussion.

If the newly added dependencies make their way to master, we should probably think about ways to use them more widely throughout GAMBLR (e.g. this slack post).

mattssca commented 1 year ago

I just had a look at this PR to see why it was failing the build check under GitHub Actions. Not sure if you had a look at this output yet, but this is why the build check failed:

Undocumented arguments in documentation object 'findMotif'
  ‘maf’ ‘motif’ ‘projection’ ‘fastaPath’
Documented arguments not in \usage in documentation object 'findMotif':
  ‘maf:’ ‘motif:’ ‘projection:’ ‘fastaPath:’

Looking at the documentation for the new function added in this PR, you want to remove the colons after the parameter names: the word following @param must exactly match an argument name in the function definition. Does this make sense?
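A rough illustration of why the check fails: R CMD check compares the names written after @param with the function's formal arguments, so "maf:" and "maf" count as two different names. The sketch below uses a hypothetical helper (param_mismatches is not part of GAMBLR or of R CMD check, which works differently internally) and a stub whose argument defaults are assumed for illustration.

```r
# Simplified illustration (NOT how R CMD check is implemented):
# compare the names written after @param with the function's formal arguments.
param_mismatches <- function(fun, documented) {
  actual <- names(formals(fun))
  list(
    undocumented = setdiff(actual, documented),  # in \usage but not documented
    not_in_usage = setdiff(documented, actual)   # documented but not in \usage
  )
}

# Hypothetical stub with the same argument names as findMotif (defaults assumed)
f <- function(maf, motif = "WRCY", projection = "grch37", fastaPath) NULL

# With trailing colons nothing matches, mirroring the check output above
param_mismatches(f, c("maf:", "motif:", "projection:", "fastaPath:"))

# Without the colons both lists come back empty
param_mismatches(f, c("maf", "motif", "projection", "fastaPath"))
```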

#' @param maf: MAF data frame (required columns: Reference_Allele, Chromosome, Start_Position, End_Position)
#' @param motif: The motif sequence (default is WRCY)
#' @param projection: The genome build projection for the variants you are working with (default is grch37)
#' @param fastaPath: Can be a path to a FASTA file
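For comparison, here is the same block with the colons removed, attached to a stub whose argument names match. The body is a hypothetical placeholder that only checks for the required MAF columns named in the maf description; the real findMotif implementation (the motif search, FASTA handling, and the actual default for fastaPath) is not shown here and may differ.

```r
#' @param maf MAF data frame (required columns: Reference_Allele, Chromosome, Start_Position, End_Position)
#' @param motif The motif sequence (default is WRCY)
#' @param projection The genome build projection for the variants you are working with (default is grch37)
#' @param fastaPath Can be a path to a FASTA file
findMotif <- function(maf,
                      motif = "WRCY",
                      projection = "grch37",
                      fastaPath) {
  # Hypothetical stub: only the input check is sketched; the motif search
  # itself is intentionally omitted.
  required <- c("Reference_Allele", "Chromosome", "Start_Position", "End_Position")
  missing_cols <- setdiff(required, colnames(maf))
  if (length(missing_cols) > 0) {
    stop("maf is missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(maf)
}
```

Because every name after @param now matches an argument in the signature, roxygen2 generates an \arguments section that agrees with \usage, which is what the check compares.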

Thanks!

mattssca commented 1 year ago

Never mind my last comment; you resolved it in a commit published one minute before my review comment. Thanks!

HoumanLM commented 1 year ago

Thanks, Adam, for your explanation.