morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Bug fixes, new plotting functions, vignette examples, etc. #88

Closed mattssca closed 2 years ago

mattssca commented 2 years ago

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

Required

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

Optional but preferred with PRs

Checklist for New Functions

Required

Example:

#' Use GISTIC2.0 scores output to reproduce maftools::chromoplot with more flexibility
#'
#' @param scores output file scores.gistic from the run of GISTIC2.0
#' @param genes_to_label optional. Provide a data frame of genes to label (if mutated). The first 3 columns must contain chromosome, start, and end coordinates. Another required column must contain gene names and be named `gene`. (truncated for example)
#' @param cutoff optional. Used to determine which regions to color as aberrant. Must be float in the range [0-1]. (truncated for example)

Example:

#' @return nothing
#' @export
#' @import tidyverse ggrepel

Checklist for changes to existing code

mattssca commented 2 years ago

This PR fixes existing bugs, makes quality-of-life improvements, and adds new plotting functions for SVs. The updates include:

  1. get_coding_ssm_status has been updated to call get_coding_ssm to reduce duplicated code.

  2. get_coding_ssm has been updated to use seq_type_filter instead of seq_type (the old argument name returned an error: Error in get_gambl_metadata(from_flatfile = from_flatfile, seq_type = seq_type) : argument 2 matches multiple formal arguments).

  3. With the above issue fixed, pretty_lollipop_plot in the vignette works as intended again (issue closed).

  4. prettyOncoplot in the vignette has been fixed (issue closed).

  5. Fixed a bug where get_coding_ssm with flat_file set to TRUE incorrectly tried to read from a MAF path (which does not exist, since the db is used to retrieve the data).

  6. get_ssm_by_region has been updated to use the recent indexed (tabix) MAFs for the selected genome projection. The chunk that handles regions now deals with chr prefixes accordingly.

  7. Flipped the colours of SV counts in fancy_ideogram (deletions were previously labelled as indels and vice versa).

  8. New parameters have been added to a collection of fancy_x_plots so the user can plot an already loaded (MAF-like) df in R, or directly specify a path to such a file. This makes the affected plotting functions more flexible: they can be run in an interactive session without first calling assign_cn_states or get_sample_cn_segments to retrieve plotting data.

  9. New plotting function added: fancy_circos_plot. This function imports RCircos and constructs a highly customizable circos plot. By default, the plot visualizes translocations (ribbons). Optional arguments create additional tracks to plot SVs. The user can also annotate the circos plot with gene lists. Optional parameters for VAF filtering, chromosome subsetting and genomic projection are also available. This function calls get_combined_sv to retrieve SV information (deletions, duplications and translocations).

  10. New plotting function added: fancy_sv_size_plot. It generates a plot visualizing SV sizes retrieved with get_combined_sv, with options to subset on variant type, filter on VAF, size etc.

  11. GAMBLR documentation has been updated to reflect the current state and changes included in this PR. All new functions, changes to existing parameters, examples, NAMESPACE etc. have been documented adequately.
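
For anyone hitting the error quoted in item 2: it comes from R's partial argument matching, which fails when an abbreviated argument name is a prefix of more than one formal argument. Below is a minimal sketch that reproduces the message. `metadata_sketch` is a hypothetical stand-in, not the real get_gambl_metadata, and its two `seq_type_*` formals are an assumption made for illustration only.

```r
# Hypothetical stand-in for get_gambl_metadata (NOT the real function).
# Assumption: the callee has two formals that both start with "seq_type".
metadata_sketch <- function(from_flatfile = TRUE,
                            seq_type_filter = "genome",
                            seq_type_priority = "genome") {
  seq_type_filter
}

# `seq_type` is a prefix of both seq_type_* formals, so R cannot decide
# which one is meant and stops before the function body runs.
old_call <- tryCatch(
  metadata_sketch(from_flatfile = TRUE, seq_type = "capture"),
  error = function(e) conditionMessage(e)
)

# Spelling out the full argument name removes the ambiguity.
new_call <- metadata_sketch(from_flatfile = TRUE, seq_type_filter = "capture")

print(old_call)
print(new_call)
```

Passing the full formal name at every call site is the simplest way to stay safe when a function later gains another argument with the same prefix.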
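
The chr-prefix handling mentioned in item 6 can be illustrated roughly as follows. This is only a sketch, not the code in get_ssm_by_region: `format_region_sketch` is a hypothetical helper, and the convention that grch37 files use bare chromosome names while hg38 files use the "chr" prefix is an assumption.

```r
# Hypothetical helper, not part of GAMBLR.
# Assumption: "grch37" MAFs use bare chromosome names ("8"),
# while "hg38" MAFs use prefixed names ("chr8").
format_region_sketch <- function(region, projection = c("grch37", "hg38")) {
  projection <- match.arg(projection)
  # Strip any existing prefix first so the input can be in either style.
  bare <- sub("^chr", "", region)
  if (projection == "hg38") paste0("chr", bare) else bare
}

region_grch37 <- format_region_sketch("chr8:128723128-128774067", "grch37")
region_hg38 <- format_region_sketch("8:128723128-128774067", "hg38")

print(region_grch37)
print(region_hg38)
```

Normalizing the region string before querying means the caller can supply either style regardless of which projection is selected.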