morinlab / GAMBLR.data

Collection of Curated Data for Genomic Analysis of Mature B-cell Lymphomas in R
https://morinlab.github.io/GAMBLR.data/
MIT License
2 stars 0 forks source link

Downgrade check_excess_params to warning message #49

Closed vladimirsouza closed 10 months ago

vladimirsouza commented 10 months ago

This PR is to solve the issue in GAMBLR.viz heatmap_mutation_frequency_bin that internally calls calc_mutation_frequency_bin_regions. This was discussed in this Slack thread.

The change is to downgrade the check_excess_params error message to a warning message.

vladimirsouza commented 10 months ago

This is what heatmap_mutation_frequency_bin returns, besides the pot of course.

> heatmap_mutation_frequency_bin(
+   regions_bed = my_regions,
+   these_samples_metadata = my_meta,
+   metadataColumns = meta_columns,
+   sortByColumns = meta_columns
+ )
Warning: Multiple regions in the provided data frame have the same name. Merging these entries based on min(start) and max(end) per name value. 
Warning: You have given one or more unsupported or deprecated arguments to calc_mutation_frequency_bin_regions and they are going to be ignored. Please check the documentation and spelling of your arguments.
Ignored argument(s): from_indexed_flatfile, mode.
Warning: Multiple regions in the provided data frame have the same name. Merging these entries based on min(start) and max(end) per name value. 
starting with 5 samples
returning matrix with 5 samples
No legend element is put in the last 1 row under `nrow = 3`, maybe you should set
`by_row = FALSE`? Reset `nrow` to 2.
No legend element is put in the last 1 row under `nrow = 3`, maybe you should set
`by_row = FALSE`? Reset `nrow` to 2.
No legend element is put in the last 1 row under `nrow = 3`, maybe you should set
`by_row = FALSE`? Reset `nrow` to 2.

There are other warning messages that we don't want there but this is out of the scope of this PR.

The plot is created okay.

mattssca commented 10 months ago

What if you try something like:

GAMBLR.data::get_coding_ssm(adam_is_awesome = TRUE)

(since the GAMBLR.data version of this function also calls the helper to determine if any invalid arguments are passed).

vladimirsouza commented 10 months ago

It returns:

> GAMBLR.data::get_coding_ssm(adam_is_awesome = TRUE)
Warning: Yeah! Adam is the guy!!

Sorry, I couldn't avoid the joke. I will delete this message in 5 min.

vladimirsouza commented 10 months ago
> get_coding_ssm(adam_is_awesome = TRUE)
Using the bundled SSM calls (.maf) calls in GAMBLR.data...
Warning: You have given one or more unsupported or deprecated arguments to get_coding_ssm and they are going to be ignored. Please check the documentation and spelling of your arguments.
Ignored argument(s): adam_is_awesome.
Using the bundled metadata in GAMBLR.data...
after linking with metadata, we have mutations from 446 samples
# A tibble: 46,559 × 47
   Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
   <chr>                <dbl> <chr>  <chr>      <chr>               <dbl>
 1 AIFM1                    0 .      GRCh37     X               129264130
 2 USP24                    0 .      GRCh37     1                55638175
 3 LYST                     0 .      GRCh37     1               235966207
 4 GCKR                     0 .      GRCh37     2                27745442
 5 ORMDL1                   0 .      GRCh37     2               190640297
 6 BCL6                     0 .      GRCh37     3               187442788
 7 RP11-13P5.1              0 .      GRCh37     6               159530473
 8 EZH2                     0 .      GRCh37     7               148508728
 9 DIP2C                    0 .      GRCh37     10                 454908
10 SORCS1                   0 .      GRCh37     10              108439075
# ℹ 46,549 more rows
# ℹ 41 more variables: End_Position <dbl>, Strand <chr>,
#   Variant_Classification <chr>, Variant_Type <chr>, Reference_Allele <chr>,
#   Tumor_Seq_Allele1 <chr>, Tumor_Seq_Allele2 <chr>, dbSNP_RS <chr>,
#   dbSNP_Val_Status <lgl>, Tumor_Sample_Barcode <chr>,
#   Matched_Norm_Sample_Barcode <chr>, Match_Norm_Seq_Allele1 <chr>,
#   Match_Norm_Seq_Allele2 <chr>, Tumor_Validation_Allele1 <lgl>, …
# ℹ Use `print(n = ...)` to see more rows
vladimirsouza commented 10 months ago

In the last commit Add rtracklayer dependency, I added rtracklayer as a dependency because of this line.

When I run devtools::check() in GAMBLR.viz, is says this

‘GAMBLR.viz’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  there is no package called ‘rtracklayer’
Calls: <Anonymous> ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
ERROR: lazy loading failed for package ‘GAMBLR.viz’
* removing ‘/tmp/RtmpsA3CoD/file4e553ff71dee/GAMBLR.viz.Rcheck/GAMBLR.viz’

Which is a problem in GAMBLR.data that is solved in this commit.

The importfrom tag can't be added in data-raw/DATASET.R, so i used usethis::use_import_from("rtracklayer","import"), which creates the R/GAMBLR.data-package.R file.

Kdreval commented 10 months ago

I added some changes here to drop the dependency on rtracklayer since that chage was unnecessary. The GAMBLR.viz a completely separate package and there is no need to introduce heavy dependency in the GAMBLR.data.

Both devtools check with default arguments and arguments standard to GAMBLR fail because of tests. Here is the relevant part:

W  checking for unstated dependencies in ‘tests’ (6.7s)
   'library' or 'require' call not declared from: ‘testthat’
─  checking tests ...
─  Running ‘test_get.R’
E  Some test files failed
   Running the tests in ‘tests/test_get.R’ failed.

This has nothing to do with rtracklayer and so it's introduction as dependency was reverted in the last 2 commits.

@mattssca can you please review this and if it looks good approve/merge?

mattssca commented 10 months ago

Thanks for looking into this. I will test and review first thing tomorrow.

vladimirsouza commented 10 months ago

In my tests, devtools::check also never complained about rtracklayer when it's run on GAMBLR.data. The reason why I added rtracklayer in GAMBLR.data, is because when I run devtools::check on GAMBLR.viz and rtracklayer is not added in GAMBLR.data, it returns this error:

* installing *source* package ‘GAMBLR.viz’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  there is no package called ‘rtracklayer’
Calls: <Anonymous> ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
ERROR: lazy loading failed for package ‘GAMBLR.viz’
* removing ‘/tmp/RtmpWQFCwg/file230c995342e3/GAMBLR.viz.Rcheck/GAMBLR.viz’

Keep in mind that GAMBLR.data is a dependency for GAMBLR.viz.

To reproduce this error, I updated my local GAMGLR.data branch — pulling Kostia's changes — and devtools::install() to install this GAMGLR.data branch; then I cd to my local GAMBLR.viz branch (vsouza-fix_function_code_and_examples) and ran devtools::check()

vladimirsouza commented 10 months ago

But if I install GAMBLR.data while keeping the changes in my commit Add rtracklayer dependency, and run devtools::check on my GAMBLR.viz branch, it returns

0 errors ✔ | 0 warnings ✔ | 4 notes ✖
vladimirsouza commented 10 months ago

This might be a bug in devtools::check. Now I installed my GAMBLR.viz branch, while this GAMBLR.data branch is installed (keeping Kostia's changes), and GAMBLR.viz was installed normally.

vladimirsouza commented 10 months ago

I did a small test using the installation of both branches (viz and data; data branch with Kostia's changes) and it worked fine. I ran heatmap_mutation_frequency_bin examples and got the plot.

# get meta data
my_meta <- get_gambl_metadata() %>% 
  filter(sample_id %in% c("DOHH-2", "OCI-Ly10", "OCI-Ly3", "SU-DHL-10", "SU-DHL-4"))

# get ashm regions of a set of genes.
my_regions = GAMBLR.data::somatic_hypermutation_locations_GRCh37_v_latest %>%
  rename( "chrom"="chr_name", "start"="hg19_start", "end"="hg19_end", "name"="gene") %>%
  mutate( chrom = stringr::str_remove(chrom, "chr") )

# create heatmap of mutation counts for the specified regions
meta_columns <- c("pathology", "lymphgen", "COO_consensus", "DHITsig_consensus")
heatmap_mutation_frequency_bin(
  regions_bed = my_regions,
  these_samples_metadata = my_meta,
  metadataColumns = meta_columns,
  sortByColumns = meta_columns
)