morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Vsouza add prefix gamblr.data #242

Closed vladimirsouza closed 11 months ago

vladimirsouza commented 11 months ago

Pull Request Checklists

Checklist for all PRs

Required

To test the new changes, I ran the scripts resources/test_functions.R and resources/test_remote.R.

I got these error messages from resources/test_functions.R (run on the GSC server). They should not be related to any change in this PR:

### These errors can be fixed by specifying the parameters `these_samples_metadata` and `these_sample_ids`.
### We decided not to change this script, because we want to tweak the `id_ease` function, which was changed
###   in a previously merged PR.

all_sv = get_manta_sv()
# Error in id_ease(these_samples_metadata = these_samples_metadata, these_sample_ids = these_sample_ids,  : 
#   argument "these_samples_metadata" is missing, with no default

some_sv = get_manta_sv(these_sample_ids = "94-15772_tumorA")
# Error in id_ease(these_samples_metadata = these_samples_metadata, these_sample_ids = these_sample_ids,  : 
#   argument "these_samples_metadata" is missing, with no default

myc_locus_sv = get_manta_sv(region = "8:128723128-128774067")
# Error in id_ease(these_samples_metadata = these_samples_metadata, these_sample_ids = these_sample_ids,  : 
#   argument "these_samples_metadata" is missing, with no default
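
The three failures above have the same shape: a function forwards its own argument to a helper, the caller never supplied it, and R only complains once the helper evaluates it. Below is a minimal sketch of that mechanism and of how a `NULL` default avoids it. `resolve_ids`, `fetch_without_default` and `fetch_with_default` are hypothetical stand-ins for illustration, not GAMBLR functions, and this is not the planned `id_ease` change.

```r
# Hypothetical helper: touching either argument forces the value passed by the caller.
resolve_ids <- function(these_samples_metadata, these_sample_ids) {
  c(these_samples_metadata$sample_id, these_sample_ids)
}

# No defaults: a missing argument is forwarded as-is and fails inside the helper.
fetch_without_default <- function(these_samples_metadata, these_sample_ids) {
  resolve_ids(these_samples_metadata = these_samples_metadata,
              these_sample_ids = these_sample_ids)
}

# NULL defaults: the helper receives NULL and can handle it.
fetch_with_default <- function(these_samples_metadata = NULL, these_sample_ids = NULL) {
  resolve_ids(these_samples_metadata = these_samples_metadata,
              these_sample_ids = these_sample_ids)
}

# Capture the error message instead of stopping the script.
msg <- tryCatch(fetch_without_default(), error = function(e) conditionMessage(e))
print(msg)

ids <- fetch_with_default(these_sample_ids = "94-15772_tumorA")
print(ids)
```

Until the function signatures change, passing both arguments explicitly in the test script has the same effect.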
### I opened issue https://github.com/morinlab/GAMBLR/issues/243 for this error:

cn_matrix = get_cn_states(these_samples_metadata = get_gambl_metadata() %>% dplyr::filter(pathology == "FL"), all_cytobands = TRUE, use_cytoband_name = TRUE)
# Currently, only grch37 is supported                                                                                                         
# Cytobands are in respect to hg19. This will take awhile but it does work, trust me!
# Error in `[.data.frame`(cn_matrix, , region_names, drop = FALSE) :                                                                          
#   undefined columns selected
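
The `undefined columns selected` message is what base R reports when a data frame is indexed by a column name it does not have. The sketch below reproduces that on toy data and shows one way to find the offending names. The matrix values and region names are invented, and the actual cause in `get_cn_states` is not established here; it is tracked in the issue.

```r
# Toy copy-number matrix; check.names = FALSE keeps names such as "1p36" unchanged.
cn_matrix <- data.frame(`1p36` = c(2, 3), `8q24` = c(2, 4), check.names = FALSE)
region_names <- c("1p36", "8q24", "17p13")  # "17p13" has no matching column

# Indexing by a name that is absent raises the error seen in the log.
msg <- tryCatch(
  cn_matrix[, region_names, drop = FALSE],
  error = function(e) conditionMessage(e)
)
print(msg)

# Report which requested names are absent, then subset only by those present.
missing_regions <- setdiff(region_names, colnames(cn_matrix))
print(missing_regions)
kept <- cn_matrix[, intersect(region_names, colnames(cn_matrix)), drop = FALSE]
print(kept)
```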
### I opened issue https://github.com/morinlab/GAMBLR/issues/244 for this error:

MYC_cn_expression = get_gene_cn_and_expression(gene_symbol = "MYC")
# [1] "grep -w -F -e Hugo_Symbol -e MYC /projects/nhl_meta_analysis_scratch/gambl/results_local/icgc_dart/DESeq2-0.0_salmon-1.0/mrna--gambl-icgc-all/vst-matrix-Hugo_Symbol_tidy.tsv"
# Error in `left_join()`:
#   ! Can't join `x$sample_id` with `y$genome_sample_id` due to incompatible types.
# ℹ `x$sample_id` is a <character>.
# ℹ `y$genome_sample_id` is a <logical>.
# Run `rlang::last_trace()` to see where the error occurred.
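
The `left_join()` failure reports a `<character>` key on one side and a `<logical>` key on the other. One way a key column ends up logical in R is when a table is read from input that has a header but no data rows. Whether that is what happened here is an assumption; the log does not confirm it. The sketch below uses toy data to show the type check and the coercion that make the key types agree before a join.

```r
# Left-hand table with a character key (toy values).
x <- data.frame(sample_id = c("S1", "S2"), cn = c(2, 3))

# Header-only input: there are no values to infer types from,
# so every column is read as logical.
y <- read.delim(text = "genome_sample_id\texpression")
key_class_before <- class(y$genome_sample_id)
print(key_class_before)

# Coerce the key so both sides of the join have the same type.
if (!is.character(y$genome_sample_id)) {
  y$genome_sample_id <- as.character(y$genome_sample_id)
}

# The join itself is only attempted when dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  joined <- dplyr::left_join(x, y, by = c("sample_id" = "genome_sample_id"))
  print(joined)
}
```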

These are the errors I got from resources/test_remote.R in remote mode. Again, they should not be related to my changes in this PR:

### This error is due to a missing file in the GAMBLR installation on my desktop. Not related to this PR.

test_ssm = get_ssm_by_samples(these_sample_ids = c("14-24534_tumorA","14-24534_tumorB"),
                              subset_from_merge = T)
# using existing merge: /home/vladimir/repos/gambl_results/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf
# [1] "missing:  /home/vladimir/repos/gambl_results/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf"
# Cannot find file locally. If working remotely, perhaps you forgot to load your config (see below) or sync your files?
# Sys.setenv(R_CONFIG_ACTIVE = "remote")
# Error in data.table::fread(file = maf_file_path, sep = "\t", stringsAsFactors = FALSE,  : 
#   File '/home/vladimir/repos/gambl_results/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf' does not exist or is non-readable. getwd()=='/home/vladimir/repos/gambl_results'
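
When a synced file is missing, the failure only surfaces inside `data.table::fread`. A small pre-flight check can report the path and the active config earlier. The sketch below is one way to do that; `check_input_file` is a hypothetical helper, not part of GAMBLR, and the file names are temporary placeholders.

```r
# Hypothetical helper: stop with a clear message if an expected input is absent.
check_input_file <- function(path) {
  if (!file.exists(path)) {
    stop("Input file not found: ", path,
         " (R_CONFIG_ACTIVE = '", Sys.getenv("R_CONFIG_ACTIVE"), "')",
         call. = FALSE)
  }
  invisible(path)
}

# Select the remote config, as the log message suggests.
Sys.setenv(R_CONFIG_ACTIVE = "remote")

# A file that exists passes silently.
existing <- tempfile(fileext = ".maf")
file.create(existing)
check_input_file(existing)

# A file that does not exist produces the descriptive error.
msg <- tryCatch(
  check_input_file(file.path(tempdir(), "no_such_file.maf")),
  error = function(e) conditionMessage(e)
)
print(msg)
```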
### Adam said these errors are fixed in his branch (https://github.com/morinlab/GAMBLR/pull/240). Thanks, Adam.

cn_ssm = assign_cn_to_ssm("14-24534_tumorB",seg_file_source = "battenberg")
# Error in if (tissue_status_filter == "normal") { :
# the condition has length > 1

pursteenah = estimate_purity(this_sample_id = "14-24534_tumorB",seg_file_source = "battenberg")
# Error in if (tissue_status_filter == "normal") { :
# the condition has length > 1
### This is a warning, not an error. It doesn't look important to me.

finalize_study(seq_type_filter="genome",short_name="GAMBL_genome_2022",these_sample_ids=all_samples,
               human_friendly_name = "GAMBL genomes 2022 edition",
               project_name="gambl_genome_2022",
               description = "GAMBL genome data",out_dir = cbio_path,overwrite = TRUE)
# /home/vladimir/repos/gambl_results/shared/gambl_genome_results.tsv                                                                                                 
# Joining with `by = join_by(patient_id, sample_id, biopsy_id)`                                                                                                      
# Warning messages:                                                                                                                                                  
# 1: In write.table(meta_samples, file = clinsamp, sep = "\t", row.names = FALSE,  :
#   appending column names to file
# 2: In write.table(all_outcomes, file = clinpat, sep = "\t", row.names = FALSE,  :
#   appending column names to file
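
`write.table` emits `appending column names to file` whenever it is called with `append = TRUE` while column names are also written. That is harmless when a table is deliberately appended after header lines. The sketch below reproduces the warning and shows an alternative that writes through a single open connection. The header lines and table contents are invented, and the assumption that `finalize_study` appends the table after metadata lines is not confirmed by the log.

```r
meta_samples <- data.frame(patient_id = c("P1", "P2"), sample_id = c("S1", "S2"))
header_lines <- c("#Patient Identifier\tSample Identifier", "#STRING\tSTRING")

# Pattern that warns: write the header lines, then append the table with column names.
clinsamp <- tempfile(fileext = ".txt")
writeLines(header_lines, clinsamp)
warn_msg <- NULL
withCallingHandlers(
  write.table(meta_samples, file = clinsamp, sep = "\t", row.names = FALSE,
              quote = FALSE, append = TRUE),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)  # record the message
    invokeRestart("muffleWarning")    # let the write finish
  }
)
print(warn_msg)

# Same output without the warning: write both parts through one open connection.
clinsamp2 <- tempfile(fileext = ".txt")
con <- file(clinsamp2, open = "wt")
writeLines(header_lines, con)
write.table(meta_samples, file = con, sep = "\t", row.names = FALSE, quote = FALSE)
close(con)

print(identical(readLines(clinsamp), readLines(clinsamp2)))
```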

I also ran resources/test_remote.R on the GSC server. These are the errors/warnings I got. Again, they are not related to this PR:

### These calls work, but with warnings.

cn_ssm = assign_cn_to_ssm("14-24534_tumorB",seg_file_source = "battenberg")
# trying to find output from: battenberg                                                                                                          
# looking for flatfile: /projects/nhl_meta_analysis_scratch/gambl/results_local/gambl/battenberg_current/99-outputs/seg/genome--projection/14-24534_tumorB--14-24534_normal--matched.battenberg.grch37.seg
# Warning message:                                                                                                                                
# In if (tissue_status_filter == "normal") { :
#   the condition has length > 1 and only the first element will be used

pursteenah = estimate_purity(this_sample_id = "14-24534_tumorB",seg_file_source = "battenberg")
# trying to find output from: battenberg                                                                                                          
# looking for flatfile: /projects/nhl_meta_analysis_scratch/gambl/results_local/gambl/battenberg_current/99-outputs/seg/genome--projection/14-24534_tumorB--14-24534_normal--matched.battenberg.grch37.seg
# Warning message:                                                                                                                                
# In if (tissue_status_filter == "normal") { :
#   the condition has length > 1 and only the first element will be used
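
The `the condition has length > 1` messages come from passing a vector to `if`. Since R 4.2.0 this is an error, while earlier versions only warned and used the first element. That would be consistent with the desktop and the GSC server running different R versions, although the log does not state them. The sketch below shows the problem and two scalar-safe alternatives. The default value `c("tumour", "normal")` and the function `pick_status` are assumptions for illustration, not the GAMBLR code.

```r
tissue_status_filter <- c("tumour", "normal")  # assumed vector-valued default

# Comparing a length-2 vector yields two logicals, which `if` cannot use.
# Depending on the R version this is reported as an error or a warning.
result <- tryCatch(
  if (tissue_status_filter == "normal") "normal" else "other",
  error = function(e) paste("error:", conditionMessage(e)),
  warning = function(w) paste("warning:", conditionMessage(w))
)
print(result)

# Option 1: ask a question that always has a single TRUE/FALSE answer.
has_normal <- "normal" %in% tissue_status_filter
print(has_normal)

# Option 2: reduce the vector of choices to one value with match.arg().
pick_status <- function(tissue_status_filter = c("tumour", "normal")) {
  tissue_status_filter <- match.arg(tissue_status_filter)
  if (tissue_status_filter == "normal") "normal" else "tumour"
}
print(pick_status())
print(pick_status("normal"))
```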
### The `finalize_study` function wasn't run.