Kdreval closed this issue 2 years ago.
Can this issue be closed?
Thanks Adam, I'll test this today and see if this works now!
I don't think this is resolved yet. Does the version in the current PR have different output?
> capture_meta <- get_gambl_metadata(seq_type_filter = c('capture')) %>%
+ dplyr::filter(consensus_pathology =='DLBCL') %>%
+ dplyr::filter(COO_consensus == 'ABC')
> capture_abc_maf <- get_coding_ssm(limit_samples = capture_meta$sample_id,
+ basic_columns = TRUE,
+ exclude_cohort = c('dlbcl_chapuy'),
+ seq_type = "capture")
reading from: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/capture--projection/deblacklisted/augmented_maf/all_slms-3--grch37.CDS.maf
|--------------------------------------------------|
|==================================================|
mutations from 1023 samples
after linking with metadata, we have mutations from 1 samples
We have to make sure that wherever get_coding_ssm (or any get_* function) is called, seq_type is always required and passed along.
This is the output I am getting when running the above code on my branch (pending PR):
reading from: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/capture--projection/deblacklisted/augmented_maf/all_slms-3--grch37.CDS.maf
|--------------------------------------------------|
|==================================================|
mutations from 1023 samples
after linking with metadata, we have mutations from 242 samples
I believe this was fixed by removing the duplicated lines related to all_meta and adding the line all_meta = dplyr::filter(all_meta, seq_type == {{ seq_type }})
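To illustrate why filtering all_meta on the requested seq_type matters, here is a minimal base-R sketch on made-up toy data. The helper link_mutations_to_metadata is hypothetical and is not the GAMBLR implementation. It only mimics the "link mutations to metadata" step: if the metadata is restricted to the default seq_type ("genome") instead of the one the caller asked for, most capture samples drop out of the join.

```r
# Toy metadata (hypothetical): the same sample_id can appear once per seq_type
all_meta <- data.frame(
  sample_id = c("S1", "S2", "S3", "S1"),
  seq_type  = c("genome", "genome", "capture", "capture"),
  stringsAsFactors = FALSE
)

# Toy capture mutations (hypothetical), one row per mutation
capture_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S3", "S3"),
  Hugo_Symbol          = c("MYD88", "CD79B", "PIM1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only mutations from samples present in the
# metadata for the requested seq_type (defaults to "genome", like the bug)
link_mutations_to_metadata <- function(maf, meta, seq_type = "genome") {
  meta <- meta[meta$seq_type == seq_type, , drop = FALSE]
  merge(maf, meta, by.x = "Tumor_Sample_Barcode", by.y = "sample_id")
}

# Forgetting to pass seq_type: only S1 survives (it also has a genome)
buggy <- link_mutations_to_metadata(capture_maf, all_meta)

# Passing seq_type through: both capture samples are kept
fixed <- link_mutations_to_metadata(capture_maf, all_meta, seq_type = "capture")
```

In base R the comparison is unambiguous because the column is accessed as meta$seq_type. With dplyr's data masking, a bare seq_type on the right-hand side inside filter() would resolve to the column rather than to the function argument, which is presumably why the fix embraces the argument as {{ seq_type }}.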
This will fix this issue then!
Is 242 the right/expected number of samples for capture? Seems low.
This is restricted to ABC exomes only, excluding Chapuy. The subsetting is there only to test how the different arguments work, and the filters were chosen arbitrarily, so the number seems reasonable. I think there are 244 exomes in total matching those criteria, so mutations from 242 samples is close to what we expect. I will need to figure out which 2 samples are missing once the current PR is on master.
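One way to find the missing samples is a set difference between the sample IDs in the filtered metadata and the samples that actually appear in the returned MAF. The sketch below uses hypothetical toy vectors. In the real case, expected_ids would be capture_meta$sample_id with the dlbcl_chapuy samples removed first, and maf would be capture_abc_maf; it assumes the MAF identifies samples in the standard Tumor_Sample_Barcode column.

```r
# Hypothetical: sample IDs we expect after all filters are applied
expected_ids <- c("S1", "S2", "S3", "S4")

# Hypothetical MAF rows: one per mutation, so sample IDs can repeat
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S3"),
  stringsAsFactors = FALSE
)

# Samples present in the metadata but contributing no mutations to the MAF
missing_ids <- setdiff(expected_ids, unique(maf$Tumor_Sample_Barcode))
print(missing_ids)
```

A sample can also be absent for a legitimate reason, for example if it has no coding mutations passing filters, so the missing IDs are a starting point for checking rather than proof of a bug.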
Got it! I'll repeat something I think I've said before, just to make sure we're on the same page. I would prefer our functions to start using these_samples_metadata (i.e., automatically figuring out which samples the user wants from a metadata table) instead of a sample_id vector. For the example above, this would look like the following. I hope our code currently behaves the same way with either argument.
capture_meta <- get_gambl_metadata(seq_type_filter = c('capture')) %>%
  dplyr::filter(consensus_pathology == 'DLBCL') %>%
  dplyr::filter(COO_consensus == 'ABC')
capture_abc_maf <- get_coding_ssm(these_samples_metadata = capture_meta,
                                  basic_columns = TRUE,
                                  exclude_cohort = c('dlbcl_chapuy'),
                                  seq_type = "capture")
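A function can support both calling styles by resolving the sample list up front. The sketch below is hypothetical (resolve_sample_ids is not part of GAMBLR) and only shows one possible precedence rule: use these_samples_metadata when given, fall back to limit_samples, and fail loudly if neither is supplied. Under that rule, both versions of the example above select the same samples.

```r
# Hypothetical helper (not GAMBLR's API): decide which samples to keep from
# either a metadata table or a plain vector of sample IDs
resolve_sample_ids <- function(these_samples_metadata = NULL, limit_samples = NULL) {
  if (!is.null(these_samples_metadata)) {
    # Prefer the metadata table: its sample_id column defines the samples
    return(unique(these_samples_metadata$sample_id))
  }
  if (!is.null(limit_samples)) {
    return(unique(limit_samples))
  }
  stop("Provide either these_samples_metadata or limit_samples")
}

# Toy metadata (hypothetical); S2 is duplicated to show de-duplication
meta <- data.frame(
  sample_id = c("S1", "S2", "S2"),
  seq_type  = "capture",
  stringsAsFactors = FALSE
)

from_metadata <- resolve_sample_ids(these_samples_metadata = meta)
from_vector   <- resolve_sample_ids(limit_samples = meta$sample_id)
```

Passing the whole metadata table rather than a bare ID vector could also let a function read other columns, such as seq_type, from the same object, although whether the real functions do that is not established here.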
The get_coding_ssm function returns a MAF file for only 1-2 samples when run on capture data. In a reproducible example, it returns data for only 1 sample.
I think this is because the call that builds all_meta here always uses the default seq_type, which is genome. Modifying this call to use the seq_type the user requested should resolve the issue.
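The pattern suggested in this thread (require seq_type and pass it along instead of relying on a default) can be sketched as follows. Both functions are hypothetical stand-ins whose names only echo get_gambl_metadata and get_coding_ssm; their bodies are illustrative toy code, not the real implementations.

```r
# Hypothetical stand-in for the metadata getter, filtering on seq_type
get_metadata_sketch <- function(seq_type_filter) {
  meta <- data.frame(
    sample_id = c("G1", "G2", "C1", "C2"),
    seq_type  = c("genome", "genome", "capture", "capture"),
    stringsAsFactors = FALSE
  )
  meta[meta$seq_type %in% seq_type_filter, , drop = FALSE]
}

# Hypothetical stand-in for a get_* function that needs metadata internally
get_coding_ssm_sketch <- function(seq_type) {
  # No default: calling without seq_type fails loudly instead of silently
  # falling back to "genome"
  if (missing(seq_type)) stop("seq_type must be supplied")
  # Pass the caller's seq_type through to the internal metadata call
  all_meta <- get_metadata_sketch(seq_type_filter = seq_type)
  all_meta$sample_id
}

capture_ids <- get_coding_ssm_sketch(seq_type = "capture")
```

Making the argument mandatory trades a little convenience for an early, explicit error, which is easier to notice than a result that silently covers only 1 sample.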